The Tesla index: a measure of social isolation for scientists

July 31, 2014 by Keith Bradnam

Abstract

In the era of social media there are now many different ways that a scientist can build their public profile; the publication of high-quality scientific papers is just one. While publishing journal and book articles is a valuable tool for the dissemination of knowledge, there is a danger that scientists become isolated and disconnected from reality, sitting alone in their ivory towers. Such reclusiveness has long been all too common among academic scientists, and we are losing sight of other key outreach efforts, such as the use of social media as a tool for communicating science. To help quantify this problem of social isolation, I propose the ‘Tesla Index’, a measure of the discrepancy between the somewhat stuffy, outdated practice of generating peer-reviewed publications and the growing trend of vibrant, dynamic engagement with other scientists and the general public through social media.

Introduction

There are many scientists who actively take the time to pursue their science in as much of a public manner as possible. They work hard to ensure that their peers, and the public at large, are kept informed of their latest research. Consider Titus Brown, a genomics and evolution professor at Michigan State University[1]. Although he has contributed to a meagre number of — largely uninteresting — publications[2], he has instead embraced social media[3] to excite and stimulate others with news of his past, current, and future work.

Now consider Nikola Tesla[4]; although he may have forever changed the world through his many scientific inventions[5], he was a famous recluse[6] and surprisingly did not contribute to any blog, nor did he even bother to set up an account on Twitter. I am concerned that the anti-social and secretive behavior of Nikola Tesla is all too common among other scientists, particularly those who continue their obsession with publishing work that will forever live behind paywalls, invisible to all but the privileged few.

I therefore think it’s time that we develop a metric that will clearly indicate whether a scientist is a reclusive introvert with no interest in sharing their work with others or engaging with the wider community. This will allow us to adjust our expectations of them accordingly. In order to quantify the problem and to devise a solution, I have compared the number of followers that research scientists have on Twitter with the number of citations they have for their peer-reviewed work. This analysis has identified clear outliers, or ‘Teslas’, within the scientific community. I propose a new metric, which I call the ‘Tesla Index’, that allows a simple quantification of the degree of social isolation of any particular scientist.

Results and Discussion

I took the number of Twitter followers as a measure of ‘social outreach and engagement’ while the number of citations was taken as a measure of ‘boring scientific output’. The data gathered are shown in Figure 1.

Figure 1: Twitter followers versus number of scientific citations for a sort-of-random sample of researcher tweeters

I propose that the Tesla Index (T-index) can be calculated simply as the number of Twitter followers a user has, divided by their total number of citations. A low T-index is a warning to the community that researcher 'X' may be forsaking all methods of publicly sharing their work in favor of solely publishing manuscripts. In contrast, a very high T-index suggests that a scientist is being active in the community, informing and educating their peers, colleagues, and the wider public. They are thus playing a positive role in society. Here, I propose that those people whose T-index is lower than 0.5 can be considered ‘Science Teslas’; these individuals are highlighted in Figure 1.
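For anyone wishing to check their own standing, the calculation can be sketched in a few lines of Python. This is a minimal illustration only: the function names `tesla_index` and `is_science_tesla` are hypothetical, the 0.5 cutoff is the one proposed above, and the counts in the usage lines are invented for illustration rather than taken from Figure 1.

```python
def tesla_index(followers: int, citations: int) -> float:
    """Return the T-index: Twitter followers divided by total citations."""
    if citations <= 0:
        # The ratio is undefined for researchers with no citations.
        raise ValueError("citations must be a positive number")
    if followers < 0:
        raise ValueError("followers cannot be negative")
    return followers / citations


def is_science_tesla(followers: int, citations: int, threshold: float = 0.5) -> bool:
    """Flag a 'Science Tesla': a T-index strictly lower than the cutoff (default 0.5)."""
    return tesla_index(followers, citations) < threshold


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from Figure 1.
    print(tesla_index(1000, 4000))       # 0.25
    print(is_science_tesla(1000, 4000))  # True
```

The check uses a strict "lower than", matching the wording above, so a researcher with a T-index of exactly 0.5 narrowly escapes being labeled a Science Tesla.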

References

[1] http://ged.msu.edu
[2] http://scholar.google.com/citations?user=O4rYanMAAAAJ&hl=en
[3] https://twitter.com/ctitusbrown
[4] http://en.wikipedia.org/wiki/Nikola_Tesla#Literary_works
[5] http://theoatmeal.com/comics/tesla
[6] http://www.viewzone.com/tesla.html

Acknowledgments

This research was inspired by a piece of completely unrelated work by Neil Hall.

A CEGMA Virtual Machine (VM) is now available!

July 30, 2014 by Keith Bradnam

Last week I blogged about the ever-growing popularity of CEGMA and also the problems of maintaining this difficult-to-install piece of software. In response to that post, people helpfully pointed out that you can more easily install and run CEGMA by using package managers such as Homebrew, or even run CEGMA on a dedicated Amazon Machine Image (AMI).

These responses led me to update the CEGMA FAQ to list all of the alternative methods of getting CEGMA to run (including running it as an iPlant application). Today I’m happy to announce a new addition to this list: CEGMA is now available through virtualization:

Korflab CEGMA VM Information

Our CEGMA VM runs the Ubuntu operating system and is pre-configured to have everything installed that CEGMA needs. I’ve tested the VM using the free VirtualBox software and it seems to work just fine [1].

This also means that I will no longer be offering a service to run CEGMA on behalf of others. I had previously offered to run CEGMA for people who had trouble installing the software (or, more commonly, the pieces of software that CEGMA requires). I’ve run CEGMA over 100 times for others, and this has been a bit of a drain on my time, to say the least. Hopefully, our CEGMA VM is a viable alternative. Many thanks are due to Richard Feltstykket at the UC Davis Genome Center’s Bioinformatics Core for setting this up.

[1] Words that will come back to haunt me, I expect!

101 questions with a bioinformatician #12: Karen Eilbeck

July 23, 2014 by Keith Bradnam

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.

Karen Eilbeck is an Associate Professor of Biomedical Informatics at the University of Utah. Karen comes from a long line of distinguished bioinformaticians who learned their skills at the highly regarded Bioinformatics M.Sc. program at the UK's University of Manchester (although they do let some riff-raff in).

If you read Karen's research statement, you will see that there is a clear focus to her work:

Quality control of genomic annotations; Management and analysis of personal genomics data; Ontology development to structure biological, genomic and phenotypic data

In helping build both the Gene Ontology and Sequence Ontology resources, Karen's work has led to the development of powerful structured vocabularies that help ensure that all biologists can speak the same language. Developing ontologies is harder than you might imagine, especially when you are trying to generate precise definitions for very nebulous concepts, such as 'what is a gene?'

You can find out more about Karen from the Eilbeck Lab website. And now, on to the 101 questions...

001. What's something that you enjoy about current bioinformatics research?

I think genomic analysis is fascinating. The human genetics stories suck me in, where bioinformatics is used to find the variant causing the phenotype. The story does not end there: tests are developed, or therapies targeted.

010. What's something that you *don't* enjoy about current bioinformatics research?

This is a positive and a negative. I like being part of collaborative projects. It is exciting and things get done. The downside is the amount of time on the phone. It is not something I would ever have anticipated. Conference calls either go OK, or someone is heavy breathing in a train station and hasn’t put their phone on mute. The video conference is either delayed or the resolution is not great. One of my colleagues shared this video with me, which has a lot of truth to it.

011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Only a single piece? OK, take your math classes more seriously. I wish I had known how to program when I was doing statistics classes. Instead of using packages like SPSS it may have been more educational to implement tests myself.

100. What's your all-time favorite piece of bioinformatics software, and why?

I am totally in love with a piece of software right now called Phevor, which re-ranks variant prioritization based on phenotype descriptions and uses a variety of ontologies to do its magic. Which brings me to my all-time fave tool: OBO-Edit. I think that OBO-Edit was underrated. This tool was developed by the Gene Ontology consortium to build their ontology, and it was rapidly adopted by the biological community. It is easy to use and has underpinned many of the ontologies in the bioinformatics domain today. The lead developer for a long time was John Richter, who is also a stand-up comedian and went on to work for Google. OBO-Edit will always have a place in my heart.

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality?

W (A or T). On the one hand it's reserved and to the point (A), on the other hand it's full of energy and works well with others (T). Also, much like my name, there is confusion when it comes to pronunciation (Eel-beck or I’ll-Beck).

W and its skinny friend V seem interchangeable when it comes to pronunciation. A friend of mine calls a certain Star Wars character Darth Wader, which makes me smile.

Good news: CEGMA is more popular than ever — Bad news: CEGMA is more popular than ever

July 21, 2014 by Keith Bradnam

I noticed from my Google Scholar page today that our 2007 CEGMA paper continues to gain more and more citations. It turns out that there have now been more citations to this paper in 2014 than in any previous year (69 so far and we still have almost half a year to go):

Growth of citations to CEGMA paper, as reported by Google Scholar

I've previously written about the problems of supporting software that a) was written by someone else and b) is based on an underlying dataset that is now over a decade old. These problems are not getting any easier to deal with.

In a typical week I receive 3–5 emails relating to CEGMA; these are mostly requests for help with installing and/or running CEGMA, but we also receive bug reports and feature requests. We hope to shortly announce something that will help with the most common problem, that of getting CEGMA to work. We are putting together a virtual machine that will come pre-installed and configured to run CEGMA. So you'll just need to install something like VirtualBox, and then download the CEGMA VM. Hopefully we can make this available in the coming week or two.

Unfortunately, we have almost zero resources to devote to the continuing development of this old version of CEGMA; any development that does happen is therefore extremely limited (and slow). A forthcoming grant submission will request resources to completely redevelop CEGMA and add many new capabilities. If this grant is not successful, then we may need to consider holding some sort of memorial service for CEGMA, as it is becoming untenable to support the old code base. Seven years of usage in bioinformatics is a pretty good run, and the website link in the original paper still works (how many other bioinformatics papers can claim this, I wonder?).

Update: 2014-07-21 14.44

Shaun Jackman (@sjackman on twitter) helpfully reminded me that CEGMA is available as a homebrew package. There is also an iPlant application for CEGMA. I've added details of both of these to a new item in the CEGMA FAQ:

Are there other - less painful - ways that I can install CEGMA?

Update: 2014-07-22 07.36

Since publishing this post, I've been contacted by three different people who have pointed out different ways to get CEGMA running. I'm really glad that I blogged about this, otherwise I may not have found out about these other methods.

In addition to Shaun's suggestion (above), it seems that you can also install CEGMA on Linux using the Local Package Manager software. Thanks to Masahiro Kasahara for bringing this to my attention. Finally, Matt MacManes alerted me to the fact that there is a public Amazon Machine Image (AMI) called CEGMA on the Cloud. More details here.

Update: 2014-07-30 19.31

Thanks to Rob Syme, there is now a Docker container for CEGMA. And finally, we have now made an Ubuntu VM that is pre-installed with CEGMA (thanks to Richard Feltstykket at the UC Davis Genome Center's Bioinformatics Core).