101 questions with a bioinformatician #7: Holly Bik

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.


Holly Bik is currently a Postdoctoral Researcher in Jonathan Eisen’s lab at the UC Davis Genome Center…but not for much longer! Sadly (for us), Holly will soon be leaving Davis to take up a Faculty position in the School of Biosciences at the University of Birmingham, UK.

As a Birmingham Fellow in Bioinformatics, she will no longer be saying things such as "Dude, it's like totally, hella hot" (this is how we all talk in California), and will instead be referring to the weather in the correct British vernacular: "I say, one finds this rain jolly bracing". Because Birmingham is the Curry Capital of Britain, she will also be required (under Birmingham law) to dramatically increase her intake of onion bhajjis, aloo gobi, and peshawari naans (something that sadly seems to have been outlawed in Northern California).

During her time at UC Davis, Holly has been working on PhyloSift, a software pipeline for the phylogenetic analysis of genomes and metagenomes. She has also been working on many other things. You can find out more about Holly by following her on Twitter (@hollybik) or by visiting www.hollybik.com. And now, on to the 101 questions...

 

 

001. What's something that you enjoy about current bioinformatics research?

Given my biology background, my favorite aspect of any bioinformatics project is interacting with people from different disciplines (project personnel, and/or talking to people at meetings and on Twitter). I learn something new about computing, software, and/or hardware during pretty much every project.

I’m always astounded by how technology and computers have progressed since I got my first personal computer (way back in 1996), and how we’re now leveraging this computing power in conjunction with deep DNA sequencing technologies to address fundamental scientific questions. The power of bioinformatics is really incredible when you stop and think about it!

 

010. What's something that you *don't* enjoy about current bioinformatics research?

The lack of documentation for a lot of software packages, and to a lesser extent, encountering uninformative error messages when trying to run command line software that’s supposedly designed for researchers. Both can be a prohibitive barrier to testing out different tools that may actually be extremely useful and informative for your own research. I think there’s a reason that QIIME has become such a powerhouse package for microbiome research: biologists have access to a suite of tutorials and test datasets, and they can boot up the software easily as a Virtual Machine or Amazon Cloud instance.

However, the easiest tools to install and use are not necessarily the best to use for your particular research questions. I read so many software papers describing exciting new software (where the authors usually all come from computer science departments), but when I visit the website I find no usable instructions or run into insurmountable errors when trying to install or execute the code. As a biologist, no one ever sat down and taught me the nitty gritty about makefiles or compiling source code; people who publish software shouldn’t assume their users have a computer science degree. Most computational biologists will make a valiant effort to overcome such problems, but at some point you have to do a cost/benefit analysis of whether persevering is worth your time. I’m only going to spend two days trying to install your software if I think it's really, really worth it, but in most cases I’ll probably decide that it isn't (and so no citation for you).

 

011. If you could go back in time and visit yourself as an 18 year old, what single piece of advice would you give yourself to help your future bioinformatics career?

FILE MANAGEMENT. Read up on the best practices for data management, and start forcing yourself to develop good habits NOW. I guarantee that in 5 years' time you will not remember what data you saved in 'analyses.txt'.

 

100. What's your all-time favorite piece of bioinformatics software, and why?

I’m going to be shamelessly biased: my favorite software ever is Phinch, a data visualization framework that I’ve been developing in collaboration with Pitch Interactive, a data visualization studio in Berkeley, CA. We’re using solid software engineering and design principles to build exploratory, interactive visualization tools for scientists. And because the visualizations are built in D3, the user interface is absolutely gorgeous! Who says you can’t create art when doing bioinformatics research?

 

 

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality?

I’m a gap (. or -), because I’m mysterious and don’t like to be classified!

Survey results: The extent of gender bias in bioinformatics

I have completed an analysis of my survey that attempted to see whether there is notable gender bias among bioinformaticians. Thank you to the 370 people who completed the survey! A few things to note:

  1. All survey responses are available on Figshare (in tab-separated value format). Anyone else can come along and play with this data, and maybe ask more intelligent questions about it than I did.
  2. My detailed analysis of these responses is also on Figshare as a separate document.
  3. The original Google survey form remains available (also see my blog post about it). If people continue to complete the survey, I will update the main data file on Figshare.

I encourage people to read the full document on Figshare. Because of the high response to this survey, I had enough data to compare gender bias at different career stages, and also between different countries (for a small number of countries).

I'll leave you with just one result from my analysis. I had asked people to identify their current career position, and I offered 10 possible career stages as answers:

  1. Currently pursuing undergraduate degree (with focus on bioinformatics/genomics)
  2. Undergraduate level position in academia or industry (e.g. Research officer / Junior specialist)
  3. Currently pursuing postgraduate qualification (with focus on bioinformatics/genomics)
  4. Postgraduate level position (e.g. Research assistant). MSc or PhD required for role.
  5. Postdoctoral scholar / Fellow / Research Associate
  6. Lecturer / Instructor / Senior Fellow / Project Scientist (3+ years post-PhD research experience)
  7. Assistant Professor / Reader / Senior Lecturer (5+ years post-PhD research experience)
  8. Associate or Full Professor / Team Leader (7+ years post-PhD research experience)
  9. Senior Professorial role (e.g. head of a department, 10+ years post-PhD research experience)
  10. Super Senior role (e.g. Dean of a school or CEO, 15+ years post-PhD research experience)

Because these categories are a little bit subjective, and because some of the categories (levels 1, 9, and 10) had the fewest responses, I decided to smooth the data by combining adjacent categories (i.e. 1&2, 2&3, etc.).
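As a rough illustration of that smoothing step, here is a minimal Python sketch. The function names and the counts are hypothetical placeholders made up for this example (they are not the survey data, and this is not necessarily how the actual analysis was done); the point is only that pooling each pair of neighbouring categories turns 10 career stages into 9 overlapping bins.

```python
def smooth_adjacent(counts):
    """Pool each pair of neighbouring categories: 1&2, 2&3, 3&4, ..."""
    return [a + b for a, b in zip(counts, counts[1:])]


def percent_female(female, male):
    """Percentage of female respondents in each pooled bin (None if a bin is empty)."""
    pooled_f = smooth_adjacent(female)
    pooled_m = smooth_adjacent(male)
    return [
        100.0 * f / (f + m) if (f + m) else None
        for f, m in zip(pooled_f, pooled_m)
    ]


# Hypothetical counts for the 10 career stages (NOT the real survey responses)
female = [2, 10, 30, 20, 40, 10, 8, 5, 1, 0]
male = [3, 12, 35, 25, 60, 20, 20, 15, 4, 1]

for stage, pct in enumerate(percent_female(female, male), start=1):
    label = f"{stage}&{stage + 1}"
    print(label, "n/a" if pct is None else f"{pct:.1f}% female")
```

Note that each original category (apart from the first and last) contributes to two pooled bins, so the pooled counts no longer sum to the number of respondents.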

So this is what the percentage of male and female bioinformaticians looks like with respect to progress through their scientific career:

Things start off looking quite equitable but proceed to diverge around the time that people are becoming Associate Professors. However, the situation is more complex than this (see Figure 3 in my full analysis).

Can Twitter help us find out the gender ratio of bioinformaticians?

I'm still collecting survey results to try to understand the extent of gender bias in bioinformatics. I plan to publish an analysis of these results next week and I'll also share all of the raw survey results via Figshare (in case anyone else wants to dive deeper).

One thing that is hard to accurately know is just what the gender ratio is across everyone who identifies themselves as a bioinformatician. A survey that is trying to ask something about gender bias no doubt introduces its own bias in the types of people who would be interested in completing such a survey.

But maybe Twitter can be of use in trying to determine a 'background' gender ratio among bioinformaticians. The evidence is hardly conclusive, but there are some data that suggest that more women use Twitter than men. There are also data suggesting that there are comparable numbers of male and female users. In any case, numbers of users don't tell the whole story. Other research shows that, on average, men have 15% more followers than women, and Twee-Q, a tool that tries to identify the likely gender of Twitter users, finds that men tend to be retweeted almost twice as often as women.

Despite gender biases in how people use Twitter, it might still be useful to see what the gender ratio is of people who follow bioinformatics-type accounts. This is something that Twitter can show you at analytics.twitter.com. However, this only seems to be enabled on accounts that have a certain number of followers. Here is what the results look like for the @assemblathon Twitter account:

So Twitter identifies (presumably using some sort of gender-guessing algorithm) that 82% of the followers are male. I'd love to see what the results look like for other bioinformatics Twitter accounts. However, I think it is a better test if the accounts in question are themselves gender-neutral, i.e. affiliated to a resource or institution. If you run a bioinformatics-related Twitter account that is gender-neutral, and if you can access analytics.twitter.com, I'd love it if you could share your results with me (via comments below or on Twitter @kbradnam).

101 questions with a bioinformatician #6: Mario Caccamo

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.


Mario Caccamo is the director of The Genome Analysis Centre, a BBSRC-sponsored research institute focused on genomics and computational biology. You may know of this institute by its shortened name 'TGAC' (of course, this is not the only place where you will see this initialism). As Director, Mario's role is to "ensure TGAC is equipped with the resources and people to deliver good science".

Mario's skills as a bioinformatician are matched only by his prowess on the volleyball court. When he used to play for the Informatics volleyball team at the Wellcome Trust Sanger Institute, he deservedly earned the nickname Super Mario. You can find out more about Mario by following him on Twitter (@mcaccamo).

 

001. What's something that you enjoy about current bioinformatics research?

I see bioinformatics as a branch of molecular biology. I love the elegance of molecular biology. Research in bioinformatics is about capturing the beauty of biology in abstractions that can help us to discover new knowledge. Beauty here means complexity, optimisation, economy (in terms of information content) and functionality (among other things). One of the most exciting things about molecular biology is how young it is as a science. Our colleagues working in bioinformatics are the Newtons and the Keplers of molecular biology — so much to be done and discovered. 

 

010. What's something that you *don't* enjoy about current bioinformatics research?

The flip side is that we still don’t understand enough about the basic building blocks in molecular biology. The language we use to describe biological systems and processes is incomplete. We struggle with issues that sometimes look simple. What did we know about epigenetic modifications 10 years ago, for instance? Little compared to what we know today. Bioinformaticians struggle with the incompleteness of the underlying basic knowledge and keep re-inventing the wheel, which leads to frustration. Perhaps these are growing pains, but pains nevertheless.

 

011. If you could go back in time and visit yourself as an 18 year old, what single piece of advice would you give yourself to help your future bioinformatics career?

My advice would be: “This is good enough. Let it go.” Recognising when you are in the land of diminishing returns is a skill that should be taught at school. This is particularly relevant for bioinformatics. You can always close more gaps, find the missing gene or remove another false positive — you can do this for the next 10 years. My recommendation is...don't. Another perhaps more mundane recommendation is to learn either gawk, Perl one-liners, or some of the basic Unix command line tools to manipulate strings and text; they will give you the data you need for your best presentation the night before your talk. 

 

100. What's your all-time favorite piece of bioinformatics software, and why?

It has to be HMMER, a beautiful, super-efficient piece of software. I know that we shouldn’t use a hammer for all kinds of different nails, but somehow HMMER manages to prove that advice wrong. You can HMMER so many nails with this hammer.

 

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality?

I think I would take W. I like W as a strange letter (no explanation for that) — but it is A or T, alpha or omega in the nucleotide alphabet.