Unpronounceable — why can't people give bioinformatics tools sensible names?

Okay, so many of you know that I have a bit of an issue with bioinformatics tools with names that are formed from very tenuous acronyms or initialisms. I've handed out many JABBA awards for cases of 'Just Another Bogus Bioinformatics Acronym'. But now there is another blight on the landscape of bioinformatics nomenclature…that of unpronounceable names.

If you develop bioinformatics tools, you would hopefully want to promote those tools to others. This could be in a formal publication, or at a conference presentation, or even over a cup of coffee with a colleague. In all of these situations, you would hope that the name of your bioinformatics tool is memorable. One way of making it memorable is to make it pronounceable. Surely, that's not asking that much? And yet…

There is a lot of bioinformatics software in this world. If you choose to add to this ever growing software catalog, then it will be in your interest to make your software easy to discover and easy to promote. For your own sake, and for the sake of any potential users of your software, I strongly urge you to ask yourself the following five questions:

  1. Is the name memorable?
  2. Does the name have one obvious pronunciation?
  3. Could I easily spell the name out to a journalist over the phone?
  4. Is the name of my database tool free from any needless mixed capitalization?
  5. Have I considered whether my software name is based on such a tenuous acronym or initialism that it will probably end up receiving a JABBA award?

101 questions with a bioinformatician #10: Lex Nederbragt

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.

This is the third 'binary' post in this series — where the interviewee number consists of just ones and/or zeros. If this fact makes you excited, then you probably need to get out more.


Lex Nederbragt works as a Bioinformatician at the Norwegian Sequencing Centre (where they probably do more than just sequence Norwegians). He is also an Associate Professor at the Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo.

As a Dutchman living in the least populous of the three Scandinavian Kingdoms, Lex can take comfort in knowing that the Netherlands retain the upper hand in their battles with Norway on the football field.

Away from football (and this is the last chance you'll have to get away from football for the next few weeks), Lex is someone who posts fantastic amounts of useful information on his blog. If you have any interest in high-throughput sequencing and assembly, then you owe it to yourself to follow his blog updates.

You can find out more about Lex by following him on Twitter (@lexnederbragt), or reading his aforementioned blog (In between lines of code) or his other blog…presumably the world's only blog devoted to the Newbler assembler.

And so on to the 101 questions...

 

 

001. What's something that you enjoy about current bioinformatics research?

The increasing focus on reproducibility and reusability. Making sure others can reproduce your work is such a fundamental aspect of science, and computational work should be easy to reproduce in principle. It is fascinating to see how difficult this turns out to be in practice — even in cases where the description of the work is very complete.

 

010. What's something that you *don't* enjoy about current bioinformatics research?

I'm not the first one to complain about the seemingly unlimited growth in tools meant for the same job, e.g., short read mappers. My field of interest is de novo genome assembly, and there too new tools appear regularly. I think it is about time we settle on a set of tools that appear to be best suited for the job, and move on to finding ways to determine which tools work best for each individual dataset and research question. In the case of assembly, we basically already know the set of programs that generally perform well. Now we need to develop and implement evaluation tools that tell a researcher which assembly of the data is the best one for their purposes.

 

011. If you could go back in time and visit yourself as an 18 year old, what single piece of advice would you give yourself to help your future bioinformatics career?

I am a bit ambivalent here. It took me a long time to realize that I wanted to become a bioinformatician; I missed a lot of signals about how much I enjoyed programming, for example. So, I would like to tell myself to explore computational science much more than I did. On the other hand, waiting this long to make the switch to bioinformatics meant I have acquired a very firm background in biology. I find this essential for my work, as it allows me to make connections between the technological aspects of high-throughput sequencing experiments and data analysis, and the biological questions that inspired the experiments in the first place. So, I would also like to tell myself to keep on studying biology.

 

100. What's your all-time favorite piece of bioinformatics software, and why?

The Newbler assembly and mapping program from Roche/454 Life Sciences. It is not the program per se (it's good, but not necessarily the best; nor is it open source, for that matter). However, it is through the use of this program I was propelled into bioinformatics. I became very familiar with it and started scripting to massage its output. I even wrote a user-oriented manual for Newbler. These days, I use many more assembly programs besides Newbler, but my bioinformatics 'roots' will always be Newbler.

 

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality?

B, as it stands for 'C or G or T', so it is flexible, allowing several alternatives and keeping options open. But it also means knowing your limits: not everything goes. I also like to have a 'plan B' in the back of my head.
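
For anyone who wants to check what a given code stands for, here is a minimal Python sketch of the IUPAC ambiguity codes for DNA. The names IUPAC_DNA, bases_for and matches are made up for this illustration, and the table only covers the 15 letter codes built from A, C, G and T (it leaves out U and any gap symbols).

```python
# IUPAC nucleotide codes mapped to the DNA bases each one can represent.
# Only the letter codes over A, C, G and T are listed here (an
# illustrative subset; U and gap symbols are deliberately left out).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def bases_for(code):
    """Return the set of bases that a single IUPAC code can stand for."""
    return set(IUPAC_DNA[code.upper()])


def matches(code, base):
    """Return True if the concrete base is allowed by the IUPAC code."""
    return base.upper() in bases_for(code)


if __name__ == "__main__":
    print(sorted(bases_for("B")))
    print(matches("B", "A"))
```

So B keeps three options open and rules out exactly one (A), which fits the 'knowing your limits' part of the answer rather nicely.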

How to make your genomics website more suitable for an English-speaking audience

Today I visited the website of the Beijing Institute of Genomics (BIG) for the first time. BIG is not to be confused with BGI (which was formerly known as the Beijing Genomics Institute). If you look at just about any web page on this site other than the home page (which contains an unusual visual element), you'll see the following image:

My sharp, British-born eyes quickly recognized this as the UK's Houses of Parliament in London (well, technically it's the Palace of Westminster). See this image for a comparison. I then noticed that this image doesn't feature on the Chinese language version of the website (which has a completely different design).

I can only assume that some web designer thought that an image like this would be fitting because it is the English-language version of the website, and that they therefore chose an image of something (incorrectly) deemed to be English. At this point, I feel obliged to share the following video which offers a definitive explanation as to the differences between England, Great Britain, and the United Kingdom:

Reflections on my '101 questions with a bioinformatician' series

This is in lieu of a regular '101 questions with a bioinformatician' post which has been delayed (hopefully by only a day). This series of interviews has now been running for over 2 months and — judging by my web stats — it seems to be popular. In fact, these posts now account for the majority of traffic to this site.

Thanks to everyone who has contributed so far, and to everyone who has been reading these interviews. It's been fun doing this and I've enjoyed seeing the variety of answers that people have provided.

I should confess that I'm solely responsible for adding hyperlinks to the answers that people provide, and in addition to adding links for obvious items like pieces of bioinformatics software, I sometimes like to have a bit of fun with what I choose to link to. E.g. see the links I added to question 101 in my interview with Holly Bik.

To finish off, here are some relevant numbers about this series:

  • 10 — number of interviews posted
  • 2 — number of interviews finished and (almost) ready to be posted
  • 6 — number of people who have agreed to be interviewed but haven't yet sent me their answers (cough, cough).
  • 81 — my current list of 'potential interviewees'

The last point means that hopefully I can keep this series going for a while longer. I guess that I now have to aim for an interviewee #101 (which would be the 102nd interview…obviously).