The names of bioinformatics tools that help study evolution shouldn't feel that they also have to evolve

Thanks to Torsten Seemann for bringing this to my attention…


In 2003 a bioinformatics tool was published. A tool with a thoroughly sensible name and acronym:

A simple name with a simple, and not-too-bogus, initialism. Bravo. However, a subsequent update to BIBI brought about a change to the name:

  • le BIBI:

Where the 'le' refers to 'light edition'. It should be said that most references to this tool drop the superscript notation for 'le'. Let's move forward to the present day and the publication of another version of this tool:

The full expansion of this new name is as follows:

  • Light Edition Bioinformatics Bacterial Identification Tool Quick Bioinformatic Phylogeny of Prokaryotes

Quite a mouthful! Bonus points for including both 'Bioinformatics' and 'Bioinformatic' in the same name, and for the largely redundant inclusion of 'Bacterial' alongside 'Prokaryotes'.

Generally I find the use of superscript in software names to be unnecessary. It can make the tool name harder to read and it is unlikely to be reproduced verbatim by others who mention your software. Starting your software name with a lowercase letter also means that it might appear in uppercase if used to start a sentence (as happens several times in the above paper). Not a terrible problem, but it reduces the strength and consistency of your 'bioinformatics brand'.

And the award for needless use of subscript in the name of a bioinformatics tool goes to…

The following paper can be found in the latest issue of Bioinformatics:

MoRFs are molecular recognition features, and the tool that the authors developed to identify them is called:

MoRFCHiBi

So the tool's name includes a subscripted version of 'CHiBi', a name which is taken from the shorthand name for the Center for High-Throughput Biology at the University of British Columbia (this is where the software was presumably developed). The website for MoRFCHiBi goes one step further by describing something called the MoRFChiBi,mc predictor. I'm glad that they felt that some italicized text was just the thing to complement the subscripted, mixed case name.

The subscript seems to serve no useful purpose and just makes the software name harder to read, particularly because it combines a lot of mixed capitalization. It also doesn't help that 'ChiBi' can be read as 'kai-bye' or 'chee-bee'. I'm curious whether the CHiBi will be adding their name as a subscripted suffix to all of their software, or just this one?

More mixed-case madness in the name of a bioinformatics tool

From the latest issue of Bioinformatics we have:

According to the abstract, the 'SUB' comes from subcellular, the 'A' comes from Arabidopsis, and the 'con' comes from 'consensus'. So why isn't it SUBACON? Maybe because people might then read it as 'sue bacon'?

It's not clear to me if this is meant to be pronounced 'soo-ba-con' or 'sub-ay-con'. The abstract then goes on to mention something called the ASURE portal (pronounced 'azure' or 'ay-sure'???), where ASURE = Arabidopsis SUbproteome REference. If this were following the same rules as SUBAcon, shouldn't this be called ASUre (or even ASUBre)?

Random capitalization strikes again, or am I only dreaming?

A paper in BMC Bioinformatics describes a new piece of software:

morFeus: a web-based program to detect remotely conserved orthologs using symmetrical best hits and orthology network scoring

Naturally, my first instincts were to check whether this was a name worthy of a JABBA award, but morFeus does not appear to be an acronym or initialism. I say that because although the name morFeus appears 116 times in the manuscript, no explanation is ever given as to why the software has that name.

My first thought was that maybe it is a reference to Morpheus, the Greek god of dreams, or maybe to the character of Morpheus from The Matrix. I don't really care about why it is called morFeus (a name that my spell checker keeps correcting to morgues), but it is another example of the seemingly random capitalization of bioinformatics tool names.

When I visited the web server for the morFeus tool, I did notice something in small print at the bottom of the page:

  • morFeus stands for meta-analysis based orthology finder using symmetrical best hits

This is something that also appears as a keyword in the manuscript, but it is not entirely obvious as to whether this really is meant to be an initialism, or why the F is capitalized. I'm completely stuMped.

What's in a name? Some thoughts on the 'exSPAnder' assembly tool

This week a new tool was published in the Bioinformatics journal:

ExSPAnder: a universal repeat resolver for DNA fragment assembly

The tool's name really refers to the name of an algorithm that is implemented as part of the SPAdes genome assembler. I don't think that this is particularly obvious from the title of the paper. The results section in the paper further complicates this somewhat. E.g. this is how the comparative assembler results are reported in Table 2 of the paper:

The entry called 'SPAdes 2.4' refers to a version of the SPAdes assembler that doesn't use the exSPAnder algorithm, whereas the entry marked 'EXPANDER' refers to a newer version of the SPAdes assembler that does include the algorithm. I find this confusing and it is one of three issues that I have concerning the use of the exSPAnder name:

1. Do we really need to start giving names to algorithms that are part of another tool? This has the potential to create a lot more confusion for people. Particularly when there is no tool called 'exSPAnder' that you can download from anywhere. If somebody implemented the algorithm as part of another piece of software would they be expected to retain the exSPAnder name somewhere (MegaAssembler featuring exSPAnder)?

2. You would hope that the website that the paper links to gives you more information about exSPAnder. But that's not the case:

  • Number of mentions of exSPAnder in the publication: 35
  • Number of mentions of exSPAnder in the linked software web page: 0
  • Number of mentions of exSPAnder in the latest SPAdes v3.1.0 manual: 0

Again, I think this can only lead to confusion. The mention of exSPAnder as if it was its own separate tool suggests that this is software that you can download. E.g. this is from the Conclusion section of the paper:

Benchmarks across eight popular assemblers demonstrate that exSPAnder produces high-quality assemblies for datasets of different types.

But exSPAnder is not an assembler that anyone can download and use at the moment. Rather, you can download the SPAdes assembler, which may or may not feature the exSPAnder algorithm (I don't know because the website and the manual don't say).

3. My final issue is perhaps the most minor one and it relates to this horrible trend of using mixed capitalization for bioinformatics tool names. If you are going to do this, please be consistent and please realize that journal formatting conventions may mess up your planned use of capitalization. Here are the different ways you can see 'exSPAnder' referred to in this paper:

  • ExSPAnder: 1
  • exSPAnder: 1
  • EXPANDER: 1
  • EXSPANDER: 28

So I'm assuming that the last format is the one that the authors are really using and that the other variations are due to the journal's formatting of the article. Using small caps like this is a great way to guarantee that no-one else will bother to format the name like this. Okay, time to finish this post as I need to go and work on my new assembly tool:

MaSSEMbLerXL— an assembler that assigns different font sizes to each DNA base
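If you want to run the same kind of tally on a paper of your own, here is a minimal Python sketch. It assumes you already have the article as plain text (small caps usually come out of a PDF as ordinary uppercase, but that depends on the extraction tool). The function name, the pattern, and the sample sentence are all made up for illustration; none of them come from the paper.

```python
import re
from collections import Counter


def count_name_variants(text, pattern):
    """Tally every distinct spelling matched by a case-insensitive pattern."""
    matches = re.finditer(pattern, text, flags=re.IGNORECASE)
    return Counter(m.group(0) for m in matches)


# Hypothetical sample text, NOT a quote from the paper.
sample = "ExSPAnder is new. We ran exSPAnder and EXSPANDER; EXPANDER is a typo."

# The optional 's' lets the pattern catch the 'EXPANDER' misspelling too.
variants = count_name_variants(sample, r"\bexs?pander\b")

for spelling, n in variants.most_common():
    print(f"{spelling}: {n}")
```

Each of the four spellings in the sample sentence is counted once. Keying the tally on the exact matched text, rather than a lowercased version, is what keeps the capitalization variants separate.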

 

Is the naming of bioinformatics software getting out of CoNtRol?

There is a new paper in the journal Bioinformatics. This is the title of that paper:

CoNtRol: an open source framework for the analysis of chemical reaction networks

Now people will know that I have no stomach for bogus bioinformatics acronyms and initialisms, so is CoNtRol worthy of a JABBA award? Well I can't give it such an award because CoNtRol is not an acronym or an initialism. At least I don't think it is. 

The abstract describes CoNtRol as a web-based framework for analysis of chemical reaction networks (CRNs). So even though the capitalized letters in CoNtRol give you CNR, maybe it's really all about CRNs???

The CoNtRol website makes things a little more confusing by starting their introduction with the text: CoNtRol (CRN tool) is a web application. Are you now thinking what I'm thinking? Is CoNtRol the world's first bioinformatics software based on an anagram (CoNtRol = CRN tool)? If this isn't the reason, then I can only assume that someone decided to just randomly capitalize various letters in the name.
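For the skeptical, both observations are easy to check. The following is a small Python sketch, with helper names that are my own invention, that pulls out the capitalized letters of a name and tests whether two strings are anagrams once case and spaces are ignored.

```python
from collections import Counter


def capitals(name):
    """Return only the uppercase letters of a name, in order."""
    return "".join(ch for ch in name if ch.isupper())


def is_anagram(a, b):
    """Compare letter counts, ignoring case and any non-letter characters."""
    def letter_counts(s):
        return Counter(ch.lower() for ch in s if ch.isalpha())
    return letter_counts(a) == letter_counts(b)


print(capitals("CoNtRol"))                # CNR
print(is_anagram("CoNtRol", "CRN tool"))  # True
print(is_anagram("CoNtRol", "CRN"))       # False
```

So the capitals really do spell CNR rather than CRN, and 'CRN tool' really is a letter-for-letter rearrangement of 'CoNtRol'.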

Whatever the reason for the name, the more practical issue is that tools named after common words can often be hard to find with web search engines. CoNtRol doesn't show up on the first page of Google results if you search for control bioinformatics web app. Nor does it show up if you search for control chemical network app. There is something to be said for giving software novel names.