Time for a classic example of a JABBA-award winning piece of bioinformatics software


Normally, I introduce the name of the JABBA-award-worthy acronym before I show you the full name of the offending piece of software. But this time, let's play a little game. Here is the title of a recent article from the journal Bioinformatics, only I have removed the software acronym and the tell-tale capitalization from the name:

small molecule activity scanner web service based

So now you know the name, have a guess at what the acronym/initialism is. I feel confident that no-one will guess the answer. You'll have to scroll down for the reveal…

 

Okay, here it is:

SEABED: Small molEcule activity scanner weB servicE baseD

Note that:

  1. Only the 'S' is clearly derived from the initial letter of a word.
  2. The 'A' is left ambiguously unexplained in the capitalization (as presented in the journal title). One might presume that it comes from 'Activity' but I wouldn't rule out 'scAnner'.
  3. However you derive the letters in SEABED, one (or more) words don't contribute to the acronym at all.
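As a quick sanity check on the three points above, here is a minimal Python sketch that pulls the capital letters out of the title exactly as the journal prints it. The helper name `capital_sources` and the variable names are mine, purely for illustration; nothing here comes from the paper or the tool itself.

```python
# Title as capitalized in the journal (copied from the reveal above).
title = "Small molEcule activity scanner weB servicE baseD"


def capital_sources(title):
    """Return (letter, word, index_in_word) for every uppercase letter."""
    return [
        (ch, word, i)
        for word in title.split()
        for i, ch in enumerate(word)
        if ch.isupper()
    ]


sources = capital_sources(title)

# Letters the capitalization actually accounts for, in order.
capitals = "".join(ch for ch, _, _ in sources)

# Capitals that are the first letter of their word.
initials = [ch for ch, _, i in sources if i == 0]

# Words with no capital at all, i.e. words that donate nothing.
silent_words = [w for w in title.split() if w.islower()]

print(capitals)      # SEBED
print(initials)      # ['S']
print(silent_words)  # ['activity', 'scanner']
```

The capitals spell SEBED rather than SEABED, so the 'A' really is unaccounted for, and whichever of 'activity' or 'scanner' you credit with it, the other word contributes nothing.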

All of which makes SEABED a worthy recipient of a JABBA award. The only saving grace is that a Google search for seabed bioinformatics finds the paper as the top hit.

One downside to this tool is that the SEABED webserver (http://www.bsc.es/SEABED) doesn't seem to be working at all at the moment.

Tales of drafty genomes: part 2 — when draft genomes took over the world

This is the second post in an infrequent series that looks at draft genomes.

At the time of writing, Google has indexed almost 400,000 pages that include a mention of the phrase draft genome. Prior to the year 2000, there are zero mentions of this phrase in the tech giant’s search index.

The phrase ‘draft genome’ came to prominence with the publication of the ‘working draft’ version of the human genome[1]. But referring to published genomes as anything other than ‘complete’ was still atypical at this time. This can be seen if you search Google Scholar for papers that include in their titles either the phrase draft genome sequence or complete genome sequence. When you look at how these results change over time, an interesting pattern emerges:

Number of papers indexed by Google Scholar that include the phrases 'Complete genome sequence' or 'Draft genome sequence' in their titles.

Around 2000–2003, there was a small number of papers mentioning draft genome sequences. These are nearly all related to the draft sequences of the human or rice genomes. Usage of the phrase (in paper titles) didn’t break double digits until 2011. ‘Draft genome sequence’ then became a much more widely used phrase in 2012, and by 2013 it had overtaken ‘complete genome sequence’.

I find this reveals something about the nature of sequencing and genome assembly. It almost feels like we are giving up our ambition to finish genomes (whatever ‘finished’ actually means) and are more willing to settle for something that is clearly incomplete.

A definition of ‘draft’ provided by Merriam-Webster is as follows:

A version of something (such as a document) that you make before you make the final version

In an ideal world, I would hope that all of these draft genomes would also end up being replaced by ‘final versions’. But I’m doubtful that many of these published sequences will be completed any time soon.


  1. See part 1 in this series for more details about the drafty nature of the human genome.  ↩

Sowing the seeds of bad bioinformatics names

Here are two simple pieces of advice for people who are looking for a name for their latest bioinformatics tool/database/resource:

  1. Avoid common words which might cause people searching for your tool to find something else instead.
  2. Choose a name that hasn't been used before by the bioinformatics community.

Having said that, let's look at a new paper in the journal Bioinformatics:

Seed: a user-friendly tool for exploring and visualizing microbial community data

The name 'Seed' is a not-too-offensive acronym for Simple Exploration of Ecological Data. So what's my beef with it?

The problem is that words like seed are going to appear all over the Internet. My standard test for the 'searchability' of a bioinformatics tool is to search for the tool name followed by the word 'bioinformatics'. Your resource's website or publication should hopefully be the number one result (or somewhere on the first page). However, that is not what happens here.

And searching for 'seed bioinformatics' raises more problems by clashing with my second piece of advice. For example, here are a couple of papers that were in my first page of Google results:

2010: Accessing the SEED Genome Databases via Web Services API: Tools for Programmers

2011: SEED: efficient clustering of next-generation sequences

So what happens if you include 'microbial' into your search terms? Won't that help?

Nope. Turns out that the SEED (not an acronym as far as I can tell) is an annotation environment for microbial genomes that has been around for a decade, and which has spawned many papers, e.g.:

2014: The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST)

All of which means that people looking to find the newly published Seed tool are not going to have much luck when using search engines.

Is it a 'bad idea' to include gratuitous pictures of cleavage on an Oxford Journals website?

In a word, 'yes'.

2015-02-16: Note that this story has been updated after Oxford Journals contacted me about this (see end of post).

I know that journals need to make money, but it seems a bit shoddy when they allow any form of advertising to appear on their websites. I came across an article at Nucleic Acids Research today which featured the following advert:

Given that I have published in this journal before, I suppose that people reading our articles will also have a chance of seeing ads like this. I would ask Oxford Journals to think carefully about whether they really want adverts like this appearing on their site. This doesn't seem a particularly good fit for them as a scientific publisher; for that matter, it doesn't seem a great fit for the advertiser either.

Update: Oxford Journals reached out to me on Twitter with some good news: