Another hard-to-pronounce bioinformatics software name

October 21, 2015 by Keith Bradnam

This was from a few months ago, published in the journal Nucleic Acids Research:

CATH FunFHMMer web server: protein functional annotations using functional family assignments

So how do you pronounce 'FunFHMMer'? I can imagine several possibilities:

1. Fun-eff-aitch-em-em-er
2. Fun-eff-aitch-em-mer
3. Fun-eff-hammer
4. Fünf-hammer

Reading the manuscript suggests that 'FunF' stems from 'FunFam(s)', which in turn is derived from 'functional families'. This would suggest that option 1 or 3 above might be the correct way to pronounce this software's name.

The fully expanded description of this web server's name becomes a bit of a mouthful:

Class Architecture Topology Homologous Superfamily Functional Families Hidden Markov Model (maker?)

If you want your bioinformatics software to have a memorable name, it helps if the name is pronounceable

August 12, 2015 by Keith Bradnam

There is a new paper in the journal Bioinformatics:

Applying stability selection to consistently estimate sparse principal components in high-dimensional molecular data

The paper describes a new method for implementing a Principal Component Analysis (PCA) of data. That new method has a name. That name has just seven characters. How hard can it be to pronounce?

S4VDPCA: ess-four-vee-dee-pee-cee-ay

It doesn't exactly trip off the tongue, and having four 'ee-sounding' letters together (VDPC) doesn't make it easy to remember. When I first came across this paper, I skimmed the article, waited an hour, and then tried to remember the name. I could remember that it included '4', 'V', and 'D', but couldn't remember the order (or that it also included an 'S').

It is by no means essential that bioinformatics tools have easily pronounceable names, but this will help people remember the name of your software. In turn, this makes it easier for people to tell others about your software. I don't imagine that bioinformatics software developers ever want to overhear the following type of conversation:

Bob: "You should use that tool"

Sue: "What tool?"

Bob: "Umm, you know that PCA thingy. The S…something, something…PCA tool"

Sue: "The what?"

Bob: "Run a Google search for Bioinformatics PCA tools, it's probably the top hit."

Sue: <- facepalm ->

Unpronounceable bioinformatics database names

January 21, 2015 by Keith Bradnam

First a quick reminder that an acronym is something that is meant to be pronounced as an entire word (e.g. NATO, AIDS etc.). Sometimes these end up becoming regular, non-capitalized, words (e.g. radar, laser).

In contrast, an initialism is something where the component letters are read out individually (e.g. BBC, CPU). In bioinformatics, there are also names which are part acronym and part initialism (e.g. GWAS…which I have only ever heard pronounced as gee-was).

Most initialisms that we use in everyday life tend to be short (2–4 letters) because this makes them easier to read and to pronounce. As you move past 4 letters, you run the risk of making your initialism unpronounceable and unmemorable.

So here are some recently published bioinformatics tools with names that are a bit cumbersome to repeat. For each one I include how someone might try to pronounce them. Try repeating these names quickly and for an added test, see how many of these names you can remember 5 minutes after you read this:

5 characters

CeCaFDB: a curated database for the documentation, visualization and comparative analysis of central carbon metabolic flux distributions explored by 13C-fluxomics: cee-car-eff-dee-bee? — this assumes that 'Ce' and 'Ca' are not treated separately as two letters…one could argue that if it is not clear how your bioinformatics tool name should be pronounced, then it does not have a good name.
EHFPI: a database and analysis resource of essential host factors for pathogenic infection: ee-aitch-eff-pee-aye
PAIDB v2.0: exploration and analysis of pathogenicity and resistance islands: pee-ay-aye-dee-bee — this is a particularly bad choice of name as it will read to many as 'paid-bee'
rrnDB: improved tools for interpreting rRNA gene abundance in bacteria and archaea and a new foundation for future development: ar-ar-en-dee-bee (the first 3 characters are not easy to say quickly!)
The TTSMI database: a catalog of triplex target DNA sites associated with genes and regulatory elements in the human genome: tee-tee-ess-em-aye

6 characters

DBTMEE: a database of transcriptome in mouse early embryos: dee-bee-tee-em-ee-ee (I accept that maybe this one is just pronounced dee-bee-tee-me, but once again, do you really want there to be uncertainty as to how the name of your bioinformatics tool is read by others?)
euL1db: the European database of L1HS retrotransposon insertions in humans: ee-you-ell-one-dee-bee
SASBDB, a repository for biological small-angle scattering data: ess-ay-ess-bee-dee-bee
WDSPdb: a database for WD40-repeat proteins: dub-ball-you-dee-ess-pee-dee-bee

7 characters

BCCTBbp: the Breast Cancer Campaign Tissue Bank bioinformatics portal: bee-cee-cee-tee-bee-bee-pee
PFP/ESG: automated protein function prediction servers enhanced with Gene Ontology visualization tool: pee-eff-pee-slash-ee-ess-gee (only 6 characters if you omit the slash I guess)
PHI-DAC: protein homology database through dihedral angle conservation: pee-aitch-aye-dash-dee-ay-cee (shorter if you omit dash and/or pronounce 'DAC' as a word)

And the winner is…

BioVLAB-MMIA-NGS: microRNA–mRNA integrated analysis using high-throughput sequencing data: this is a 7-letter initialism that comes after a three-syllable (non-standard) word, so to pronounce this you have to say bio-vee-lab-em-em-aye-ay-en-gee-ess!!!

Conclusions

If you want people to actually use your bioinformatics tools, then you should aim to give them names that are memorable and pronounceable.

How would you pronounce the name of this bioinformatics tool?

October 22, 2014 by Keith Bradnam

From the latest issue of Bioinformatics we have a new tool that is an R package for the analysis of GWAS studies. Rather than name the tool, I want you all to first see it exactly as it appears in the journal:

The first character in the name of this software is one that can often be hard to identify, particularly when certain fonts make it look like it could be the letter L or I, or even the number 1.

This is not a name that is worthy of a JABBA-award, but it does fall into my category of posts which I call 'almost JABBA', for software names that have various other issues. The particular issue in this case is that the name is hard to read and therefore hard to pronounce. I feel that the use of lower-case characters makes it more likely that the reader will attempt to pronounce this as a word, rather than read it as an initialism. E.g. maybe you saw this name and read it as 'Lurgpurr', or 'Ergpurr'.

The reason behind the name is not explained in the article, but when you go to the linked software page, all is revealed:

It's a bit odd that one of the five words that appear in this name ('Gaussian') doesn't get mentioned anywhere in the paper. But more importantly, why did they feel the need for using lower-case characters? 'LRGPR' would have been much easier to read and comprehend than the font-dependent 'lrgpr'.