Unpronounceable bioinformatics database names
First, a quick reminder that an acronym is something that is meant to be pronounced as an entire word (e.g. NATO, AIDS). Sometimes these end up becoming regular, non-capitalized words (e.g. radar, laser).
In contrast, an initialism is something where the component letters are read out individually (e.g. BBC, CPU). In bioinformatics, there are also names which are part acronym and part initialism (e.g. GWAS, which I have only ever heard pronounced as gee-was).
Most initialisms that we use in everyday life tend to be short (2–4 letters) because this makes them easier to read and to pronounce. As you move past 4 letters, you run the risk of making your initialism unpronounceable and unmemorable.
So here are some recently published bioinformatics tools with names that are a bit cumbersome to repeat. For each one I include how someone might try to pronounce it. Try repeating these names quickly and, for an added test, see how many of them you can remember 5 minutes after you read this:
5 characters
- CeCaFDB: a curated database for the documentation, visualization and comparative analysis of central carbon metabolic flux distributions explored by 13C-fluxomics: cee-car-eff-dee-bee? — this assumes that 'Ce' and 'Ca' are not treated separately as two letters…one could argue that if it is not clear how your bioinformatics tool name should be pronounced, then it does not have a good name.
- EHFPI: a database and analysis resource of essential host factors for pathogenic infection: ee-aitch-eff-pee-aye
- PAIDB v2.0: exploration and analysis of pathogenicity and resistance islands: pee-ay-aye-dee-bee — this is a particularly bad choice of name as it will read to many as 'paid-bee'
- rrnDB: improved tools for interpreting rRNA gene abundance in bacteria and archaea and a new foundation for future development: ar-ar-en-dee-bee (the first 3 characters are not easy to say quickly!)
- The TTSMI database: a catalog of triplex target DNA sites associated with genes and regulatory elements in the human genome: tee-tee-ess-em-aye
6 characters
- DBTMEE: a database of transcriptome in mouse early embryos: dee-bee-tee-em-ee-ee (I accept that maybe this one is just pronounced dee-bee-tee-me, but once again, do you really want there to be uncertainty as to how the name of your bioinformatics tool is read by others?)
- euL1db: the European database of L1HS retrotransposon insertions in humans: ee-you-ell-one-dee-bee
- SASBDB, a repository for biological small-angle scattering data: ess-ay-ess-bee-dee-bee
- WDSPdb: a database for WD40-repeat proteins: dub-ball-you-dee-ess-pee-dee-bee
7 characters
- BCCTBbp: the Breast Cancer Campaign Tissue Bank bioinformatics portal: bee-cee-cee-tee-bee-bee-pee
- PFP/ESG: automated protein function prediction servers enhanced with Gene Ontology visualization tool: pee-eff-pee-slash-ee-ess-gee (only 6 characters if you omit the slash I guess)
- PHI-DAC: protein homology database through dihedral angle conservation: pee-aitch-aye-dash-dee-ay-cee (shorter if you omit dash and/or pronounce 'DAC' as a word)
And the winner is…
- BioVLAB-MMIA-NGS: microRNA–mRNA integrated analysis using high-throughput sequencing data: this is a 7-letter initialism that comes after a three-syllable (non-standard) word, so to pronounce this you have to say bio-vee-lab-em-em-aye-ay-en-gee-ess!!!
Conclusions
If you want people to actually use your bioinformatics tools, then you should aim to give them names that are memorable and pronounceable.
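If you want a quick sanity check before settling on a name, the 2–4 letter rule of thumb from earlier can be sketched in a few lines of Python. This is only a rough illustration, not a real pronounceability test: the function names `spell_out` and `is_cumbersome` and the `MAX_COMFORTABLE_LETTERS` threshold are made up for this example, the letter spellings follow the ones used in this post where a letter appears in it (the rest are my own guesses), and digits and punctuation are simply ignored.

```python
# Letter names as spelled in this post where available;
# the remaining entries (J, K, O, Q, X, Y, Z) are assumptions.
LETTER_NAMES = {
    "A": "ay", "B": "bee", "C": "cee", "D": "dee", "E": "ee", "F": "eff",
    "G": "gee", "H": "aitch", "I": "aye", "J": "jay", "K": "kay", "L": "ell",
    "M": "em", "N": "en", "O": "oh", "P": "pee", "Q": "cue", "R": "ar",
    "S": "ess", "T": "tee", "U": "you", "V": "vee", "W": "dub-ball-you",
    "X": "ex", "Y": "why", "Z": "zed",
}

# Hypothetical threshold taken from the "2-4 letters" rule of thumb.
MAX_COMFORTABLE_LETTERS = 4


def spell_out(name):
    """Read a name letter by letter, skipping digits and punctuation."""
    return "-".join(
        LETTER_NAMES[ch] for ch in name.upper() if ch in LETTER_NAMES
    )


def is_cumbersome(name):
    """Flag names with more letters than the comfortable maximum."""
    letter_count = sum(1 for ch in name.upper() if ch in LETTER_NAMES)
    return letter_count > MAX_COMFORTABLE_LETTERS


for candidate in ["BBC", "EHFPI", "SASBDB"]:
    print(candidate, spell_out(candidate), is_cumbersome(candidate))
```

Note that this treats every name as a pure initialism, so it cannot tell you that GWAS is usually said as gee-was or that PAIDB might be read as 'paid-bee'; it only shows how long the name gets when you have to spell it out.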