BioDocker and BioBoxes: the containerization of bioinformatics

Thanks to a post on the BioCode's Notes blog I have discovered that there is a project called BioDocker which aims to generate lots of Docker containers to help make bioinformatics more reproducible by standardizing how bioinformatics software is packaged. From the BioDocker website:

The main purpose of this project is to spread the use of Docker on the Bioinformatics and Computational Biology areas. By using pre-configured containers with different bioinformatic softwares some critical aspects of Bioinformatics like reproducibility are minimized. Here you will find a list of containers with different bioinformatics software and how to use it.

BioDocker was created by Felipe da Veiga Leprevost in 2014, and the associated GitHub repository currently has a dozen or so containers.

When I first read about BioDocker I was confused, because I know that there is also the Bioboxes project which aims to er…make bioinformatics more reproducible by standardizing how bioinformatics software is packaged. From the Bioboxes manifesto:

Software has proliferated in bioinformatics and so have the problems associated with it: missing or unobtainable code, difficult to install dependencies, unreproducible workflows, all with terrible user experiences. We believe a community standard, using software containers, has the opportunity to solve these problems and increase the standard of scientific software as a whole.

I think the aims of these two projects are similar but not identical, and Bioboxes probably has a broader remit. Both projects are aware of each other and it looks like they have had some productive exchanges.

All of this makes me feel that the bioinformatics community is slowly, but steadily, embracing Docker. Any approach that standardizes how we do bioinformatics should be welcomed, but some of us with long memories will recall that we have been in this situation before. Anyone remember the promises of how CORBA and then SOAP were going to increase interoperability in bioinformatics?

The name of this bioinformatics tool merits close inspection

  1. Bogus bioinformatics acronyms = mildly annoying
  2. Names that clash with previously published tools = mildly annoying
  3. Bogus bioinformatics acronyms that clash with previously published tools = very annoying

Step forward a new paper published in the journal Bioinformatics:

How does INSPEcT derive its name?

  • INSPEcT (INference of Synthesis, Processing and dEgradation rates in Time-course analysis)

Inclusion of the 'E' from 'degradation' and omission of 'R', 'C', or 'A' (from 'Rates', 'Course', and 'Analysis') earns this tool a JABBA award. It also earns a 'Duplications' award because of:

Bioinformatics is just like bench science and should be treated as such

A great post by Richard Edwards on his Cabbages of Doom blog, which includes a list of 8 shocking ways that bioinformatics is just like bench science. Highly recommended reading. His conclusion bears repeating here:

Bioinformatics is science. Full stop. It is no better than other science. It is no worse than other science. People do it right. People do it wrong.