Too many genome assemblers to keep track of? Nucleotid.es to the rescue!
Yesterday, I presented an updated version of my 'Genome Assembly: Then and Now' talk (slides here). But I thought I'd share just one of the new slides from the talk; here are six papers that describe new genome assembly tools:
As the slide shows, these papers have pretty much all been published in the last month. I think that people outside the field of genome assembly might be surprised by just how much new software keeps being developed in this field.
It is hard to know whether anyone can really stay on top of so many new papers. It can be hard enough to find time just to read them all, let alone properly test and evaluate the software.
That's why I'm hopeful that the new nucleotid.es website by Michael Barton (@bioinformatics) might really be able to help with this problem. If you didn't know, this website not only provides a catalog of popular genome assemblers, but also tries to benchmark them by using various test datasets.
The stroke of genius behind this ongoing, Assemblathon-like effort is to package each assembler as a Docker container. Docker is proving to be a popular way to make software easy to distribute without the bulk associated with distributing Virtual Machines (VMs).
Another great aspect of nucleotid.es is that people can submit assembler images that use slightly different command-line options. So it becomes easier to see the effect of tweaking a single setting on resulting metrics like NG50. The site could probably do with some more metrics and more varied test sets (perhaps reflecting larger genomes, more repeat-rich genomes, and genomes with higher heterozygosity), but I'm sure that it will continue to evolve and improve.
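For anyone unfamiliar with NG50: it works like N50, except that the 50% threshold is measured against an estimated genome size rather than the total length of the assembly. That makes assemblies of the same genome directly comparable. Here is a minimal Python sketch of the calculation. The function name and the toy contig lengths are my own, purely for illustration, and this is not code taken from nucleotid.es.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the length of the contig at which the running total of
    contig lengths (longest first) reaches half of the estimated
    genome size. Returns 0 if the assembly never covers half of it.
    """
    half_genome = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half_genome:
            return length
    return 0  # assembly is too small to reach 50% of the genome


# Hypothetical contig lengths (arbitrary units), illustration only
contigs = [80, 70, 50, 40, 30, 20]
print(ng50(contigs, genome_size=400))
```

With these made-up numbers, the assembly totals 290 but the assumed genome size is 400, so the threshold is 200 rather than 145. That is why NG50 can come out lower than N50 when an assembly is incomplete.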
If you want to know more about nucleotid.es, check out Michael's set of slides from a recent talk that he gave. Also, take a look to see which assemblers are currently performing well.
Updated 2014-09-22 to include link to my talk and make a couple of minor edits.