Updated version of my 'Genome assembly: then and now' talk is now available

This is a presentation that I have now given about five times. Originally, the talk focused purely on the Assemblathon 2 paper, with some thoughts about how the field of genome assembly has changed since the days of Sanger-only sequencing.

Over time, I've increasingly downplayed the Assemblathon 2 content and made room for updates on the latest developments in genome sequencing and assembly. Because the talk keeps changing, I've decided to start adding version numbers to make it easier to distinguish between versions.

So here is version 1.2 of my talk, presented below with and without notes (my talks are very visual, so I have embedded notes to try to capture what I talk about for each slide). Don't be put off by the high slide count (many of these just reflect animated steps).

Without notes…

With notes (probably need to go full-screen to be able to clearly read these)…

Too many genome assemblers to keep track of? Nucleotid.es to the rescue!

Yesterday, I presented an updated version of my 'Genome Assembly: Then and Now' talk. I'll try to post the full set of slides (with notes) later today on Slideshare. But I thought I'd share just one of the new slides from the talk; here are six papers that describe new genome assembly tools…


New option to subscribe to this blog via email

Shamelessly borrowing this idea from Matt Gemmell's excellent blog, I thought I'd offer the chance to subscribe to my infrequent ramblings via email. If you enter your email address below, you can receive a weekly email (sent on Friday afternoons) with all of my posts for that week.

Your email address will only be used for the purpose of receiving my blog content and will not be shared with anyone else. Each email will offer a simple link by which to unsubscribe.

Some sage advice on avoiding confusing names for bioinformatics tools

SAGE is a molecular technique used to investigate the mRNA population from a chosen sample. It stands for Serial Analysis of Gene Expression and was first described back in 1995. The technique spawned spin-offs such as LongSAGE, RL-SAGE (Robust-LongSAGE), and SuperSAGE.

Although this technique has largely been superseded by other methods (such as RNA-Seq), it is still widely referenced (over 1,300 publications from 2013 mention this technique).

Fast-forward to the present day and I note that a new tool has just been published in the journal BMC Bioinformatics:

SAGE: String-overlap Assembly of GEnomes

As long as you query your favorite web search engine for some combination of 'SAGE' and 'genome assembly', you will probably find this tool and not end up on one of the half a million pages that talk about the other SAGE. Even so, I wonder whether it is a bit risky to give a new tool the same name as such an established molecular technique.

All of this means that there is the potential for a certain company to use the aforementioned molecular technique to help annotate the output of the aforementioned computational technique, and apply both of these techniques to data from a certain plant. This could give you the world's first SAGE, SAGE, SAGE, sage genome!