Slides: Thoughts on the feasibility of Assemblathon 3

The slides below represent the draft assembly version of the talk that Ian Korf will be giving today at the Genome 10K meeting. That is, these are the slides that I made for him to use as the basis of his talk. I expect his final version will differ somewhat.

After I made these slides, I discovered that two of the species that I listed as potential candidates for Assemblathon 3 already have genome projects. The tuatara genome project is actually the subject of another talk at the Genome 10K meeting, and a colleague tells me that there is also a California condor genome project.

Thoughts on a possible Assemblathon 3

Lex Nederbragt has written a post outlining his thoughts on what any Assemblathon 3 contest should look like. This is something that Ian Korf will be talking about today at the Genome 10K meeting which is happening at the moment (though it seems that there has been a lot of discussion about this in other sessions). From his post:

I believe it is here the Assemblathon 3 could make a contribution. By switching the focus from the assembly developers to the assembly users, Assemblathon 3 could help to answer the question:

How to choose the ‘right’ assembly from a set of generated assemblies

Trying to download the cow genome (again): where's the beef (again)?

Almost a year ago, I blogged about my frustrations regarding the extremely confusing nature of the cow genome and the many genome assemblies that are out there. Much of that frustration was due to websites and FTP sites that had broken links, misleading information, and woefully incomplete documentation.

One year on, I heard a rumor that a new version of the cow genome is available. So I went off in search of 'UMD 3.1.1'. My first stop was bovinegenome.org, which is one place where you can find the previous 'UMD 3.1' assembly. But alas, they do not list UMD 3.1.1.

After some Google searching I managed to find this information at the UCSC Genome Bioinformatics news archive:

We are pleased to announce the release of a Genome Browser for the June 2014 assembly of cow, Bos taurus (BostaurusUMD 3.1.1, UCSC version bosTau8). This updated cow assembly was provided by the UMD Center for Bioinformatics and Computational Biology (CBCB). This assembly is an update to the previous UMD 3.1 (bosTau6) assembly. UMD 3.1 contained 138 unlocalized contigs that were found to be contaminants. These have been suppressed in UMD 3.1.1.

This reveals that the update is pretty minor (removal of contaminant contigs, which were never part of any chromosome sequence anyway). In any case, the UCSC FTP site contains the UMD 3.1.1 assembly, so that's great.

But out of curiosity, I followed UCSC's link to the UMD Center for Bioinformatics and Computational Biology (CBCB) website. The home page doesn't make it easy to find the cow genome data. Searching the site for 'UMD 3.1.1' didn't help, but searching for 'cow genome' did take me to their Assembly data page, which lists the cow genome. Unfortunately, the link for the Bos taurus genome takes you to 'page not found'. In contrast, the 'data download' link does work and takes you to their FTP site, which fails to include the new assembly (but it does list all of the older cow genome assemblies).

Plus ça change, plus c'est la même chose.

Community annotation — by any name — still isn’t a part of the research process. It should be

In order for community annotation efforts to succeed, they need to become part of the established research process: mine annotations, generate hypotheses, do experiments, write manuscripts, submit annotations. Rinse and repeat.

This is from a thoughtful post by Todd Harris on his blog, which lists some suggestions for how to fix the failure of community annotation projects.

I particularly like Todd's 3rd suggestion:

We need to recognize the efforts of people who do [community annotation]. This system must have professional currency to it, akin to writing a review paper, and should be citable…