Limited tickets still available for 'Bio in Docker' Symposium (November 2015)

This free symposium, organised by King's College London and the Biomedical Research Centre (BRC), will bring together people interested in using Docker images in the field of bioinformatics, and will include a 'mini-hackday' session.

Event description

Docker is now establishing itself as the de facto solution for containerization across a wide range of domains. The advantages are attractive, from reproducible research to simplifying deployment of complex code.

This event will bring together some notable cases to discuss how best to take advantage of this new technology within the bioinformatics space.

Event details

  • November 9–10, 2015
  • London, UK at the Wellcome Collection
  • Register through Eventbrite

Talk details

  • Peter Belmann (Bioboxes): i) Evaluating and ranking bioinformatics software using Docker containers. ii) Overview of the BioBoxes project
  • Nebojsa Tijanic (SB Genomics): Portable workflow and tool descriptions with Common Workflow Language and Rabix
  • Paolo Di Tommaso (Nextflow / Notredame Lab, CRG): Manage reproducibility in genomics pipelines with Nextflow and Docker containers
  • Amos Folarin & Stephen Newhouse (NGSeasy/KCL): Next generation sequencing pipelines in Docker
  • Tim Hubbard (Genomics England / KCL): Pipelines to analyse data from the 100,000 Genomes Project as part of the Genomics England Clinical Interpretation Partnership (GeCIP)
  • Fabien Campagne (Campagne lab): MetaR and the Nextflow Workbench: application of Docker and language workbench technology to simplify bioinformatics training and data analysis.
  • Elijah Charles (Intel): Bioinformatics and the packaging melee
  • Brad Chapman (Blue Collar Bioinformatics): Improving support and distribution of validated analysis tools using Docker
  • Michael Ferranti / Kai Davenport (ClusterHQ): Data, Volumes and portability with Flocker
  • Ilya Dmitrichenko (Weave): Application-oriented networking with Weave
  • Aanand Prasad (Docker): Orchestrating Containers with Docker Compose

The most haplotyped place on Earth: 'DNA Land' is open for business!

DNA Land has opened! If you are curious about what DNA Land is, here is the concise description offered by the website:

DNA Land is a place where you can learn more about your genome while enabling scientists to make new genetic discoveries for the benefit of humanity. Our goal is to help members to interpret their data and to enable their contribution to research.

At the time I captured the above screenshot, the site boasted '2,483 genomes and counting'. At the time I started writing this piece it had already risen to '2,501' genomes. Erika Check Hayden gives a good overview of DNA Land in a Nature news item: Scientists hope to attract millions to 'DNA.LAND'.

So DNA Land is a place to learn more about your genome, which aims to attract millions of visitors, and where you can also earn badges. Hmm, makes me wonder whether I should wait for DNA World to open…especially if the lines are long.

'The amount of foil needed to wrap five breakfast sandwiches': a new metric for genomics?

Photo by Robyn Lee for seriouseats.com

The journal Genome Research is celebrating its 20th anniversary and has marked the occasion by issuing a number of 'perspective' articles. One of these — A vision for ubiquitous sequencing — includes one of the strangest comparisons that I've ever seen in the field of genomics (or really any field):

Back in 1990, sequencing 1 million nucleotides cost the equivalent of 15 tons of gold (adjusted to 1990 price). At that time, this amount of material was equivalent to the output of all United States gold mines combined over two weeks. Fast-forwarding to the present, sequencing 1 million nucleotides is equivalent to the value of ∼30 g of aluminum. This is approximately the amount of material needed to wrap five breakfast sandwiches at a New York City food cart.

Most people will understand the point that is being made here. Sequencing used to be really expensive whereas now it is very cheap. But is there really a need to explain what 30 grams of aluminum foil amounts to in a more human-friendly unit? And even if such a comparison is deemed necessary, is the use of 'breakfast sandwiches' from New York City food carts the most suitable choice?

Brief thoughts on Karyn Meltz Steinberg's ASHG 2015 talk on genome assembly improvement

I like it when people a) share their slides online and b) share their slides online soon after they give a talk somewhere. This is particularly helpful when you want to quickly catch up on developments from a conference that you couldn't attend. Karyn Meltz Steinberg (@KMS_Meltzy on Twitter) ticks both boxes because she posted her #ASHG2015 slides almost as soon as her talk finished. The title of her talk was:

Building a platinum human genome assembly from single haplotype human genomes generated from long molecule sequencing

Her slides — hosted on Slideshare — are embedded below.

What interested me in this talk is the use of sequence maps generated by the BioNano Genomics Irys platform to improve genome assemblies. This technology seems to be growing in popularity, offering an easier (and more powerful?) alternative to 'traditional' optical map solutions. This work is part of the McDonnell Genome Institute's Reference Genomes Improvement project, which includes the following (very laudable) aim:

  • We plan to identify and resolve issues (misassemblies, sequence errors, and gaps) within the current reference GRCh38.

I find it interesting that this project has also defined two levels of genome status:

Gold Genome: A high-quality, highly contiguous representation of the genome with haplotype resolution of critical regions.

Platinum Genome: A contiguous, haplotype-resolved representation of the entire genome.

Not clear from these definitions whether platinum genomes can still include short regions of unknown bases (Ns). A figure on the Reference Genomes Improvement project page also hints at a 'Silver' status, making me think it is only a matter of time before we see the addition of a credit-card-esque 'diamond' status level: no unknown bases, with full representation of tandem repeat arrays, e.g. centromeres, and priority booking for VIP tickets at major sporting events.