Data access for the 1,000 Plants (1KP) project

From the abstract of a new paper in GigaScience:

The 1,000 plants (1KP) project is an international multi-disciplinary consortium that has generated transcriptome data from over 1,000 plant species, with exemplars for all of the major lineages across the Viridiplantae (green plants) clade. Here, we describe how to access the data used in a phylogenomics analysis of the first 85 species, and how to visualize our gene and species trees.

The paper doesn't provide a link to what seems to be the actual project website. Instead, the authors mention directories within the iPlant Collaborative project where you can access data. The project website reveals that this project can be referred to as either '1000 plants', 'oneKP' or '1KP' (but not '1000P'?).

Being a pedantic kind of guy, I was curious about the paper's vague mention of 'over 1,000 plant species'. How many species exactly? The paper doesn't say. But if you go to one of the iPlant pages for 1KP, you will see this:

Altogether, we sequenced 1320 samples (from 1162 species)

So this project seems to have exceeded the boundaries suggested by its name. How about the '1.2KP' project?

Identical Classifications In Science: Some advice for Jonathan Eisen

Jonathan Eisen — a colleague at the UC Davis Genome Center — has a quandary. He came up with a name for one of his projects but now needs to consider renaming it. The problem is that ICIS (Innovating Communication in Scholarship) sounds a bit like…well you all know what it sounds like. So Jon has appealed for suggestions on how to rename their project.

He should take comfort that he may not be the only one facing this dilemma. After all, the International Cooperative ITP Study Group (ICIS) has been an ongoing collaboration between hematologists since 1997. I wonder whether they are considering a name change? Maybe Jon could also ask the folk at the International Conference on Information Systems (ICIS) who have been meeting since 1980. Or they could talk to the people who came up with the Intelligent Coin Identification System (ICIS), or the Intensive Care Infection Score (ICIS), or the Integrated Crate Interrogation System (ICIS), or the 20-year-old International Crop Information System (ICIS), or the people who named this gene.

These are just some of the academic uses of ICIS that I could find from a couple of quick searches. I expect that there are more out there. This is a reflection of one of the most primal desires of all scientists…the need to come up with an acronym or initialism for their project. This urge is all too commonly associated with the additional need to make the name 'fun' (particularly a desire to name things after animals). Acronyms can also backfire for other reasons, such as when you don't fully appreciate how they might sound in other countries.

The shorter your acronym, the more likely that it has been used by other people before you (even within the same field). My suggestion would be to consider the shocking alternative of not using an acronym at all! After all, sometimes people can come up with new names that seem to catch on.

Making genome assemblies in the year 2014

I often like to encourage students to explain their work without using any complex scientific vocabulary. If you can explain what you do to your parents or grandparents, then this is great practice for explaining your work to other scientists from outside your field.

I also encourage students to think of analogies and metaphors for their work as these can really help others to grasp difficult concepts. Yesterday, I wrote a post called Making cakes in the year 2014 which was (hopefully) an obvious attempt to explain some of the complexities and problems inherent in the field of genome assembly.

It almost feels wrong to even attempt to convert millions of ~100 bp DNA fragments into — in the case of some species — a small number of sequences that span billions of bp. Every single step in the process is fraught with errors and difficulties. Every single step is controlled by software with numerous options that are often unexplored. Every single step has many alternative pieces of software available.

Let's just focus on one of the earliest steps in any modern sequencing pipeline: the need to remove adapter contamination from your sequenced reads. There are at least thirty-four different tools that can be used for this step and there are over 240 different threads on SEQanswers.com that contain the words 'trim' and 'adapter' (suggesting that this process is not straightforward, and that many people need help).

I had a look at some of these tools. The program Btrim has 12 different command-line options that can all affect how the program trims adapter sequences (it has 27 different command-line options in total). Skewer has 9 different command-line options that will affect the output of the program. The trimmer Concerti has 8 options that will also affect the output. Do we even have a good idea of what is the best way to remove adapter sequences? Maybe we need a 'trimmathon' to help test all of these tools! 
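
To make the point about options concrete, here is a minimal sketch of a 3' adapter trimmer written in Python. It is not how Btrim, Skewer, or any other real tool works; the function name, the two parameters (min_overlap and max_mismatches), and the example reads are all made up for illustration. The adapter shown is the commonly used Illumina adapter prefix. Even with just two knobs, the same read can come out trimmed or untouched.

```python
ADAPTER = "AGATCGGAAGAGC"  # common Illumina adapter prefix, used here as an example


def trim_adapter(read, adapter=ADAPTER, min_overlap=3, max_mismatches=0):
    """Trim a 3' adapter (or a leading part of it) from a read.

    Hypothetical toy implementation: scans left to right and cuts at the
    first position where the adapter, or as much of it as fits before the
    read ends, matches with at most max_mismatches mismatches over at
    least min_overlap bases. Returns the read unchanged if nothing matches.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            continue  # adapter itself is shorter than the required overlap
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= max_mismatches:
            return read[:start]
    return read


insert = "ACGTTGCATTCA"

# Full adapter present: trimmed with the default settings.
print(trim_adapter(insert + ADAPTER))

# Only the first 4 adapter bases made it into the read.
short_tail = insert + "AGAT"
print(trim_adapter(short_tail, min_overlap=3))  # tail is removed
print(trim_adapter(short_tail, min_overlap=5))  # read is left untouched

# Adapter with a single sequencing error in it.
noisy = insert + "AGATCGGTAGAGC"
print(trim_adapter(noisy, max_mismatches=0))  # read is left untouched
print(trim_adapter(noisy, max_mismatches=1))  # adapter is removed
```

Loosening either setting catches more real adapter, but it also raises the chance of chopping genuine sequence that just happens to resemble the start of the adapter. Real trimmers add quality scores, paired-end logic, and 5' adapters on top of this, which is where all those extra command-line options come from.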

If there is a point to this post, maybe it would be that genome assembly is an amazingly complex, time-consuming, and fundamentally difficult problem. But even the 'little steps' that have to be done before you even start assembling your sequences are also far from straightforward. Don't convince yourself for a moment that a single tool, run with default parameters, will do all of the hard work for you.

PLOS Computational Biology: Ten Simple Rules for Writing a PLOS Ten Simple Rules Article

Is there practical advice for contributing to the Ten Simple Rules collection already available? What can we learn from the existing articles in the collection? If only there was an article with ten simple rules for writing a PLOS Ten Simple Rules article. If only that article could be peppered with insightful comments from the founder of the collection: Philip E. Bourne.

This is that article.

This is very meta. I think I will wait for the 'Ten Simple Rules for Writing a Ten Simple Rules Article about writing a PLOS Ten Simple Rules Article'.