Viewing online figures in a Nucleic Acids Research article…an exercise in frustration

I found a Nucleic Acids Research article that I wanted to read online. The article in question had — like most articles — some figures. Here's how my web browser displays one of the figures:

I've blurred out the surrounding text so as not to risk any copyright issues (and also to let you focus on just the figure). If your eyesight is like mine, you may feel that the three subpanels of this figure are too small to be of much use.

So I clicked the image…

The white rectangle enclosing the figure increases in size from about 250 x 130 pixels to about 460 x 210. If I squint, I can just about make out some of the figure text, but it remains far from clear.

So I clicked the image…

Now I see exactly the same image as before, but it's in a new webpage which has a wider figure legend. The new legend is actually a little bit harder to read than previously as there are more words per line. I'm really not sure what the point of this page is.

So I clicked the image…

Hooray! Now I can finally see the figure at a decent size where all of the figure text is legible. It only took me three clicks to get there. To make sense of the figure I turn to the legend, only to find that there is no legend!

So you can have your figure at a reasonable size without the legend, or you can have the legend but only with a small version of the figure. It is obviously beyond the journal's ability to give you both.

Great post by Lex Nederbragt about why we need graph-based representations of genome sequences

In a recent blog post, Lex Nederbragt explains why we all need to be moving to graph-based representations of genome sequences (and getting away from linear representations).

In this post I will provide some background, explain the reasons for moving towards graph-based representations, and indicate some challenges associated with this development.

After listing the many challenges involved in moving towards a graph-based future, he refers to the fact that current efforts have not been widely adopted:

Two file formats to represent the graph been developed: FASTG and GFA. FASTG has limited uptake, only two assembly programs (ALLPATHS_LG and SPAdes) will output in that format. GFA parsing is currently only experimentally in the ABYSS assembler, and [the VG program] is able to output it.

The lack of a widely recognized and supported standard for representing variation inherent in a genome sequence is, in my opinion, a major barrier to moving forward. Almost all bioinformatics software that works with genome sequence data expects a single sequence with no variation. It will require a whole new generation of tools to work with a variant-based format, but tool developers will be reluctant to write new code if there is no clear agreement on what is the new de facto file format.
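To make the idea a little more concrete, here is a minimal sketch of what a graph-based representation buys you. It assumes GFA 1-style text with S (segment) and L (link) lines; the toy graph, its segment names, and its sequences are all invented for illustration, and the parser is my own simplification (forward-strand links only, acyclic graph), not how any particular assembler or VG handles the format. A single SNP becomes a small 'bubble' in the graph, and walking the graph spells out both versions of the sequence.

```python
from collections import defaultdict

# Hypothetical toy graph in GFA 1-style text: S = segment, L = link.
# Segment names and sequences are invented for illustration only.
GFA_TEXT = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tA",      # one allele of the SNP
    "S\t3\tG",      # the other allele
    "S\t4\tTTCA",
    "L\t1\t+\t2\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M",
    "L\t3\t+\t4\t+\t0M",
])


def parse_gfa(text):
    """Return (segments, links) from S and L lines; other lines are ignored."""
    segments = {}
    links = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            # Simplification: only forward-strand (+/+) links are kept.
            if fields[2] == "+" and fields[4] == "+":
                links[fields[1]].append(fields[3])
    return segments, links


def spell_paths(segments, links, start):
    """Yield every sequence spelled by walking from `start` to a dead end.

    Assumes the graph has no cycles; a cyclic graph would never terminate.
    """
    stack = [(start, segments[start])]
    while stack:
        node, seq = stack.pop()
        successors = links.get(node, [])
        if not successors:
            yield seq
        for nxt in successors:
            stack.append((nxt, seq + segments[nxt]))


segments, links = parse_gfa(GFA_TEXT)
haplotypes = sorted(spell_paths(segments, links, "1"))
print(haplotypes)  # ['ACGTATTCA', 'ACGTGTTCA']
```

A linear FASTA file would have to pick one of those two sequences as 'the' genome; the graph keeps both, which is exactly why existing tools that expect a single sequence cannot simply be pointed at it.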

101 questions with a bioinformatician #25: Alex Bateman

Alex Bateman is in charge of Protein Sequence Resources at the European Bioinformatics Institute (EBI). You might know him from his role in developing such popular bioinformatics tools as Pfam, Rfam, and TreeFam (rumors abound that their planned database of Ox genome resources had to be cancelled because they couldn’t secure the desired name).

A recipient of the Benjamin Franklin Award for Open Access in the Life Sciences, Alex has also been an enthusiastic advocate of Wikipedia as a resource that scientists should utilize more. To borrow a couple of British-isms, Alex is a ‘jolly nice chap’ and a ‘thoroughly decent bloke’, and I say that as someone who once had to stand on a milk crate with him as part of a management training exercise (don’t ask!). I sometimes think that he has found the elixir of life as he never seems to age. Oh, and you should also be aware that he has a black belt in origami.

You can find out more about Alex by visiting his group’s website at the EBI, or by following him on Twitter (@alexbateman1). And now, on to the 101 questions…
