Is this acceptable behavior for a bioinformatics program developed in the year 2014?

Last week I installed a relatively new read aligner with the humorous name of ARYANA:

The journal article describing the tool was published on September 10th 2014, and the associated code repository on GitHub first appeared earlier in the same year. So we're not talking about an old program here.

If I have time I'm planning to investigate the use of ARYANA alongside other established mapping tools like BWA and Bowtie 2. Installing ARYANA was straightforward, so then I proceeded to try the first thing that I attempt with all new bioinformatics software (and most Unix command-line software):

Run the program without any parameters to see what happens

I don't think I'm alone in this approach. When required command-line options are missing, a good Unix program will return helpful information about how it should be used. At the very least it might prompt you with the minimal use scenario and/or point out how you can find out more information by invoking the help mode. So here is what happened with ARYANA:

% aryana
Need more inputs

Not very helpful. So I tried the next obvious thing and checked whether there is a help mode:

% aryana -h
Need more inputs

% aryana --help
Need more inputs

Hmm. This is really not helpful. Out of curiosity, I tried to see if ARYANA would tell me what version it is (a fairly common behavior for a lot of command-line software):

% aryana -v
Need more inputs

% aryana --version
Need more inputs

At this point I sighed. Not figuratively. I literally sighed, because this type of feedback from a program (especially a bioinformatics program developed in the year 2014) is maddening. I tweeted about this issue and, judging by the feedback, I am not alone in my views on this.

It might have been less frustrating if the program had returned no output at all rather than just those three words. I feel like the program is taunting me. It may as well have returned any of the following output:

% aryana
Not gonna work

% aryana
No can do

% aryana
Please go away

I could use this blog post to tell you about some of the basic requirements of a bioinformatics command-line program, but I don't need to do this because others have already done so. Specifically, people should look at this great paper by Torsten Seemann (@torstenseemann), published in GigaScience last year:

Ten recommendations for creating usable bioinformatics command line software

This is a fantastic set of recommendations, and coincidentally the first three things on the list relate to the first three things that I tried doing when running the ARYANA program:

  1. Print something if no parameters are supplied
  2. Always have a “-h” or “--help” switch
  3. Have a “-v” or “--version” switch

This is good advice for developers of bioinformatics software, but equally it is good advice for reviewers of bioinformatics software. If I had been a reviewer of the ARYANA paper, I would have made comments regarding the lack of useful output from the program.
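To show how little work the first three recommendations take, here is a minimal sketch in Python using the standard library's argparse module. To be clear, this is not how ARYANA is written or how it should be invoked: the tool name `mytool`, the version string, and the two positional arguments are all hypothetical and exist only for illustration.

```python
import argparse
import sys

__version__ = "0.1.0"  # hypothetical version string, for illustration only


def build_parser():
    # argparse adds "-h" and "--help" automatically (recommendation 2)
    parser = argparse.ArgumentParser(
        prog="mytool",  # hypothetical tool name
        description="Hypothetical read aligner, used only as an example.",
    )
    # Recommendation 3: a "-v" / "--version" switch
    parser.add_argument(
        "-v", "--version",
        action="version",
        version=f"%(prog)s {__version__}",
    )
    # Hypothetical required inputs
    parser.add_argument("reference", help="reference genome in FASTA format")
    parser.add_argument("reads", help="reads to align in FASTQ format")
    return parser


def main(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    parser = build_parser()
    # Recommendation 1: print something useful if no parameters are supplied
    if not argv:
        parser.print_help(sys.stderr)
        return 2
    args = parser.parse_args(argv)
    # Placeholder for the real work
    print(f"Would align {args.reads} against {args.reference}")
    return 0

# In a real script you would finish with: sys.exit(main())
```

With something like this in place, running the tool with no arguments prints the full usage message, `-h` and `--help` do the same and exit cleanly, and `-v` or `--version` report the version. That is roughly thirty lines of code to avoid a bare "Need more inputs".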

101 questions with a bioinformatician #18: Richard Emes

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.


Richard Emes is an Associate Professor and Reader in Bioinformatics at The University of Nottingham (where they let in lots of riffraff). He is also the Director of the University's shiny, new Advanced Data Analysis Centre (ADAC).

His research interests include the comparative genomics and epigenomics of (mostly) animal species to understand health and disease, and in his role as Director of ADAC, he is forging collaborations that help others with their informatics needs across the university and further afield. Most importantly, he and his team know how to come up with a decidedly non-bogus acronym for a piece of bioinformatics software.

You can find out more about Richard by visiting his lab's website/blog, or by following him on Twitter (@rdemes). And now, on to the 101 questions...

 

001. What's something that you enjoy about current bioinformatics research?

I love the variation of ideas. I could never have followed a career of working on a single gene, protein, or disorder. Bioinformatics lets you think in a slightly less reductionist way. Letting the data drive discovery can be exciting and rewarding.

 

010. What's something that you *don't* enjoy about current bioinformatics research?

Seeing junior researchers working really hard to clean and analyze a complex dataset to allow visualization that provokes insight, then getting little recognition because “they made a figure”. Recognition of author contribution is changing, but slowly.

 

011. If you could go back in time and visit yourself as an 18 year old, what single piece of advice would you give yourself to help your future bioinformatics career?

I would say get a deep understanding of statistics and start learning helpful one-liners. The fact that sed -i 's/old/new/g' filename edits a file without you having to open it is mind-blowing when you first come to the command-line.

 

 

100. What's your all-time favorite piece of bioinformatics software, and why?

My first full project in bioinformatics was looking for gene family expansions as part of the Mouse Genome project. All the alignments and editing were done in SeaView and this is still my go-to editor.

 

 

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality?

Arginine. I was brought up in the West Country of England and my accent becomes more pronounced when presenting. Arginine makes me sound most like a pirate when I pronounce it “Arrrrrjenine” (KB: 15 years' experience as a bioinformatician and Richard doesn't seem to have learnt the difference between nucleotides and amino acids ;-) I will note his answer as an 'R').

THe popUlARiTY of VARioUS iUpAC NUCleoTiDe AMBiGUiTY CHARACTeRS

There have now been 18 interviews in my series of 101 questions with a bioinformatician. The final question in each interview is always:

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality?

So after 18 interviews we have the statistical possibility of equal representation by all possible nucleotide ambiguity codes. Let's take a look at what the results actually look like:

So N and Y are the most popular choices so far but no love for A, C, G, U, K, M, D, or H! What's so bad about the letter K? I always thought of K as a distinguished member of the IUPAC ambiguity code community!

If you are sharp-eyed you may notice that there are actually 19 responses shown here. That's because a certain someone claimed two characters in their answer. I'm sure that you will all be glued to the next 18 interviews to see if, and how, these frequencies change. And I will be Keen in my undertaKing to maKe sure that I Keep this blog free from any subtle bias that may influence folK.

10 tips for improving your presentations & speeches

Some fantastic advice here from the Presentation Zen site (which is always worth looking at). Many scientific presentations would be greatly enlivened if presenters made more effort to turn a collection of facts and observations into a story. Tip #4 is something that I frequently mention to students in our lab:

(4) Have a clear theme. 
What is your key message? What is it you REALLY want people to remember? What action do you want them to take? Details are important. Data and evidence and logical flow are important. But we must not lose sight of what is really important and what is not. Often, talks take people down a path of great detail and loads of information, most of which is completely forgotten (if it was ever understood in the first place) after the talk is finished. The more details that you include and the more complex your talk, the more you must be very clear on what it is you want your audience to hear, understand, and remember. If the audience only remembers one thing, what should it be? Write it down and stick it on the wall so it's never out of your sight. 

Sometimes students seem almost surprised by the notion that the audience should be expected to remember something from their talk.