Still collecting results for my survey about gender bias in bioinformatics

A quick post just to say that although I published some preliminary results from my survey about gender bias in bioinformatics, I left the survey live so that others could still add their responses. So far, I've had 28 more responses on top of the original 370. 

I also tweaked the survey form to allow ex-bioinformaticians to respond (and I asked whether they left bioinformatics as a career because of gender bias). If you haven't done so, please complete the form, which is embedded below and also available here. I'll try to update the main results on Figshare in a few weeks. Hopefully, with some more responses, it will be possible to see if there are other notable patterns in the results.

101 questions with a bioinformatician #9: Tuuli Lappalainen

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.


Tuuli Lappalainen is a group leader at the New York Genome Center, an institution that's so new that its Illumina HiSeq X Ten is counted as one of its older sequencing machines. In addition to having possibly the coolest logo for a genomics/bioinformatics institute, it also has an impressive set of green credentials. And did I mention that it's in New York, New York? Start spreading the newwwss…

Sorry, I got distracted.

Tuuli is also an assistant professor at the Department of Systems Biology at Columbia University. Her work focuses on using high-throughput sequencing data to study functional genetic variation in human populations. Her website — paraphrasing Dobzhansky — puts it like this:

Nothing in the genome makes sense except in the light of the transcriptome

You can find out more about Tuuli by following her on twitter (@tuuliel) or by checking out her lab's website. Oh, and Tuuli is looking for a talented post-doc to join her lab (she didn't ask me to say that, it's all part of the service). And now, on to the 101 questions...

 

 

001. What's something that you enjoy about current bioinformatics research?

I have very little interest in methods for the sake of methods; for me it's all about understanding biology, and bioinformatics provides fantastic opportunities for that.

 

010. What's something that you *don't* enjoy about current bioinformatics research?

A working environment that stays local while data and analyses are increasingly global is driving me insane. I've done (and still do) a lot of consortium work, where all of us still end up copying large data files to our local servers, and having locally optimized pipelines and scripts that are impossible to transfer to colleagues. I know that many people are trying to solve the problem, and I hope we'll be able to make it happen soon. And then there are the complications of applying for and getting access to various datasets. Privacy concerns are important, but does dbGaP really need to be so difficult to use? Our open access data set from GEUVADIS (Genetic European Variation in Health and Disease) is a great exception to this.

 

011. If you could go back in time and visit yourself as an 18 year old, what single piece of advice would you give yourself to help your future bioinformatics career?

Learn more stats, math, proper programming. It's great to see how the younger generations have formal training in so many of the skills that I've had to just pick up along the way. I'm a biologist by training and proud of it, but in the early 2000s computational biology was still very marginal.

 

100. What's your all-time favorite piece of bioinformatics software, and why?

My two current favorites are pysam for handling BAM/SAM files — fast, great syntax, and much more versatile than alternatives — and Matrix eQTL for very fast eQTL analysis.

 

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality?

T for Tuuli!

Is the naming of bioinformatics software getting out of CoNtRol?

There is a new paper in the journal Bioinformatics. This is the title of that paper:

CoNtRol: an open source framework for the analysis of chemical reaction networks

Now people will know that I have no stomach for bogus bioinformatics acronyms and initialisms, so is CoNtRol worthy of a JABBA award? Well I can't give it such an award because CoNtRol is not an acronym or an initialism. At least I don't think it is. 

The abstract describes CoNtRol as a web-based framework for analysis of chemical reaction networks (CRNs). So even though the capitalized letters in CoNtRol give you CNR, maybe it's really all about CRNs???

The CoNtRol website makes things a little more confusing by starting its introduction with the text: CoNtRol (CRN tool) is a web application. Are you now thinking what I'm thinking? Is CoNtRol the world's first bioinformatics software based on an anagram (CoNtRol = CRN tool)? If this isn't the reason, then I can only assume that someone decided to just randomly capitalize various letters in the name.
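For anyone who wants to double-check the anagram theory, here is a minimal Python sketch. The function names (is_anagram and capitals) are my own, made up purely for illustration, and have nothing to do with the CoNtRol software itself.

```python
def is_anagram(a: str, b: str) -> bool:
    """Return True if a and b contain the same letters, ignoring case and non-letters."""
    def letters(s: str) -> list:
        # Keep only alphabetic characters, lowercase them, and sort them.
        return sorted(ch for ch in s.lower() if ch.isalpha())
    return letters(a) == letters(b)


def capitals(name: str) -> str:
    """Return just the uppercase letters of a name, in their original order."""
    return "".join(ch for ch in name if ch.isupper())


print(is_anagram("CoNtRol", "CRN tool"))  # True
print(capitals("CoNtRol"))                # CNR
```

So the letters do match up as an anagram of "CRN tool", while the capitals alone spell CNR rather than CRN.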

Whatever the reason for the name, the more practical issue is that tools with names like this can often be hard to find with web search engines. CoNtRol doesn't show up on the first page of Google results if you search for control bioinformatics web app. Nor does it show up if you search for control chemical network app. There is something to be said for giving software novel names.

 

101 questions with a bioinformatician #8: Nick Loman

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.


Nick Loman is an Independent Research Fellow in the Institute of Microbiology and Infection at the University of Birmingham, UK. You may know Nick for his involvement in producing the only world map of high-throughput sequencers (at least I'm assuming that this is the only map of its kind…I'm too lazy to check). Maybe you know him for the exclusive interview that he managed to secure with some of Oxford Nanopore's head honchos at the 2012 AGBT meeting (the scene of a certain wowser moment in high-throughput sequencing). Or maybe you just know Nick for his epicurean passions.

I like to think of Nick as the Jack of Clubs in the deck of cards that is the bioinformatics blogging community (this works as a metaphor, right?). Actually, on some days he's more like the Ten of Diamonds, but then he goes and writes great pieces like this (co-authored with fellow 101 alumnus Mick Watson):

If you are interested in bioinformatics, and if you want to keep up with the latest developments in high-throughput sequencing technology, then you really should be keeping a close eye on people like Nick (though not too close, give the man some privacy!).

You can find out more about Nick by following him on twitter (@pathogenomenick) or keeping up with his excellent blog (Pathogens: Genes and Genomes). And now, on to the 101 questions...

 

 

001. What's something that you enjoy about current bioinformatics research?

I mainly enjoy the daily battles with crashing servers with cryptic memory errors, incompatible software versions, buggy scripts (mine and others) and full hard drives. 

Hah! That was the famous British sarcasm you will have read about.

The obvious answer is that the projects I get involved in are incredibly diverse, and I get to interact with many interesting people, because sequencing and bioinformatics skills are in such demand.

Another thing I enjoy is that I can reach out, via Twitter and blogging, to talk with all the great computational biologists in the world who are struggling with the same problems. I have no idea what it must be like to feel isolated and slog away in a windowless laboratory without that kind of communication.

 

010. What's something that you *don't* enjoy about current bioinformatics research?

I whinge quite a lot on my Twitter feed, but I wish bioinformaticians (including myself) wouldn't spend so much time reinventing the wheel (Keith: it's bioinformatics sin number 1 on this list), and would instead try to muck in together to solve really important problems.

A model of bioinformatics research a bit more like the Linux kernel might work. Imagine an international network of committed bioinformaticians working together. We would achieve great things quickly. But the academic model of recognition is broken for things like this, where everyone needs their own papers to justify their positions.

 

011. If you could go back in time and visit yourself as an 18 year old, what single piece of advice would you give yourself to help your future bioinformatics career?

I guess I would have got into the details of Bayesian statistics and machine learning earlier. These skills are very useful and I am only picking them up properly just now (I am on a Medical Research Council Training Fellowship).

Probably would have slipped myself a copy of Grays Sports Almanac too.

More prosaically: I discovered GNU parallel way too late, and it is an essential tool. And screen.

 

100. What's your all-time favorite piece of bioinformatics software, and why?

There's very little you can't get done with BLAST. It has its funny little quirks, but you know where you are with it. 

 

 

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality?

Well, it would be rather British to suggest T. But I prefer coffee.