Is Amazon's new 'unlimited' cloud drive suitable for bioinformatics?
Amazon have revealed new plans for their cloud drive service. Impressively, their 'Unlimited Everything' plan offers the chance to store an unlimited number of files and documents for just $59.99 per year (after a 3-month free trial, no less).
News of this new unlimited storage service caught the attention of more than one bioinformatician:
@BioMickWatson finally a good place to store all the 1KG data
— David Mittelman (@evolvability) March 27, 2015
If you didn't know, bioinformatics research can generate a lot of data. It is not uncommon to see individual files of DNA sequences stored in the FASTQ format reach 15–20 GB in size (and this is just plain text). Such files are nearly always processed to remove errors and contamination, resulting in slightly smaller versions of each file. These processed files are often mapped to a genome or transcriptome, which generates even more output files. The new output files may in turn be processed with other software tools, leading to yet more output files. The raw input data should always be kept in case experiments need to be re-run with different settings, so a typical bioinformatics pipeline may end up generating terabytes of data. Compression can help, but the typical research group will always be generating more and more data, which usually means a constant struggle to store (and back up) everything.
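To give a rough sense of how quickly this adds up, here is a back-of-the-envelope sketch in Python. Only the 15–20 GB raw file size comes from the paragraph above. The function name, the number of samples, and the size of each downstream step relative to the raw data are all made-up assumptions for illustration, not measurements from any real pipeline.

```python
def estimate_pipeline_storage_gb(n_samples, raw_fastq_gb,
                                 cleaned_fraction, mapped_fraction,
                                 downstream_fraction):
    """Rough storage estimate for a pipeline that keeps every intermediate.

    Each *_fraction is the size of that step's output relative to the raw
    FASTQ data. These ratios are illustrative assumptions only.
    """
    raw = n_samples * raw_fastq_gb          # raw FASTQ files, always kept
    cleaned = raw * cleaned_fraction        # after removing errors/contamination
    mapped = raw * mapped_fraction          # after mapping to a genome/transcriptome
    downstream = raw * downstream_fraction  # output of further tools
    return {
        "raw": raw,
        "cleaned": cleaned,
        "mapped": mapped,
        "downstream": downstream,
        "total": raw + cleaned + mapped + downstream,
    }


if __name__ == "__main__":
    # Hypothetical project: 100 samples at 20 GB each, with assumed ratios.
    estimate = estimate_pipeline_storage_gb(
        n_samples=100,
        raw_fastq_gb=20,
        cleaned_fraction=0.9,
        mapped_fraction=1.0,
        downstream_fraction=0.5,
    )
    for step, size_gb in estimate.items():
        print(f"{step:>10}: {size_gb / 1000:.1f} TB")
```

Even with these fairly modest assumed ratios, keeping every intermediate file means the total is several times the size of the raw data, so a project of this hypothetical size lands well into the terabyte range.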
So could Amazon's new unlimited storage offer a way of dealing with the common file-management headache which plagues bioinformaticians (and their sys admins)? Well, probably not. Their Terms of Use contain an important section (emphasis mine):
3.2 Usage Restrictions and Limits. The Service is offered in the United States. We may restrict access from other locations. There may be limits on the types of content you can store and share using the Service, such as file types we don't support, and on the number or type of devices you can use to access the Service. We may impose other restrictions on use of the Service.
You may be able to get away with using this service to store large amounts of bioinformatics data, but I don't think Amazon intend for it to be used by anyone in this manner. So it wouldn't surprise me if Amazon quietly started imposing restrictions on certain file types, or throttling bandwidth for heavy users, to the point where the service becomes impractical to rely on for day-to-day usage.