Test Driving the Seven Bridges Genomics Bioinformatics Platform

I recently examined the Seven Bridges Genomics (SBG) platform, building and running a short-read alignment pipeline. Overall, I am impressed by the software. Here I describe my test of the program and then report on my investigation of how it works.

Test Drive

The test pipeline I devised consisted of two steps: FastQC analysis of the short reads, and Bowtie2 alignment of the reads to a reference transcriptome. I constructed the pipeline visually in the SBG web interface:

[Image: pipeline]
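
For readers who want to see what these two steps amount to outside the web interface, here is a rough command-line equivalent. It is only a sketch: it assumes `fastqc`, `bowtie2-build`, and `bowtie2` are installed locally, the file names are hypothetical placeholders rather than the files I used, and the explicit index-building step is my assumption about what has to happen to the reference FASTA before alignment. The function only assembles the commands; it does not run them.

```python
import shlex
from pathlib import Path


def build_commands(reads, reference, out_dir="results"):
    """Return argument lists for FastQC, index building, and alignment."""
    out = Path(out_dir)
    index_prefix = out / "reference_index"  # prefix for the Bowtie2 index files
    sam_path = out / "aligned_reads.sam"    # alignment output
    return [
        # Step 1: quality report for the short reads
        ["fastqc", str(reads), "-o", str(out)],
        # Step 2a: build a Bowtie2 index from the reference FASTA
        ["bowtie2-build", str(reference), str(index_prefix)],
        # Step 2b: align unpaired reads (-U) and write SAM output (-S)
        ["bowtie2", "-x", str(index_prefix), "-U", str(reads), "-S", str(sam_path)],
    ]


if __name__ == "__main__":
    # Hypothetical input file names
    for cmd in build_commands("reads.fastq", "transcriptome.fa"):
        print(shlex.join(cmd))
```

To actually execute the steps, one would create the output directory first and pass each list to `subprocess.run(cmd, check=True)`.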

The software ensures that the inputs to FastQC and Bowtie2 are of the appropriate type, e.g., Bowtie2 requires that a FASTQ file enter its reads port.
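
I do not know how SBG implements this check. A minimal sketch of the idea, assuming each port declares the file type it accepts and files are classified by extension, might look like the following. The extension table, port names, and function names are all hypothetical.

```python
from pathlib import Path

# Hypothetical mapping from file extension to file type
EXTENSION_TYPES = {
    ".fastq": "FASTQ",
    ".fq": "FASTQ",
    ".fa": "FASTA",
    ".fasta": "FASTA",
}

# Hypothetical port declarations: (tool, port) -> accepted file type
PORT_TYPES = {
    ("FastQC", "reads"): "FASTQ",
    ("Bowtie2", "reads"): "FASTQ",
    ("Bowtie2", "reference"): "FASTA",
}


def file_type(filename):
    """Classify a file by its extension; None if the extension is unknown."""
    return EXTENSION_TYPES.get(Path(filename).suffix.lower())


def can_connect(filename, tool, port):
    """True if the file's type matches what the port accepts."""
    expected = PORT_TYPES.get((tool, port))
    return expected is not None and file_type(filename) == expected


if __name__ == "__main__":
    print(can_connect("sample.fastq", "Bowtie2", "reads"))
    print(can_connect("transcriptome.fa", "Bowtie2", "reads"))
```

Under these assumptions, a FASTQ file is accepted by the Bowtie2 reads port and a FASTA file is rejected there.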

After building the pipeline, running it was straightforward. First I specified the input files for the FASTQ and Reference FASTA nodes:

[Image: setting the input types]

I then reviewed the node settings and executed the pipeline:

[Image: review and run]

The pipeline completed in eight minutes (the reference FASTA was small), at a total cost of $1.56.

[Image: success]

The output nodes could then be viewed. For instance, the FastQC output showed as embedded HTML:

[Image: FastQC report]

The sequence alignment results were visible in a text file:

[Image: aligned reads]

How it Works

Amazon Web Services published a case study on SBG’s use of its services, available at http://aws.amazon.com/solutions/case-studies/seven-bridges-genomics/, and I draw on it here. According to the case study, SBG uses EC2 reserved instances. I suspect that when a pipeline is initiated, SBG’s software estimates the expected computational demand and then selects the appropriate type and number of EC2 instances for the job, which would help them control costs. SBG stores the data used and produced by the pipelines in S3, but uses Elastic Block Store (Amazon EBS) during pipeline execution to improve I/O efficiency.
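
To make the instance-selection guess concrete, here is a toy sketch of what such a step could look like. This is purely my speculation and not SBG’s method: the catalog, instance names, and prices are made up (they are not real EC2 types or AWS prices), and the rule is simply "cheapest combination that covers the core demand among types with enough memory per task".

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    cores: int
    memory_gb: float
    hourly_cost: float  # made-up numbers, not AWS prices


# Hypothetical catalog of instance types
CATALOG = [
    InstanceType("small", cores=2, memory_gb=4, hourly_cost=0.10),
    InstanceType("medium", cores=4, memory_gb=16, hourly_cost=0.25),
    InstanceType("large", cores=16, memory_gb=64, hourly_cost=1.00),
]


def plan(cores_needed, memory_gb_per_task):
    """Return (type name, instance count) with the lowest hourly cost.

    Only types with enough memory for a single task are considered.
    """
    best = None
    for itype in CATALOG:
        if itype.memory_gb < memory_gb_per_task:
            continue  # a single task would not fit on this type
        count = math.ceil(cores_needed / itype.cores)
        cost = count * itype.hourly_cost
        if best is None or cost < best[2]:
            best = (itype, count, cost)
    if best is None:
        raise ValueError("no instance type has enough memory")
    return best[0].name, best[1]


if __name__ == "__main__":
    # A job needing 8 cores and 8 GB per task
    print(plan(8, 8))
```

A real scheduler would also have to account for run-time estimates, reserved versus on-demand capacity, and data transfer, none of which this sketch models.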

Post Author: badassdatascience
