I recently examined the Seven Bridges Genomics (SBG) platform, building and running a short-read alignment pipeline. Overall, I am impressed by the software. Here I describe my test of the program and then report on my investigation of how it works.
The test pipeline I devised consisted of two steps: FastQC analysis of the short reads and Bowtie2 alignment of the reads to a reference transcriptome. I constructed the pipeline visually in the SBG web interface:
The software checks that the inputs to FastQC and Bowtie2 are appropriate. For example, Bowtie2 requires a FASTQ file on its reads port.
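To make the two steps and the input check concrete, here is a minimal sketch of roughly what the pipeline amounts to outside the web interface. It is my own illustration, not SBG's code: the extension-to-type table, the function names, and the output names (ref_index, aligned.sam, fastqc_out) are all assumptions, and I am also assuming an explicit bowtie2-build indexing step, which the web interface did not show me. The sketch only builds the command lines; it does not run them.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a port type. SBG's real
# type system is not described in this post; this is only an illustration.
FILE_TYPES = {
    ".fastq": "FASTQ",
    ".fq": "FASTQ",
    ".fasta": "FASTA",
    ".fa": "FASTA",
}


def file_type(path: str) -> str:
    """Guess a file's type from its extension (illustrative only)."""
    return FILE_TYPES.get(Path(path).suffix.lower(), "UNKNOWN")


def check_port(tool: str, port: str, expected: str, path: str) -> None:
    """Raise TypeError if the file wired into a port has the wrong type."""
    actual = file_type(path)
    if actual != expected:
        raise TypeError(
            f"{tool}.{port} expects {expected}, got {actual} ({path})"
        )


def build_commands(reads: str, reference: str,
                   index: str = "ref_index",
                   sam: str = "aligned.sam",
                   qc_dir: str = "fastqc_out") -> list:
    """Validate the inputs, then return the command lines for each step."""
    check_port("FastQC", "reads", "FASTQ", reads)
    check_port("Bowtie2", "reads", "FASTQ", reads)
    check_port("Bowtie2", "reference", "FASTA", reference)
    return [
        # Quality report for the reads; qc_dir must already exist.
        ["fastqc", reads, "-o", qc_dir],
        # Assumed step: build a Bowtie2 index from the reference FASTA.
        ["bowtie2-build", reference, index],
        # Align unpaired reads against the index and write SAM output.
        ["bowtie2", "-x", index, "-U", reads, "-S", sam],
    ]


if __name__ == "__main__":
    for cmd in build_commands("sample.fastq", "transcripts.fa"):
        print(" ".join(cmd))
```

Wiring a FASTA file into the reads port, for example build_commands("transcripts.fa", "transcripts.fa"), raises a TypeError before any command is built, which is the kind of early feedback the visual editor gave me.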
After building the pipeline, running it was straightforward. First I specified the input files for the FASTQ and Reference FASTA nodes:
I then reviewed the node settings and executed the pipeline:
The pipeline completed in eight minutes (it was a small reference FASTA), with a total cost of $1.56.
The output nodes could then be viewed. For instance, the FastQC output was displayed as embedded HTML:
The sequence alignment results were visible in a text file:
How it Works
Amazon Web Services published a case study on SBG’s use of its services, available at http://aws.amazon.com/solutions/case-studies/seven-bridges-genomics/, and I draw on it here. SBG uses EC2 reserved instances. I suspect that when a pipeline is launched, SBG’s software estimates the expected computational demand and then selects an appropriate type and number of EC2 instances for the job, which would let them control costs. SBG stores the data used and produced by the pipelines in S3, but uses Amazon Elastic Block Store (EBS) during pipeline execution for more efficient I/O.
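To illustrate my guess about instance selection, here is a minimal sketch of how a scheduler might pick an instance type and count from an estimate of the job's demand. Everything in it is hypothetical: the catalog entries, their sizes and prices, and the choose_instances function are made up for illustration and do not come from SBG or the AWS case study.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gb: float
    hourly_cost: float  # made-up price in USD, not a real EC2 rate


# Hypothetical catalog; these are not real EC2 instance types.
CATALOG = [
    InstanceType("small", 2, 4.0, 0.10),
    InstanceType("medium", 4, 16.0, 0.25),
    InstanceType("large", 16, 64.0, 1.00),
]


def choose_instances(cpu_hours: float, peak_memory_gb: float,
                     deadline_hours: float):
    """Pick the cheapest (instance type, count) that meets the deadline.

    Assumes the work parallelises perfectly across vCPUs and instances,
    which real alignment jobs only approximate.
    """
    best = None
    for itype in CATALOG:
        # Skip instance types that cannot hold the job in memory.
        if itype.memory_gb < peak_memory_gb:
            continue
        # Number of instances needed to finish within the deadline.
        count = max(1, math.ceil(cpu_hours / (itype.vcpus * deadline_hours)))
        cost = count * itype.hourly_cost * deadline_hours
        if best is None or cost < best[2]:
            best = (itype, count, cost)
    if best is None:
        raise ValueError("no instance type has enough memory for this job")
    return best[0], best[1]


if __name__ == "__main__":
    itype, count = choose_instances(cpu_hours=8, peak_memory_gb=8,
                                    deadline_hours=1)
    print(itype.name, count)
```

With these made-up numbers, a job needing 8 CPU-hours and 8 GB of memory within one hour would be placed on two "medium" instances, since the "small" type lacks the memory and one "large" instance costs more.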