examining mRNA complexity by annotation region using MapReduce

I became interested in how annotated mRNA regions (e.g., 5′ UTR, coding, and 3′ UTR) vary in information content, speculating that the coding regions (CDS) of transcripts would generally be more complex than other regions because of their role in specifying protein recipes. Measuring sequence complexity using Shannon entropy supported this hypothesis, at least with regard […]
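As a rough illustration of the measurement mentioned above, the following is a minimal sketch of Shannon entropy computed over the single-nucleotide composition of a sequence. The function name `shannon_entropy` and the choice of single nucleotides (rather than k-mers or codons) as the symbols are assumptions made here; the excerpt does not say how the post defines them.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Return Shannon entropy in bits per symbol for a sequence.

    Assumption: each character (nucleotide) is one symbol. The original
    post may instead use k-mers or another alphabet.
    """
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    n = len(seq)
    # H = -sum(p * log2(p)) over the observed symbol frequencies
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # A homopolymer carries no information per symbol; a sequence using
    # all four nucleotides equally often reaches the 2-bit maximum.
    for example in ("AAAAAAAA", "AACCAACC", "ACGUACGU"):
        print(example, shannon_entropy(example))
```

Under this definition, a region that uses all four nucleotides evenly scores close to 2 bits per symbol, while a repetitive or low-complexity region scores lower.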

command line Hadoop with a “live” Elastic MapReduce cluster

There are two ways to run Hadoop from the command line on an Elastic MapReduce (EMR) cluster that is active in “waiting” mode. First, the hard way: running Hadoop directly by logging into the cluster’s head node. The following commands show how you can log into the cluster’s head node and run Hadoop from the […]

chaining map operations in Hadoop

Suppose we have a list of RNA sequences (pictured below), and we want to calculate both the “GC” nucleotide content and the RNA folding energy for each sequence using Hadoop 2.2.0. Furthermore, we want to chain the two operations so that each GC content result is fed to the corresponding sequence’s dG calculation. We also […]
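The data flow described above, in which the output of one map step becomes the input of the next, can be sketched in plain Python. This is only an illustration of the idea, not Hadoop's own chaining mechanism and not the code from the post. The names `gc_mapper` and `energy_mapper`, the tab-separated record layout, and the `energy_fn` callable are all hypothetical; in practice `energy_fn` would wrap whatever RNA folding tool is used.

```python
from typing import Callable, Iterable, Iterator


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_mapper(lines: Iterable[str]) -> Iterator[str]:
    """First map step: emit 'sequence<TAB>gc' for each non-empty input line."""
    for line in lines:
        seq = line.strip()
        if seq:
            yield f"{seq}\t{gc_content(seq):.4f}"


def energy_mapper(records: Iterable[str],
                  energy_fn: Callable[[str], float]) -> Iterator[str]:
    """Second map step: consume the first step's records and append dG.

    energy_fn is a placeholder (an assumption of this sketch) for a real
    folding-energy calculation.
    """
    for record in records:
        seq, gc = record.split("\t")
        yield f"{seq}\t{gc}\t{energy_fn(seq)}"


if __name__ == "__main__":
    sequences = ["GGCCAUAU", "AUAUAUAU"]
    # Dummy energy function used only so the sketch runs end to end.
    for out in energy_mapper(gc_mapper(sequences), lambda s: 0.0):
        print(out)
```

Because both steps are generators, each record passes through the GC calculation and straight into the second step without an intermediate reduce, which is the essence of chaining map operations.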

test driving Amazon Web Services’ Elastic MapReduce

Hadoop provides software infrastructure for running MapReduce tasks, but taking full advantage of it requires substantial setup time and access to a compute cluster. Amazon’s Elastic MapReduce (EMR) solves these problems by delivering pre-configured Hadoop virtual machines running in the cloud for only the time they are required, and billing only for the computation minutes […]

using Hadoop to examine county-level industrial diversity

In a previous post, I computed each U.S. county’s industrial diversity from the 2009 County Business Patterns data published by the U.S. Census Bureau. The diversity calculation made use of Shannon’s information entropy equation, which is similarly used by ecologists to calculate species diversity for a region. Here I perform the same calculation using Hadoop, […]
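For reference, Shannon's entropy as it is commonly written for diversity calculations is shown below. Treating $p_i$ as industry $i$'s share of a county's total, and using the natural logarithm, are assumptions made here; the excerpt does not state which quantity (for example, employment or establishment counts) or which log base the post uses.

$$
H = -\sum_{i=1}^{N} p_i \ln p_i, \qquad \sum_{i=1}^{N} p_i = 1
$$

Here $N$ is the number of industries present in the county. $H$ is zero when a single industry accounts for everything and reaches its maximum, $\ln N$, when all $N$ industries have equal shares.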