sending an attachment with Amazon SES using Java

Sometimes data scientists need to write software that sends out automated emails, and sometimes those emails must carry attachments. Here is how to do so in Java using Amazon’s Simple Email Service (SES), an inexpensive outbound email service built on Amazon’s cloud infrastructure. Note: Be sure to log into Amazon SES and verify […]

EC2 spot instance price change: no correlation with day of week

My plans for world domination involve heavy use of Amazon EC2 instances, but I have to be frugal about it, so I’m running spot instances to save cash. A means of forecasting spot instance prices would therefore be helpful. Thus far, I’ve had little success using mainstream forecasting tools such as ARIMA and exponential smoothing. […]

setting up an Amazon RDS instance on a VPC private subnet

As a scientist, I tend not to think about database security much. However, security is an important concern for the database-driven web applications I write, so I decided to learn more about how to use Amazon EC2 and RDS instances securely. As part of this effort, I created a virtual private cloud (VPC) to hide my […]

test driving the Kepler scientific workflow system

The Kepler scientific workflow system enables scientists and engineers to specify their software pipelines as chains of visual dependencies. Each node in a pipeline runs a specific task, and it does not matter what programming language the task is written in since Kepler only manages the inputs and outputs of each step. Here I describe […]

test driving the Seven Bridges Genomics bioinformatics platform

I recently examined the Seven Bridges Genomics (SBG) platform, building and running a short-read alignment pipeline. Overall, I am impressed by the software. Here I describe my test of the program and then report on my investigation of how it works. Test Drive: The test pipeline I devised consisted of two steps, FastQC analysis of […]

command line Hadoop with a “live” Elastic MapReduce cluster

There are two ways to run Hadoop from the command line on an Elastic MapReduce (EMR) cluster that is active in “waiting” mode. First, the hard way: running Hadoop directly by logging into the cluster’s head node. The following commands show how you can log into the cluster’s head node and run Hadoop from the […]

listing an Amazon S3 directory’s contents in Java

After much struggle, I have figured out how to list an Amazon S3 directory’s contents in Java using the AWS SDK. Here is how to do it: First, you need to import the following libraries: Then, in your main function (or elsewhere in your code) you need: Be sure to change the “prefix” variable to […]

test driving Amazon Web Services’ Elastic MapReduce

Hadoop provides software infrastructure for running MapReduce tasks, but taking full advantage of it requires substantial setup time and access to a compute cluster. Amazon’s Elastic MapReduce (EMR) solves these problems, delivering pre-configured Hadoop virtual machines running on the cloud for only the time they are required, and billing only for the computation minutes […]