sending an attachment with Amazon SES using Java

Sometimes data scientists need to write software that sends out automated e-mails, and sometimes those e-mails must carry attachments. Here is how to do so in Java using Amazon’s Simple Email Service (SES), which is an inexpensive outbound email service built on Amazon’s cloud infrastructure. Note: Be sure to log into Amazon SES and verify […]

reporting negative results

Two of my recent posts have reported negative results, meaning that no meaningful effects were found during the investigations. Had these investigations been framed as hypothesis tests, we would have failed to reject the null hypotheses. Sounds boring. However there are good reasons to report these results. The first is that negative results still generate […]

Apache Spark and stock price causality

The Challenge I wanted to compute Granger causality (described below) for each pair of stocks listed in the New York Stock Exchange. Moreover, I wanted to analyze between one and thirty lags for each pair’s comparison. Needless to say, this requires massive computing power. I used Amazon EC2 as the computing platform, but needed a […]

data natives

We hear a lot of marketing yammer about “digital natives”, that is, folks fluent in social media and in particular marketing using social media. Writers who use this term often juxtapose such digital natives against “analog natives”, i.e., individuals who matured or were educated before online social media became such a significant part of our […]

graph database for gene annotation

Lately I’ve been experimenting with graph databases using Neo4j and the Cypher query language. To get a feel for these tools, I created the following gene annotation network. The Cypher commands I used are discussed in this post, followed by a demonstration of querying the database. Creating the Graph Database We are creating the following […]

setting up an Amazon RDS instance on a VPC private subnet

As a scientist, I tend not to think about database security much. However, security is an important concern for the database-driven web applications I write, so I decided to learn more about how to use Amazon EC2 and RDS instances securely. As part of this effort, I created a virtual private cloud (VPC) to hide my […]

test driving the Seven Bridges Genomics bioinformatics platform

I recently examined the Seven Bridges Genomics (SBG) platform, building and running a short-read alignment pipeline. Overall, I am impressed by the software. Here I describe my test of the program and then report on my investigation of how it works. Test Drive The test pipeline I devised consisted of two steps, FastQC analysis of […]

listing an Amazon S3 directory’s contents in Java

After much struggle, I have figured out how to list an Amazon S3 directory’s contents in Java using the AWS SDK. Here is how to do it: First, you need to import the following libraries: Then, in your main function (or elsewhere in your code) you need: Be sure to change the “prefix” variable to […]

chaining map operations in Hadoop

Suppose we have a list of RNA sequences (pictured below), and we want to calculate both the “GC” nucleotide content and the RNA folding energy for each sequence using Hadoop 2.2.0. Furthermore, we want to chain the two operations so that each GC content result is fed to the corresponding sequence’s dG calculation. We also […]