building a web-enabled temperature logger

Not wanting to miss out on the “Internet of Things”, I decided to learn some of its foundational technology, namely microcontroller programming. Strictly speaking, the Raspberry Pi I used in this project is a single-board computer rather than a classic microcontroller, but the idea is the same. Here I describe building a web-enabled temperature logger, complete with a web application to display its results.

The Challenge

I live in an RV with my cat. When I go to work I have to decide whether to leave the windows open or turn on the air conditioner to keep my cat cool, as there is no thermostat on my air conditioner. I usually decide based on the weather forecast, but really don’t know how hot it gets in the RV during the peak temperature of the day. I needed a data logger that reads the temperature regularly and stores it. Ideally such a data logger would report to a web application, so I can monitor the temperature from work.

The Solution

Raspberry Pi

I first bought a Raspberry Pi B+ computer and configured it to run Raspbian Linux. Then I added a USB WiFi dongle so the device can communicate with the Internet.

DS18B20 Digital Temperature Sensor

Next, I bought a DS18B20 digital temperature probe, and connected it to the Raspberry Pi according to the following schematic, which is slightly modified from that specified by [1]:

schematic

The 4.7 kOhm resistor came with the temperature sensor.

The resulting hardware looks like:

hardware_photo_with_resistor_10

Running the Data Logger and Connecting to the Web

On the Raspberry Pi I run the following Python code, which is slightly modified from that shown in [1]. My modification simply calls a URL containing the temperature reading that is processed by the web application described below. This code sends a reading to the URL every two minutes.

import os
import glob
import time
import urllib2

os.system('modprobe w1-gpio')
os.system('modprobe w1-therm')

base_dir = '/sys/bus/w1/devices/'
device_folder = glob.glob(base_dir + '28*')[0]
device_file = device_folder + '/w1_slave'

def read_temp_raw():
    f = open(device_file, 'r')
    lines = f.readlines()
    f.close()
    return lines

def read_temp():
    lines = read_temp_raw()
    while lines[0].strip()[-3:] != 'YES':
        time.sleep(0.2)
        lines = read_temp_raw()
    equals_pos = lines[1].find('t=')
    if equals_pos != -1:
        temp_string = lines[1][equals_pos + 2:]
        temp_c = float(temp_string) / 1000.
        return temp_c

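while True:
    try:
        temp_c = read_temp()
        # read_temp() returns None when the sensor output has no 't=' field
        if temp_c is not None:
            urllib2.urlopen('http://my.url.com/logtemp.php?temp=' + str(temp_c)).close()
        time.sleep(2. * 60.)
    except Exception:
        # sensor or network hiccup: wait briefly and retry (Ctrl-C still stops the script)
        time.sleep(5.)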
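To make the parsing step in read_temp() easier to follow, here is a small sketch that runs the same checks on a sample of the two-line text the w1-therm driver writes to w1_slave. The byte values in the sample and the names SAMPLE_W1_SLAVE, parse_w1_slave, and c_to_f are made up for this illustration; it is not part of the logger itself.

```python
# Sample text in the format the w1-therm driver writes to w1_slave.
# The byte values are illustrative, not a reading from my sensor.
SAMPLE_W1_SLAVE = (
    "72 01 4b 46 7f ff 0e 10 57 : crc=57 YES\n"
    "72 01 4b 46 7f ff 0e 10 57 t=23125\n"
)


def parse_w1_slave(text):
    """Return the temperature in Celsius, or None if the reading is unusable."""
    lines = text.splitlines()
    # The first line ends in YES only when the CRC check passed.
    if len(lines) < 2 or not lines[0].strip().endswith("YES"):
        return None
    # The second line carries thousandths of a degree Celsius after 't='.
    pos = lines[1].find("t=")
    if pos == -1:
        return None
    return int(lines[1][pos + 2:]) / 1000.0


def c_to_f(temp_c):
    """Convert Celsius to Fahrenheit (same formula the viewer page uses)."""
    return temp_c * 1.8 + 32.0


if __name__ == "__main__":
    temp_c = parse_w1_slave(SAMPLE_W1_SLAVE)
    print(temp_c)
    print(c_to_f(temp_c))
```

Returning None for a failed CRC or a missing 't=' field lets the caller decide whether to retry or skip the reading.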

Web Application for Storing Temperature Readings

A PHP program receives the temperature reading sent by the Python script as a GET argument. It then places the temperature value with a time stamp into a MySQL database. I chose PHP for this task because my web hosting company makes PHP deployment much easier than Django or JSP deployment:

<html>
 <head>
  <title>Log Temperature</title>
 </head>
 <body>
 <?php 

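    // Validate the reading so arbitrary GET input cannot be injected into the SQL below
    if (!isset($_GET['temp']) || !is_numeric($_GET['temp'])) {
        die('Error: temp must be numeric');
    }
    $temp = floatval($_GET['temp']);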

    $con = mysqli_connect("host", "user", "password", "database");

    // Check connection
    if (mysqli_connect_errno()) {
        // Stop here: the INSERT below cannot work without a connection
        die("Failed to connect to MySQL: " . mysqli_connect_error());
    }

    $sql = "INSERT INTO temperature_log VALUES (now(), $temp)";

    if (!mysqli_query($con,$sql)) {
        die('Error: ' . mysqli_error($con));
    }
    echo "1 record added";

    mysqli_close($con);
?>
 </body>
</html>
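For readers who would rather see the storage step outside PHP, here is a rough Python sketch of the same idea: validate the incoming value, then insert it with a timestamp using a parameterized query. It swaps in an in-memory SQLite database for MySQL so that it runs on its own, and the function name log_temperature is hypothetical. The column names time and temperature are the ones the viewer page reads.

```python
import sqlite3


def log_temperature(con, raw_temp):
    """Validate a raw GET-style value and store it with a timestamp."""
    temp = float(raw_temp)  # raises ValueError for non-numeric input
    # The ? placeholder keeps the value out of the SQL text itself.
    # datetime('now') is SQLite's counterpart to MySQL's now().
    con.execute(
        "INSERT INTO temperature_log VALUES (datetime('now'), ?)", (temp,)
    )
    con.commit()


if __name__ == "__main__":
    # In-memory SQLite stands in for the MySQL database (demo assumption).
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE temperature_log (time TEXT, temperature REAL)")
    log_temperature(con, "23.125")
    print(con.execute("SELECT * FROM temperature_log").fetchall())
```

Binding the value as a parameter, rather than pasting it into the SQL string, is the main point of the sketch.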

Web Application for Displaying Temperature Readings

The web application for viewing the temperature readings displays a run chart and a log. The run chart is implemented in JavaScript with the jqPlot library. The application queries the MySQL database for the last 24 hours’ readings. Again, I used PHP just because it is easy to deploy on my web hosting platform.

temperature_logger_screenshot

The code for this application is:

<html>
 <head>
<link rel="stylesheet" type="text/css" href="viewtemp.css">

<script language="javascript" type="text/javascript" src="jqplot/jquery.min.js"></script>
<script language="javascript" type="text/javascript" src="jqplot/jquery.jqplot.min.js"></script>
<script language="javascript" type="text/javascript" src="jqplot/plugins/jqplot.canvasTextRenderer.min.js"></script>
<script language="javascript" type="text/javascript" src="jqplot/plugins/jqplot.canvasAxisTickRenderer.min.js"></script>
<link rel="stylesheet" type="text/css" href="jqplot/jquery.jqplot.css" />

  <title>View Temperature Log</title>
 </head>
 <body>

<center><h1>Trailer Temperature</h1></center>

<div id="chart"></div>
<br><br>

 <?php 

    $con = mysqli_connect("host", "user", "password", "database");

    // Check connection
    if (mysqli_connect_errno()) {
        // Stop here: the query below cannot work without a connection
        die("Failed to connect to MySQL: " . mysqli_connect_error());
    }

    $sql = "select * from temperature_log tl where tl.time >= DATE_SUB(NOW(), INTERVAL 1 DAY) order by tl.time desc";

    $result = mysqli_query($con, $sql);

?>
<table id='time_temp_table'><thead><tr><th>Time</th><th>Temperature (C)</th><th>Temperature (F)</th></tr></thead><tbody>
<?php

    $temp_array = array();
    $time_array = array();
    while($row = mysqli_fetch_array($result)) {
        $temp_f = round($row['temperature'] * 1.8 + 32.0, 2);
        echo "<tr><td id='time_entry'>" . $row['time'] . "</td><td id='temp_entry'>" . $row['temperature'] . "</td><td>" . $temp_f . "</td></tr>";
        array_push($temp_array, $temp_f);
        array_push($time_array, $row['time']);
    }

    $temp_array_reverse = array_reverse($temp_array);
    $time_array_reverse = array_reverse($time_array);
?>
    </tbody></table>
<?php
    mysqli_close($con);
?>

<script type="text/javascript">
<?php
$js_array = json_encode($temp_array_reverse);
echo "var tempArrayAsString = " . $js_array . ";\n";
$js_array = json_encode($time_array_reverse);
echo "var timeArrayAsString = " . $js_array . ";\n";
?>

$(document).ready(function() {

	// Parse a MySQL 'YYYY-MM-DD HH:MM:SS' timestamp into a Date (local time)
	function parseTimestamp(s) {
		var d = s.split(' ')[0].split('-');
		var t = s.split(' ')[1].split(':');
		// radix 10 keeps values such as "08" from being read as octal in older browsers
		return new Date(parseInt(d[0], 10), parseInt(d[1], 10) - 1, parseInt(d[2], 10),
			parseInt(t[0], 10), parseInt(t[1], 10), parseInt(t[2], 10));
	}

	var tempArray = [];
	$.each(tempArrayAsString, function(index, value) {
		tempArray.push(parseFloat(value));
	});

	// Build [minutes since first reading, temperature] pairs for the chart
	var data = [];
	data.push([0, tempArray[0]]);
	var timeDiffList = [0];
	for (var i=1; i<timeArrayAsString.length; i++) {
		var dt0 = parseTimestamp(timeArrayAsString[0]);
		var dti = parseTimestamp(timeArrayAsString[i]);

		var timeDiff = (dti - dt0) / (1000. * 60.);
		timeDiffList.push(timeDiff);

		data.push([timeDiff, tempArray[i]]);
	}

	var ticksToUse = [];
	var position_dict = {};
	for (i=0; i<timeDiffList.length; i++) {
		var a = Math.round(timeDiffList[i] / 5) * 5;
		if (a % 120 == 0) {
	 		var label = timeArrayAsString[i];
			 
			if (!position_dict.hasOwnProperty(a)) {
			    ticksToUse.push([timeDiffList[i], label]);
			    position_dict[a] = true;
			}
		}
	}
	label = timeArrayAsString[i-1];
	ticksToUse.push([timeDiffList[i-1], label]);


	$.jqplot('chart',  [data], 
	{
		series: [{showMarker: false, lineWidth: 2}],
		axesDefaults : {
			tickRenderer: $.jqplot.CanvasAxisTickRenderer ,
			tickOptions: {
				angle: -80
			}
		},
		axes: {
			xaxis: {
			label: 'Time (Minutes)',
			ticks: ticksToUse,
			},
			yaxis: {
			label: 'Temperature (Fahrenheit)',
			tickOptions: { angle: 0 }
			},
		},
	});

});
</script>
 </body>
</html>
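The chart logic in the script above is easier to check in isolation, so here is a rough Python sketch of its two steps: converting timestamps to minutes since the first reading, and picking roughly one labeled tick every two hours. The function names and sample timestamps are invented for this illustration. Like the page's script, the sketch always appends the final reading, so the last tick can appear twice.

```python
import math
from datetime import datetime


def minutes_since_first(timestamps):
    """Convert 'YYYY-MM-DD HH:MM:SS' strings to minutes after the first one."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in timestamps]
    return [(t - times[0]).total_seconds() / 60.0 for t in times]


def pick_ticks(timestamps, minutes, spacing=120):
    """Keep the first reading near each multiple of `spacing` minutes, plus the last."""
    ticks, seen = [], set()
    for stamp, m in zip(timestamps, minutes):
        # Snap to the nearest 5 minutes, rounding halves up like Math.round.
        rounded = int(math.floor(m / 5 + 0.5)) * 5
        if rounded % spacing == 0 and rounded not in seen:
            ticks.append((m, stamp))
            seen.add(rounded)
    if timestamps:
        ticks.append((minutes[-1], timestamps[-1]))
    return ticks


if __name__ == "__main__":
    # Made-up timestamps in the format MySQL returns.
    stamps = ["2015-06-01 12:00:00", "2015-06-01 12:02:00", "2015-06-01 14:00:30"]
    mins = minutes_since_first(stamps)
    print(mins)
    print(pick_ticks(stamps, mins))
```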

The CSS for the application is:

#time_temp_table {
    border-collapse: collapse;
    background-color: lightblue;
}

#time_temp_table th {
    border: 1px solid black;
    text-align: center;
    padding: 2px 15px 2px 15px
}

#time_temp_table td {
    border: 1px solid black;
    text-align: center;
    padding: 2px 15px 2px 15px
}

Future Plans

For the web application that displays the recorded temperatures, it would be nice to add a box plot to summarize the results.

Ultimately, I’d like to connect this hardware to my air conditioner so that it will automatically turn on when a set temperature point is reached. I’ll need some high-amp relays for this.

References

1. https://learn.adafruit.com/adafruits-raspberry-pi-lesson-11-ds18b20-temperature-sensing

Related Post

engineer moves into an RV

Posted in engineering, science

rapidly extracting a subsequence from chromosome sequence data in Java

The Challenge

We have a text file containing the nucleotides of a chromosome, say human chromosome 11, and need to be able to quickly extract a subsequence from the chromosome text given a nucleotide position and number of subsequent nucleotides to include. The problem is that chromosome files are huge, e.g. 135 megabytes for chromosome 11, so we don’t want to use typical string processing tools. An example of the text we are extracting nucleotides from is:

chromosome_snapshot

Note that this file contains no line breaks or header, so that the byte position of a nucleotide corresponds to its position in the genome.

The Solution

Our solution is to use random file access to jump to the desired start nucleotide position in the file and read forward from that position for the required number of bases. Java code that implements this strategy is:

import java.io.*;

public class ReadNucleotidePositions {

    public static void main(String[] args) {

	// user settings
	Integer start = 68081400 - 1;   // base zero
	Integer numberOfCharactersToRead = 200;
	String chromosomeFile = "chromosome_11.txt";

	String output = "";

	// try-with-resources closes the file even if an exception is thrown
	try (RandomAccessFile raf = new RandomAccessFile(chromosomeFile, "r")) {

	    // seek the start position
	    raf.seek(start);

	    // repeat read operation for the number of times specified
	    for (int i=0; i<numberOfCharactersToRead; i++) {

		// this has to be type "int", not type "Integer" for the cast to work
		int someCharInteger = raf.read();

		// read() returns -1 at end of file; stop rather than append a bogus character
		if (someCharInteger == -1) {
		    break;
		}

		// cast as character and append to output string
		output += (char) someCharInteger;
	    }
	}
	catch (IOException ex) {
	    ex.printStackTrace();
	}

	System.out.println();
	System.out.println(output);
	System.out.println();
    }
}

Here we start at position 68081400 and subtract one to convert the one-based genome coordinate to a zero-based file offset. We then read the nucleotide at this position and the 199 nucleotides that follow it on chromosome 11.

We open a “RandomAccessFile” and use the “seek” method to move the file pointer to the position we intend to start reading from. Then we loop for the number of nucleotides we are reading, using the “read” method at each iteration to extract the character at that position in the chromosome text. The value returned from the “read” method is of type “int”, which we must cast to type “char” before adding it to our String object containing the extracted sequence.

run_results

Finally, we check the extracted sequence against a reference (in this case the UCSC Genome Browser) to ensure our sequence extraction is accurate:

UCSC_genome_browser
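For comparison, the same seek-and-read strategy can be sketched in Python. This is only an illustration of the technique, not a port of the program above: the function name read_subsequence is hypothetical, and the demo writes a tiny made-up sequence to a temporary file instead of using the real chromosome 11 file.

```python
import os
import tempfile


def read_subsequence(path, start_1_based, length):
    """Return up to `length` bases beginning at a 1-based nucleotide position."""
    with open(path, "rb") as handle:
        handle.seek(start_1_based - 1)  # convert to a 0-based byte offset
        return handle.read(length).decode("ascii")


if __name__ == "__main__":
    # A tiny stand-in for the chromosome file: one byte per base, no line breaks.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"ACGTACGTTTGA")
    try:
        print(read_subsequence(tmp.name, 5, 4))  # bases 5 through 8
    finally:
        os.remove(tmp.name)
```

Reading the whole span with one read call avoids the per-character loop, and a read that runs past the end of the file simply returns fewer bases.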

Posted in bioinformatics, engineering, science

invasive species spreads to low Earth orbit

After colonizing six continents and setting up scientific outposts on the seventh, Earth’s major invasive species sent some of its members into orbit. Not long after starting to use its opposable thumbs to cultivate grain, the species built rockets capable of landing individuals on Earth’s moon and delivering its members regularly to a series of low orbit scientific test labs.

The species now has its sights on Mars, which it has not visited yet in person but has landed several probes on. These probes are collecting data necessary to facilitate colonization, such as looking for evidence of readily available water. The invasive species also is looking at the asteroid belt as a source of minerals needed for further expansion.

Known for long term thinking, the invasive species has set up laboratories on Earth to develop advanced propulsion technologies to make space travel faster, thereby facilitating a full-blown invasion of the solar system. Pressure is building for such an outward migration from the planet as the invasive species’ population reaches seven billion, a number challenging the planet’s food, water, and energy resources.

Posted in humor, science

using bug tracking software to keep track of life’s tasks

I’ve tried a few mobile task-tracking apps for my smartphone, but have found none as useful for keeping track of life’s responsibilities as MantisBT, a web-based bug tracking program. MantisBT is used by software engineers to log reported bugs, assign them to staff for correction, and record progress toward bug resolution.

mantis_logo

To get started, I first deployed MantisBT (a PHP application) on the web so I can access it from anywhere with an internet connection. For the underlying database, I selected MySQL.

I then logged in and created five “projects”: “Badass Data Science” to log tasks related to this blog, “Bills” to track when payments are due, “School” to track school related deadlines, “Repairs” to log repair tasks for my RV or truck, and “Miscellaneous” to track tasks that don’t fit cleanly into the other four topics. Since it is easy to add projects, I am not limited to only five categories.

all_projects

When a new task comes my way, I select the relevant project from the drop-down menu and then click on the “Report Issue” link. The following screenshot shows adding a task (building a GitHub page for the pyDome geodesic dome designer) in the “Badass Data Science” project:

creating_a_new_issue

We can then click “View Issues” to see all the tasks related to the project:

BDS_all_issues

At a glance we can see indicators of task priority (column “P”) and task severity. The rows are color-coded by status with a color key on the very bottom row. For example, five of the tasks shown in the image above have status “assigned”.

When a task is worked on or completed, one can select the task from the “View Issues” page and change its status and/or add notes.

editing_an_issue

The status-change workflow provides an additional opportunity to add notes, for example information related to resolving a task:

resolving_an_issue

We can also filter issues by status, for example hiding all resolved issues:

hide_resolved

Posted in data science, engineering

frugal anarchy

Of all the systems that we seek freedom within and from, none pervades our lives as much as the “econosphere” we inhabit. By “econosphere”, I mean the global network of economic activity whose nodes are individuals and whose edges are trade relationships between individuals.

Even if we had no government, we’d still likely be trading goods and services. Therefore the econosphere may be more significant than the existence of government to those seeking freedom. Anarchists traditionally focus on the elimination of government as the means of increasing freedom. However, I propose that limited reliance on the econosphere is a more comprehensive goal for anarchistic thinking.

There are two paths to individual economic freedom in a free-market economy. The first is to be wealthy enough to afford whatever transactions one wants to make whenever one wants to make them. This is unfortunately out of reach for most people. The second path is voluntary frugality: limiting the transactions one makes to well-thought-out targets, such that the utility and satisfaction of purchases are maximized and very few dollars are spent on things outside those targets.

This strategy of voluntary frugality limits individual reliance on the econosphere by limiting the amount of money that an individual needs to acquire and spend, thereby enhancing their freedom to choose their path in life. I cannot think of a more practical expression of anarchism within the “real world” that we inhabit today.

Related Post

engineer moves into an RV

Posted in Uncategorized

100th post to badass data science

This marks the 100th post to badass data science. I’ve written about everything from Lady Gaga to computational fluid dynamics, usually with a science or data related spin.

I thought I’d look at my posts analytically rather than simply reminisce. First, here is a tag cloud for the first 99 posts:

tag_cloud_CROPPED

From this tag cloud, I can see that either Python or R is used in many posts, and that most posts cover statistical and data science topics. Engineering is also a frequent tag.

I then produced a graph view using Networkx, where the nodes are tags and the edges are formed by tags that occur in the same post. Displaying this graph as VRML:

3D_distant_view

It is a little hard to see, so here is a closer view:

3D_close_view

In this image one can see that “statistics” is a primary hub. In rotated views of the graph (not shown), Python, R, and data science show similar prominence.

Finally, I computed the frequency of the top occurring tags:

tag_frequencies

From this I see that I wrote about Python more than R, which surprised me; I expected an even split. However, this matches the fact that I favor Python over R whenever possible, because Python is a full-featured programming language capable of easy string parsing and web deployment. It also looks like engineering posts and science posts are about equal in number, which accurately reflects my technical interactions with the world.
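The tag analysis described above boils down to two counts: how often each tag appears, and how often each pair of tags shares a post. The sketch below shows both using only the standard library. The posts and tags are invented for the example, and it is not the attached process_tags.py. Each counted pair could be handed to Networkx as a weighted edge via add_edge.

```python
from collections import Counter
from itertools import combinations

# Hypothetical posts and their tags, invented for this example.
posts = [
    {"python", "statistics", "data science"},
    {"r", "statistics"},
    {"python", "engineering"},
]

# How often each tag appears across posts.
tag_counts = Counter(tag for tags in posts for tag in tags)

# One undirected edge per pair of tags sharing a post; the count is the edge weight.
# Sorting each tag set gives every pair a single canonical order.
edge_counts = Counter(
    pair for tags in posts for pair in combinations(sorted(tags), 2)
)

if __name__ == "__main__":
    print(sorted(tag_counts.items()))
    print(sorted(edge_counts.items()))
```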

The Future

I do not think my posts are “badass” enough for the blog’s title, so I’ll try to up the ante. Maybe I need something involving sharks and tornadoes. The post on claiming squatters’ rights is the closest I’ve come to my goal of “badass data science”.

I actually want to branch out and write more detailed analyses of less technical things, such as policy. Or take highly technical topics, such as synthetic biology, and write about them in a layperson’s voice.

Most importantly, I’ll just keep writing. I’m bound to hit on something good.

Code

Code used to create the VRML shown above is attached.

process_tags.py

Posted in Uncategorized

engineer moves into an RV

I recently moved into a travel trailer to lessen the southern California cost of living (and because I like the idea of portable structures as an answer to housing scarcity). This living arrangement sparks my engineering creativity, which is the motivation for this post. Here I discuss RV living from a mechanical and software engineer’s viewpoint.

The following sections explore renting computing power since I no longer can carry a large desktop, modeling the trailer’s computational fluid dynamics (CFD) and mechanical dynamics, setting up an online registry of good boondocking sites, examining the 12 VDC system and figuring out how to add solar and wind power generation to enable off-grid living, improving the insulation and adding a thermostat to the air conditioner, and learning how to get on without my engineering textbooks (which do not fit in the RV).

Renting Computing Power

I no longer have room for a large desktop computer; instead I’m working with a laptop. This is a severe reduction in computing power, so whenever I’m working with a high computation, memory, or disk load I’ll rent computing power from Amazon’s Web Services. Amazon EC2 instances are inexpensive and easy to set up so this strategy is very practical.

Computational Fluid Dynamics

My truck manual limits the amount of surface area perpendicular to the direction of travel that can be towed, presumably due to drag forces. So I tried to use computational fluid dynamics (CFD) to find out if making the front of a trailer more aerodynamic buys one more surface area. I was unsuccessful at this experiment (other than generating cool pictures, below) since I couldn’t figure out how to get the OpenFOAM CFD package to work with compressible flow. First I ran incompressible CFD on a block-shaped trailer model:

v_block_incompressible_CROPPED

Then I ran incompressible CFD on a trailer model where the front is arced:

v_arc_incompresible_CROPPED

The pressure at the front appears to be slightly less for the arced model, which says that it is more aerodynamic. However, because this modeled flow is incompressible, I could not really say much about whether towing performance would be improved by using a trailer with a curved shape at the front. Nonetheless, I bought a trailer with an angled front assuming it will reduce towing drag.

Online Boondocking Site Registry

I won’t be boondocking (parking somewhere for the night without utility hookup) anytime soon since I have a day job that requires 40 hours/week presence. I’m staying in an RV park where there are showers and a laundry facility. However, I’ve been thinking of the needs of boondockers, and plan to create an online registry of good boondocking sites, if one does not exist.

The web application for such a registry will have fields for site location and ratings that users can fill in. I’ll have it generate maps if possible. For the infrastructure of the web application, I envision writing it in Python using Django, with MySQL as the supporting database. However, it may be better to use a NoSQL database like a document store—I’ll have to investigate this more thoroughly. For hosting the application, I’ll likely use an Amazon EC2 instance.

Part of my motivation for creating such a site is that I can see myself boondocking in the future, if I find myself between jobs.

UPDATE: After writing this I found three such boondocking registry sites online. Their addresses are http://www.boondocking.org/, https://www.boondockerswelcome.com/, and http://freecampsites.net/.

Vehicle Mechanical Dynamics

While towing the trailer, I noticed that whenever a semi-truck passed me, the front of my truck was pulled toward the passing semi, and I had to turn the steering wheel to keep my truck straight. This prompted me to explore the dynamics of towing:

model

In this very simple model, a wind gust from the semi-truck is modeled as force F_air, which rotates the trailer clockwise (about the center of mass, which is between the wheels). The force at the hitch pin is also modeled such that it rotates the trailer clockwise when it is positive. For this “back of the envelope” calculation, I ignored the forces caused by the tires meeting the pavement to simplify things, assuming that they slip when lateral forces are applied.

On the tow vehicle, I again ignored the forces at the tires and joined the vehicle to the trailer with the hitch pin forces. I then constructed the above equations of motion and solved for the angular velocity of the trailer. This shows that when the trailer is forced to rotate clockwise by a wind gust from a passing semi, my truck is rotated counter-clockwise toward the passing semi, as observed.

12 VDC

The RV’s lights and fan run on 12 Volts DC, so they can run off the grid, but there are no 12 VDC receptacles inside the unit for running arbitrary 12 Volt appliances. However, I have 12 VDC tools that I would like to be able to run off the grid. I can get around this by removing the fuse box cover and attaching power clips to the DC input terminals for the converter:

DC_power_attachment_25

The problem with this arrangement is that the negative terminal is dangerously close to a positive one, so that if for example my cat bumps a clip, it might cause a short circuit and potentially a fire. I could insulate the terminal with electrical tape but that is too much of a “hack” for my taste. If I start boondocking regularly I’ll wire in real 12 VDC receptacles, and be sure to add fuses to them to prevent high current loads.

I added a second marine battery when I purchased the RV, so the total amp-hour capacity is double the original amount (the batteries are identical, both new, and wired in parallel). A power converter below the fuse box converts grid 120 VAC to 12 VDC to keep the batteries charged. I check the water level in the batteries weekly to make sure the water has not boiled away during charging.
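As a quick worked example of why a second battery in parallel doubles capacity without changing voltage, here is a small sketch. The 100 amp-hour rating, the 5 amp load, and the rule of thumb of drawing down only half the capacity are assumptions for illustration, not measurements of my batteries.

```python
def parallel_bank(voltage, amp_hours_each, count):
    """Identical batteries in parallel: voltage stays the same, capacity adds."""
    return voltage, amp_hours_each * count


def runtime_hours(bank_amp_hours, load_amps, usable_fraction=0.5):
    """Rough runtime, assuming only part of the capacity should be drawn down."""
    return bank_amp_hours * usable_fraction / load_amps


if __name__ == "__main__":
    # Assumed ratings: two 12 V, 100 Ah batteries feeding a 5 A load.
    volts, amp_hours = parallel_bank(12.0, 100.0, 2)
    print(volts, amp_hours)
    print(runtime_hours(amp_hours, 5.0))
```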

Solar Power

My RV does not have solar panels installed, which I’ll add if I start boondocking regularly. I currently have a small six Watt panel (pictured below) which works as little more than a trickle charger. Since I have no charge controller I have to manually check the batteries while using it, and have to disconnect it at night to prevent battery discharge through it (unless I add a diode to the circuit, which has the downside of lowering the current supplied to the batteries during the day). This six Watt panel certainly does not produce enough charge during the day to recover a night’s worth of DC power use.

solar_panel_25
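To put rough numbers on why a six Watt panel is little more than a trickle charger, here is a back-of-the-envelope sketch. The five hours of full sun per day is an assumption, and the calculation is idealized: it ignores wiring and controller losses and treats the system as a flat 12 Volts.

```python
def daily_amp_hours(panel_watts, sun_hours, system_volts=12.0):
    """Idealized charge delivered per day, ignoring wiring and controller losses."""
    return panel_watts / system_volts * sun_hours


if __name__ == "__main__":
    SUN_HOURS = 5.0  # assumed hours of full sun per day
    print(daily_amp_hours(6.0, SUN_HOURS))        # the small panel
    print(daily_amp_hours(2 * 130.0, SUN_HOURS))  # two 130 W panels
```

Under these assumptions the small panel delivers only a few amp-hours per day, while a pair of 130 Watt panels delivers on the order of a hundred.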

For true off-the-grid use I’ll buy two 130 Watt panels along with a charge controller. If I can find a charge controller that feeds power back to the grid when I’m plugged in—even better. (It will have to have an inverter built in to do this). I expect the whole package to cost about 1200 USD overall. I’m not ready to commit those funds right now while I’m living in an RV park and have easy grid access, but plan to research the equipment needed now so that I’m ready to buy in case my living arrangements change.

It may be best to design and build a charge controller from scratch, given that I may want to mix solar, grid, and wind power (below) in one system, and I’m not sure if existing controllers can accommodate all three very well. Designing such a controller would be a fun project, but would take a substantial amount of time. If I do this I’ll use an Arduino or Raspberry Pi platform for computing, and use optical relays to separate the computing circuits from the power delivery circuits.

Wind Power

I have never seen an RV with a wind generator installed, but see no reason why a small turbine such as the Primus Air 30 (pictured) cannot be used.

primus-windpower-air-30-turbine-1-ar30-10-48-v

Depending on where I’m staying, I’ll perhaps want a marine-grade turbine. However, my RV is made of aluminum, which is not marine grade, so staying near the coast may be a bad idea no matter what turbine I buy.

I envision mounting it on the bumper and the upper rear of the trailer, as shown in the cheesy image below, with attachments that allow easy removal for when the RV is in motion.

wind_power_image

Insulation

Living in San Diego County requires little use of air conditioning or heating, but while traveling from Texas to San Diego things got hot in the trailer. The “R value” for the walls, floor, and ceiling is R-7, which leaves much to be desired. The next time I take a trip across the desert I’ll cover the trailer’s windows with reflective bubble insulation while driving. My thinking is that these window covers can be held on with Velcro for easy removal.

Air Conditioner Thermostat

My trailer’s air conditioner has no thermostat to stop it from running once a desired temperature is reached. This causes problems in the middle of the night when things get cold but I don’t want to get up to turn the machine off. Furthermore, continuing to blow past the point of comfort wastes electricity. To deal with this matter I plan to splice in a household thermostat as soon as I measure the voltage at the AC’s on/off switch, assuming I can make the voltages match. If a household thermostat proves unsuitable, I’ll design a controller from scratch, again using an Arduino or Raspberry Pi system.
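If I do end up building the controller myself, the core decision logic would probably look something like the sketch below. It is a plain on/off thermostat with a dead band (hysteresis) so the air conditioner is not switched rapidly around a single threshold. The set point, band width, and function name are assumptions, and no real relay or sensor code is shown.

```python
def thermostat_step(temp_f, cooling_on, set_point_f=78.0, band_f=2.0):
    """Return whether the air conditioner should run after this reading.

    Cooling starts at set_point + band and stops at set_point - band.
    Between the two, the current state is kept.
    """
    if temp_f >= set_point_f + band_f:
        return True
    if temp_f <= set_point_f - band_f:
        return False
    return cooling_on


if __name__ == "__main__":
    cooling = False
    # Made-up sequence of readings in Fahrenheit.
    for reading in [75, 79, 81, 79, 77, 75]:
        cooling = thermostat_step(reading, cooling)
        print(reading, cooling)
```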

Relying on Wikipedia and the Web for Engineering Knowledge

To fit into the RV, I had to get rid of all my engineering and science textbooks. Now I’m relying on the internet whenever I need to look something up.

Related Posts

selecting travel trailers by regression

designing a battery array to power a CPAP machine

Posted in engineering

graph database for gene annotation

Lately I’ve been experimenting with graph databases using Neo4j and the Cypher query language. To get a feel for these tools, I created the following gene annotation network. The Cypher commands I used are discussed in this post, followed by a demonstration of querying the database.

Creating the Graph Database

We are creating the following graph:

gd08

Here genes 5663 (PSEN1) and 675 (BRCA2) connect with the species Homo sapiens and also connect to the gene alias FAD. (Both genes have the same gene alias). RefSeq “NM” transcripts connect to their respective genes.

In Cypher, we first create two nodes representing genes PSEN1 and BRCA2, along with a node representing the human species:

CREATE (gid5663:Gene {symbol: "PSEN1", id: 5663, full_name: "presenilin 1"})
CREATE (gid675:Gene {symbol: "BRCA2", id: 675, full_name: "breast cancer 2, early onset"})
CREATE (human:Species {taxonomy_id: 9606, id: "Homo Sapiens"})

The Neo4j console replies that three nodes were created.

gd01

We then link each of the gene nodes we created to the species node. These steps require selecting the nodes to connect with a MATCH query and then feeding the selection results into a CREATE clause.

MATCH (a:Gene), (b:Species)
WHERE a.id = 5663 AND b.taxonomy_id = 9606
CREATE (a)-[r:SPECIES]->(b)
RETURN r

MATCH (a:Gene), (b:Species)
WHERE a.id = 675 AND b.taxonomy_id = 9606
CREATE (a)-[r:SPECIES]->(b)
RETURN r

The console shows each MATCH/CREATE query result:

gd02

We then create a gene alias node, which the gene nodes will connect to:

CREATE (FAD:GeneSymbolAlias {id: "FAD"})

gd03

As with the species node, we now connect the genes to the gene alias node:

MATCH (a:Gene), (b:GeneSymbolAlias)
WHERE a.id = 5663 AND b.id="FAD"
CREATE (a)-[r:ALIAS]->(b)
RETURN r

MATCH (a:Gene), (b:GeneSymbolAlias)
WHERE a.id = 675 AND b.id="FAD"
CREATE (a)-[r:ALIAS]->(b)
RETURN r

The console again shows each MATCH/CREATE query result:

gd04

We next create three nodes to represent the three RefSeq transcripts associated with our two genes:

CREATE (NM_000059:RefSeqTranscript {id: "NM_000059", version: 3})
CREATE (NM_000021:RefSeqTranscript {id: "NM_000021", version: 3})
CREATE (NM_007318:RefSeqTranscript {id: "NM_007318", version: 2})

gd05

Finally, we connect these transcripts to their respective genes:

MATCH (a:Gene), (b:RefSeqTranscript)
WHERE a.id = 675 AND b.id="NM_000059"
CREATE (a)-[r:TRANSCRIPT]->(b)
RETURN r

MATCH (a:Gene), (b:RefSeqTranscript)
WHERE a.id = 5663 AND b.id="NM_000021"
CREATE (a)-[r:TRANSCRIPT]->(b)
RETURN r

MATCH (a:Gene), (b:RefSeqTranscript)
WHERE a.id = 5663 AND b.id="NM_007318"
CREATE (a)-[r:TRANSCRIPT]->(b)
RETURN r

gd06

We can view the whole network with:

MATCH (n)-[r]-()
RETURN n, r

gd07

Queries

We can start with a gene and trace its transcripts, returning the transcript nodes:

MATCH (:Gene {id: 5663})-[:TRANSCRIPT]->(t) return t

09

Similarly, we can start with a transcript and trace through its gene to the transcript’s species:

MATCH (:RefSeqTranscript {id: "NM_000021"})-[:TRANSCRIPT]-(gene)-[:SPECIES]-(species) return species

10
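To make clear what the two traversal queries do, here is a rough plain-Python sketch that stores the same relationships as a list of triples and follows them by hand. It only illustrates the traversal logic and does not use Neo4j or any driver. Gene symbols stand in for the nodes, and the helper names targets and sources are made up.

```python
# (source node, relationship type, target node) triples mirroring the CREATE statements.
edges = [
    ("PSEN1", "SPECIES", "Homo sapiens"),
    ("BRCA2", "SPECIES", "Homo sapiens"),
    ("PSEN1", "ALIAS", "FAD"),
    ("BRCA2", "ALIAS", "FAD"),
    ("BRCA2", "TRANSCRIPT", "NM_000059"),
    ("PSEN1", "TRANSCRIPT", "NM_000021"),
    ("PSEN1", "TRANSCRIPT", "NM_007318"),
]


def targets(source, rel):
    """Follow `rel` edges forward from `source`."""
    return [t for s, r, t in edges if s == source and r == rel]


def sources(target, rel):
    """Follow `rel` edges backward into `target`."""
    return [s for s, r, t in edges if t == target and r == rel]


if __name__ == "__main__":
    # Gene -> transcripts (the first query).
    print(targets("PSEN1", "TRANSCRIPT"))
    # Transcript -> gene -> species (the second query).
    for gene in sources("NM_000021", "TRANSCRIPT"):
        print(gene, targets(gene, "SPECIES"))
```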

Posted in big data, bioinformatics, data science, science

setting up an Amazon RDS instance on a VPC private subnet

As a scientist, I tend not to think about database security much. However, security is an important concern for the database-driven web applications I write, so I decided to learn more about how to use Amazon EC2 and RDS instances securely. As part of this effort, I created a virtual private cloud (VPC) to hide my Amazon RDS database from the public internet. To make this easier for others who find themselves in the same position, I’m posting instructions here on how to set up an RDS instance in a VPC using the Amazon AWS console.

Architecture

schematic

We are creating a VPC that is divided into two subnets, one public and one private. EC2 instances on the public subnet can access the internet (and serve web applications), while EC2 and RDS instances on the private subnet can only communicate with instances inside the VPC. We will store our database on the private subnet and access it through an EC2 instance on the public subnet.

Creating the VPC

Start with the AWS Console and select “VPC”:

VPC_and_database_001

Press the “Start VPC Wizard” button:

VPC_and_database_002

We want a VPC with both public and private subnets. Our database will reside on the private subnet while our web application that communicates with the database will go on the public subnet:

VPC_and_database_003

Enter a name for the VPC, as well as names for the public and private subnets. I recommend numbering your private subnet as shown in the image below, since we will need to create another private subnet later:

VPC_and_database_004

Click the “Create VPC” button, and you should receive a page saying the VPC was successfully created.

VPC_and_database_005

Click on “Your VPCs” on the left and find your new VPC. Record the VPC ID number.

Adding a Subnet

To run an RDS instance in the VPC, Amazon requires at least two subnets, each in a different availability zone. Additionally, we want both subnets to be private, so we have to create another subnet to meet the requirements. To do this we first click on the “Subnets” link on the left:

VPC_and_database_006__START_SUBNET

We see the public and private subnets we created with the VPC. Record the VPC numbers shown in the fourth column of each of these two subnets; we’ll need them later. Also record the Subnet IDs for these two subnets which are shown in the second column. Then click on the private subnet row and record the availability zone. Click on “Create Subnet” and add an additional subnet with the dialog box that appears:

VPC_and_database_007__write_down_VPC_number

This step requires specifying a different availability zone than that of the first private subnet, an issue that confused me at first. When you are finished creating the subnet, record the subnet IDs of the two private subnets.
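The requirements in this section can be checked before touching the console. The following is a minimal sketch using Python's standard ipaddress module; the function name, the CIDR blocks, and the availability zone names are all assumptions chosen for illustration, so substitute the values you recorded:

```python
import ipaddress


def check_rds_subnets(vpc_cidr, private_subnets):
    """Return a list of problems; an empty list means the layout looks usable.

    private_subnets is a list of (cidr, availability_zone) tuples.
    """
    problems = []
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = [ipaddress.ip_network(cidr) for cidr, _ in private_subnets]

    # Every subnet must fall inside the VPC address range.
    for net in nets:
        if not net.subnet_of(vpc):
            problems.append(f"{net} is outside the VPC range {vpc}")

    # Subnets in the same VPC must not overlap each other.
    for i, first in enumerate(nets):
        for second in nets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")

    # RDS needs subnets in at least two different availability zones.
    zones = {zone for _, zone in private_subnets}
    if len(zones) < 2:
        problems.append("private subnets must span at least two availability zones")

    return problems


# Hypothetical example values, not taken from the screenshots.
example = [("10.0.1.0/24", "us-west-2a"), ("10.0.2.0/24", "us-west-2b")]
print(check_rds_subnets("10.0.0.0/16", example))  # []
```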

Create a Public-Facing EC2 Instance

Create an EC2 instance in the public subnet of your new VPC. Detailed instructions on how to do this are omitted in this post for brevity. This is just like setting up any EC2 instance, but you have to specify which VPC and subnet to use in the launch wizard. Also, be sure to check the box indicating that you want a public IP address created with the instance. Record the private IP address of this instance as we’ll need it later.

Creating a DB Subnet Group

We now need to create a database subnet group. Go to the AWS Console page and press “RDS”.

VPC_and_database_009__START_DATABASE

Press the “Subnet Groups” link on the left and then press “Create DB Subnet Group”:

VPC_and_database_010

Give your subnet group a name and description, select your VPC (by ID), and add the two private subnets to the group using the “Add” button. Note that the two private subnets have different availability zones associated with them:

VPC_and_database_011

Click on “Yes, Create” and you should get the following output showing your new subnet group:

VPC_and_database_012

Creating an RDS Instance

Click on “Instances” and press the “Launch DB Instance” button:

VPC_and_database_013

Select the database system you want to use. In this case we are using MySQL:

VPC_and_database_014

Select an option for multi-availability zone operation. In this case we are selecting “No”.

VPC_and_database_015

The next screen asks again about multi-availability zone operation. Select “No” again. Fill in the DB Instance Class you want to use, the amount of allocated storage you want, a name for the DB Instance, a username, and a password. Note that the name for the DB Instance is not the same as the name for the database in MySQL, which will be specified later:

VPC_and_database_016

In the advanced settings, select the VPC you created earlier, as well as the DB Subnet Group you created. Add a database name (this will be the name of the database inside MySQL):

VPC_and_database_017

You will now see a confirmation page showing the active DB Instance:

VPC_and_database_018

Click on “Instances” to see the details of the RDS instance you created:

VPC_and_database_019

Expand your new DB Instance and click on the security group:

VPC_and_database_020__CLICKED_ON_SECURITY_GROUP

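Add an inbound rule for MySQL traffic (TCP port 3306 by default) from your EC2 instance. As the source, use the private IP address of the EC2 instance you created earlier, entered in CIDR notation with a /32 suffix so that the rule matches only that one host.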

VPC_and_database_021__added_MySQL
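As a small illustration of what goes into that rule, the sketch below turns a private IP address into a single-host CIDR block with Python's ipaddress module. The function name, the dictionary layout, and the example address are hypothetical; the dictionary is only a summary of the values to type into the console, not an AWS API format.

```python
import ipaddress

MYSQL_PORT = 3306  # default MySQL port


def mysql_inbound_rule(ec2_private_ip):
    """Summarize an inbound rule admitting MySQL traffic from a single host."""
    host = ipaddress.ip_address(ec2_private_ip)  # raises ValueError on bad input
    # A prefix equal to the full address length (32 for IPv4) matches one host only.
    source = ipaddress.ip_network(f"{host}/{host.max_prefixlen}")
    return {"protocol": "tcp", "port": MYSQL_PORT, "source": str(source)}


# 10.0.0.25 is a placeholder; use the private IP address you recorded earlier.
print(mysql_inbound_rule("10.0.0.25"))
```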

That’s it! You’ll now be able to connect to your private RDS instance through your EC2 instance.
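One possible way to reach the database from a machine outside the VPC is an SSH tunnel through the public EC2 instance. This technique is not covered in the steps above, so treat the following as a sketch under assumptions: the host names and user name are placeholders, and the function name is hypothetical. It only builds the command line; it does not run it.

```python
import shlex


def tunnel_command(ec2_public_host, rds_endpoint, ssh_user,
                   local_port=3306, db_port=3306):
    """Build an ssh command that forwards a local port to the database via EC2."""
    forward = f"{local_port}:{rds_endpoint}:{db_port}"
    # -N: do not run a remote command; -L: forward local_port to rds_endpoint:db_port.
    return ["ssh", "-N", "-L", forward, f"{ssh_user}@{ec2_public_host}"]


# All three values below are placeholders, not real hosts.
cmd = tunnel_command("ec2-public-host.example.com", "mydb.example.internal", "ec2-user")
print(shlex.join(cmd))
```

With a tunnel like this running, a MySQL client on the local machine would connect to 127.0.0.1 on the chosen local port.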

 

 


pyDome updates: tangential and spoke angles

In a previous post, I introduced pyDome, a Python program for calculating geodesic dome vertices, chords, and faces. I have since added two hub angle computations to the program, and report on that progress here. Face angle calculations still need to be implemented.

truncated_qcad_focused

Example output in DXF format.

Angles Between Chords and the Hub Tangent Plane

The angle between a chord and the plane tangent to the sphere at the chord’s hub measures the amount of inward deflection a hub spoke for that chord must make. The following diagram illustrates this idea:

tangent_angle_image_CROPPED

In this image, the view is directly facing the side of the tangent plane, so that it appears as a line. Two chords are shown here for illustrative purposes, but there are actually either five or six chords for a hub depending on the hub’s position in the geodesic sphere.

The program now reports these angles for each hub as part of the standard output:

STDOUT_tangent_angles

As expected, these are small angles.
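The geometry can be sketched in a few lines of Python. This is an illustration of the calculation described above, not pyDome's own code, and the function name is hypothetical. It assumes the sphere is centered at the origin unless a center is given. For a chord of length c on a sphere of radius R the result equals asin(c / 2R), which is why short chords give small angles.

```python
import math


def tangent_plane_angle(hub, neighbor, center=(0.0, 0.0, 0.0)):
    """Angle in degrees between the chord hub->neighbor and the tangent plane at hub."""
    normal = [h - c for h, c in zip(hub, center)]   # sphere normal at the hub
    chord = [n - h for n, h in zip(neighbor, hub)]  # chord vector leaving the hub
    dot = sum(a * b for a, b in zip(normal, chord))
    normal_len = math.sqrt(sum(a * a for a in normal))
    chord_len = math.sqrt(sum(a * a for a in chord))
    # The angle to the plane is the complement of the angle to the normal,
    # so it is the arcsine of the normalized dot product (clamped for rounding).
    ratio = min(1.0, abs(dot) / (normal_len * chord_len))
    return math.degrees(math.asin(ratio))


# Two points on a unit sphere separated by 0.2 radians of arc.
hub = (0.0, 0.0, 1.0)
neighbor = (math.sin(0.2), 0.0, math.cos(0.2))
print(tangent_plane_angle(hub, neighbor))  # about 5.73 degrees
```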

Angles of Chords Around the Hub

The other hub angles considered here are the angles between chords centered around a hub. Here we first project the chords onto the tangent plane, then select one of the chords as a reference, and then report the angle between the projected reference chord and each other projected chord. I call these angles “spoke” angles. The following image illustrates the idea:

spoke_angle_image_CROPPED

Here the view is orthogonal to the sphere’s tangent plane defined at the hub. (This is the same tangent plane as that used above.)

The program reports these angles in its standard output:

STDOUT_spoke_angles
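The projection and angle measurement described in this section can be sketched as follows. This is an illustration under assumptions rather than pyDome's implementation: the function names are hypothetical, the sphere is taken to be centered at the origin by default, the first neighbor is used as the reference chord, and angles are measured counterclockwise as seen from outside the sphere. pyDome's own sign and ordering conventions may differ.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def spoke_angles(hub, neighbors, center=(0.0, 0.0, 0.0)):
    """Angles in degrees, in [0, 360), from the first projected chord to each other one."""
    normal = _sub(hub, center)
    normal_len = math.sqrt(_dot(normal, normal))
    unit_n = [x / normal_len for x in normal]

    def project(point):
        # Remove the component of the chord that points along the hub normal.
        chord = _sub(point, hub)
        depth = _dot(chord, unit_n)
        return [c - depth * n for c, n in zip(chord, unit_n)]

    reference = project(neighbors[0])
    angles = []
    for point in neighbors[1:]:
        spoke = project(point)
        sin_part = _dot(unit_n, _cross(reference, spoke))
        cos_part = _dot(reference, spoke)
        angles.append(math.degrees(math.atan2(sin_part, cos_part)) % 360.0)
    return angles


# Hub at the top of a unit sphere with four neighbors on the equator.
print(spoke_angles((0, 0, 1), [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]))
```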

Code

pyDome’s source code, which implements these new angle reporting features, is available from GitHub at https://github.com/badassdatascience/pyDome. The HTML (inline SVG) code for drawing the two diagrams shown above follows:

<svg width="500" height="500">
  <circle cx="250" cy="250" r="5" stroke="black" fill="black"/>
  <line x1="250" y1="250" x2="210" y2="150" style="stroke:rgb(0,0,0);stroke-width:3;"/>
  <line x1="250" y1="250" x2="210" y2="350" style="stroke:rgb(0,0,0);stroke-width:3;"/>
  <line x1="250" y1="160" x2="250" y2="340" style="stroke:blue;stroke-width:2;"/>
  <path d="M250 200 A10,10 0 0,0 230,200" stroke="green" style="fill:none;"/>
  <path d="M250 300 A10,10 0 0,1 230,300" stroke="green" style="fill:none;"/>
  <text x="260" y="255" fill="black">Hub</text>
  <text x="170" y="145" fill="black">Chord</text>
  <text x="240" y="155" fill="blue">Tangent Plane</text>
  <text x="255" y="200" fill="green">Angle to Tangent Plane</text>
</svg>
<svg width="500" height="500">
  <circle cx="250" cy="250" r="5" stroke="black" fill="black"/>
  <line x1="250" y1="250" x2="375" y2="250" style="stroke:rgb(0,0,0);stroke-width:3;"/>
  <line x1="250" y1="250" x2="289" y2="369" style="stroke:rgb(0,0,0);stroke-width:3;"/>
  <line x1="250" y1="250" x2="149" y2="323" style="stroke:rgb(0,0,0);stroke-width:3;"/>
  <line x1="250" y1="250" x2="149" y2="177" style="stroke:rgb(0,0,0);stroke-width:3;"/>
  <line x1="250" y1="250" x2="289" y2="131" style="stroke:rgb(0,0,0);stroke-width:3;"/>
  <path d="M263,210 A32,32 0 0,0 216,274" fill="none" stroke="blue" stroke-width="5" />
  <text x="260" y="267" fill="black">Hub</text>
  <text x="110" y="250" fill="blue">Spoke Angle</text>
  <text x="250" y="120" fill="black">Reference Chord/Spoke</text>
  <text x="110" y="340" fill="black">Chord/Spoke</text>
</svg>