Thursday, March 19, 2009

Getting Intel Lab Data into Postgres (and getting it out again)

Intel Lab has made a trace of sensor readings in a wireless sensor network available in 2004. It contains readings from 54 sensor nodes in the Intel Berkeley Data Lab between February 28th and April 5th, 2004. This data set turns out very useful for validating your own great ideas using Matlab and the like, before you bother to implement stuff on actual sensor hardware.

However, at about 2.3 million rows, the data set is rather bulky to handle, and most certainly you don't want to load it into Matlab directly. Here I will describe how you can load the data set into Postgres. Once it's in there, you can easily select parts of the data set -- e.g. all voltage readings on a specific day -- into a file. These small files are much easier to handle ...

I assume that you already have installed Postgres and have some familiarity with it. Now, we will need to create a table to hold the data. Here goes:

CREATE TABLE intellab (
date DATE NOT NULL,
time TIME WITHOUT TIME ZONE NOT NULL,
epoch INTEGER NOT NULL,
moteid INTEGER,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION,
light DOUBLE PRECISION,
voltage DOUBLE PRECISION
);

You might wonder why there is not a NOT NULL constraint on every column. This is because some of the entries in the data set are truncated. You might wonder even more that there is no primary key. This is because for some entries, moteid is NULL. A primary key would arguably have to contain this attribute, so I opted for no primary key at all. (I know this is bad ^^)

Next, before we can load the data set into our table, we need to massage the logs a bit. Download the data set and decompress it with gzip. Then, we need to replace every space in the log file by a comma. sed to the rescue! In a terminal, run:

$ sed -i 's/ /,/g' data.txt

This may take a while, but we are almost there. Now let's get the data from the data.txt into our table. In a postgres terminal, run

COPY intellab FROM '/tmp/data.txt' WITH DELIMITER ',' NULL AS '';

where '/tmp/data.txt' is the path to your modified data.txt. Copying the data into the table took about one minute on my machine.

Once you have the data set in Postgres, you will want to get an interesting subset out again ;) For example, let's assume we were interested in the temperature readings of motes 1 and 2 on Feb. 28th. We can use the COPY command for this as well. In a postgres terminal, run:

COPY (SELECT moteid, time, epoch, temperature
FROM intellab
WHERE moteid IN (1,2) AND date = '2004-02-28')
TO '/tmp/temp_1_2_02-28.txt';

We can now take the output file and load it into our math program or simulator of choice. BTW, you can further control the output of COPY ... TO. See the COPY manual page.


Kudos to the Intel Lab for making the data set available to the public!