Sunday, November 11, 2012

The Map

My part of a school project was about the cost of living in the United States. It was a simple project (compare two cities) where I only had to write a few sentences, but I wanted a map too. I was sure that somewhere on the whole of the Internet someone had made a United States cost of living map. There were some, but most of them were broken down by state, and I wanted something more granular. (For the record, I found one after I went through all this trouble; it was similar to mine, except it's by county.)

I started with military BAH (Basic Allowance for Housing) data because I found too many different ways to measure cost of living, and picking a single source simplified the process. It's also a real-world application: people in the military really do get a housing allowance if they live off base (I know from experience), and the amount depends on where they are stationed. The BAH data is also available in CSV format, which makes it easy to work with.
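Reading a CSV file like that in C++ mostly comes down to splitting each line on commas. Here is a minimal sketch that assumes a hypothetical two-column layout (zip code, monthly rate); the actual BAH downloads may be organized differently, and `BahRow` and `parse_bah_line` are illustrative names, not part of any real library or of the program on GitHub.

```cpp
#include <exception>
#include <optional>
#include <sstream>
#include <string>

// One row of BAH data. The two-column layout (zip code, monthly rate) is a
// hypothetical assumption for this sketch, not the official file format.
struct BahRow {
    std::string zip;  // kept as text so leading zeros (e.g. "02134") survive
    double rate;      // monthly housing allowance in dollars
};

// Parse a line such as "98101,2150.00". Header rows and malformed lines
// yield std::nullopt instead of throwing.
std::optional<BahRow> parse_bah_line(std::string line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF files
    std::istringstream in(line);
    std::string zip, rate_text;
    if (!std::getline(in, zip, ',') || !std::getline(in, rate_text)) {
        return std::nullopt;  // no comma, or nothing after it
    }
    try {
        std::size_t used = 0;
        double rate = std::stod(rate_text, &used);
        if (used != rate_text.size()) return std::nullopt;  // trailing junk
        return BahRow{zip, rate};
    } catch (const std::exception&) {
        return std::nullopt;  // std::stod throws on text such as "rate"
    }
}
```

Keeping the zip code as a string rather than an integer matters: zip codes in the Northeast start with 0, and converting them to numbers would silently drop that digit.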

Obviously I chose zip code over county, because the BAH data was already broken down by zip code; also, the zip code boundary data is available on the Census website. There are plenty of places that will tell you how to retrieve this data and convert it into a KML file (KML is just an XML format Google Earth uses to store user-created information). I found this guy's blog post to be the most useful of those search results. He doesn't explain how to do it; he just makes the KML file available. Props to Fil.

The BAH data required a little beating (the term "massaging data" irks me) to be useful. Eventually, I had a text file full of zip codes, each paired with a number showing how much BAH that zip code receives relative to the others, and a KML file full of zip-code-shaped polygons. That's where the real work started.
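One way to turn raw dollar amounts into "relative" numbers is min-max scaling, where the cheapest zip code becomes 0 and the most expensive becomes 1. The sketch below does that and writes one tab-separated line per zip code. The scaling choice and the name `to_relative_tsv` are assumptions for illustration; dividing every rate by the maximum would be an equally reasonable definition of "relative."

```cpp
#include <algorithm>
#include <map>
#include <sstream>
#include <string>

// Rescale raw BAH rates to [0, 1] with min-max scaling (cheapest zip code
// -> 0, most expensive -> 1) and write one "zip<TAB>value" line per zip code.
std::string to_relative_tsv(const std::map<std::string, double>& rates) {
    std::ostringstream out;
    if (rates.empty()) return out.str();
    auto [lowest, highest] = std::minmax_element(
        rates.begin(), rates.end(),
        [](const auto& a, const auto& b) { return a.second < b.second; });
    const double lo = lowest->second;
    const double span = highest->second - lo;
    for (const auto& [zip, rate] : rates) {
        // If every rate is identical, put everything at the bottom of the scale.
        const double relative = span > 0 ? (rate - lo) / span : 0.0;
        out << zip << '\t' << relative << '\n';
    }
    return out.str();
}
```

Because `std::map` keeps its keys sorted, the output comes out ordered by zip code, which makes the text file easy to eyeball. The stream's default precision (six significant digits) is plenty for picking a color shade.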

There are a lot of details I'm going to skip here; most of them would just point out how dumb I am. Let's just say there was a fair bit of trial and error. In the end I used an overcomplicated while loop (I'm new to C++, but it's fun). It went through the 170-some-odd MB KML file line by line and copied each line to a new file. When it found a zip code that matched the BAH data, it added a style to that polygon: a shade of red based on the cost of living number I associated with that zip code. It took about an hour to run, but in the end I'm pleased with the results.
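A loop like that could be sketched roughly as follows. This is not the program on GitHub; it assumes (hypothetically) that each zip code polygon's `<Placemark>` stores the zip code in a one-line `<name>` element, which should be checked against the actual KML file. KML writes colors as `aabbggrr` hex, so `ff0000ff` is opaque pure red; the sketch keeps red at full strength and fades green and blue toward 0 as the relative cost rises. The names `red_shade` and `colour_kml` are illustrative. If the Placemarks also carry a `<description>`, the KML schema expects the `<Style>` after it rather than right after `<name>`.

```cpp
#include <cmath>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

// Two lowercase hex digits for a value in 0..255.
std::string hex2(int v) {
    const char* digits = "0123456789abcdef";
    return {digits[(v >> 4) & 0xF], digits[v & 0xF]};
}

// KML colour (aabbggrr) for a relative cost in [0, 1]:
// 0 -> white ("ffffffff"), 1 -> pure red ("ff0000ff").
std::string red_shade(double relative) {
    if (relative < 0) relative = 0;
    if (relative > 1) relative = 1;
    int fade = static_cast<int>(std::lround(255 * (1.0 - relative)));
    return "ff" + hex2(fade) + hex2(fade) + "ff";  // alpha, blue, green, red
}

// Copy a KML document line by line. When a line holds <name>ZIP</name> and
// ZIP is in relative_cost, emit an inline <Style> right after it so the
// enclosing Placemark's polygon is filled with the matching shade of red.
// Returns how many placemarks were styled.
std::size_t colour_kml(std::istream& in, std::ostream& out,
                       const std::unordered_map<std::string, double>& relative_cost) {
    const std::string open = "<name>", close = "</name>";
    std::size_t styled = 0;
    std::string line;
    while (std::getline(in, line)) {
        out << line << '\n';
        std::size_t start = line.find(open);
        if (start == std::string::npos) continue;
        start += open.size();
        std::size_t end = line.find(close, start);
        if (end == std::string::npos) continue;
        auto hit = relative_cost.find(line.substr(start, end - start));
        if (hit == relative_cost.end()) continue;
        out << "<Style><PolyStyle><color>" << red_shade(hit->second)
            << "</color></PolyStyle></Style>\n";
        ++styled;
    }
    return styled;
}

// Convenience wrapper for small in-memory documents (handy for testing).
std::string colour_kml_string(const std::string& kml,
                              const std::unordered_map<std::string, double>& relative_cost) {
    std::istringstream in(kml);
    std::ostringstream out;
    colour_kml(in, out, relative_cost);
    return out.str();
}
```

For the real file you would pass a `std::ifstream` and a `std::ofstream` to `colour_kml`. Reading one line at a time keeps memory use flat no matter how large the KML file is, and the hash-map lookup means each line costs roughly the same regardless of how many zip codes are in the table.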

I've posted the code on GitHub. No doubt the program could be better, but for a first-year CS student it's not horrible. Who knows, it could be useful in other school projects. All you need is a list of zip codes, each paired with some other random bit of data and separated from it by a tab.
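Loading that kind of tab-separated list into a lookup table is a short function. The sketch below is an illustration under stated assumptions, not the code on GitHub: it assumes one `zip<TAB>number` pair per line, skips lines that don't parse, and uses the hypothetical name `load_tab_separated`.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Read "zip<TAB>value" lines into a lookup table keyed by zip code.
// Lines that do not parse (headers, blank lines) are skipped rather than
// aborting the whole run.
std::unordered_map<std::string, double> load_tab_separated(std::istream& in) {
    std::unordered_map<std::string, double> values;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string zip;
        double value;
        if (std::getline(fields, zip, '\t') && fields >> value) {
            values[zip] = value;  // a later duplicate overwrites an earlier one
        }
    }
    return values;
}
```

An `std::unordered_map` fits here because the KML pass only ever asks "is this zip code in the list, and what is its number?", and hash lookups answer that in constant time on average.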


If this turns out to be helpful to anyone drop me a comment; I'd love to hear about it.

1 comment:

Anonymous said...

Thank you for your mashup of BAH and Zipcodes. I am using it to compare with population density for some demographic research.