Posted on June 4, 2012
Solving problems with two dimensional indexes
I recently attended a one day Couchbase conference in Seattle. It began with a brief overview of NO -SQL databases and a demo of Couchbase, followed by some in depth programming crash courses on using how to use it in your code. The final presentations were given by real world Couchbase users. One talk in particular, “Scaling Geodata with MapReduce”, given by Nathan Vander Wilt, was quite interesting.
GeoCouch is a service embedded into CouchBase that stores geodata (lat/long coordinates) using R-trees, that allows for highly optimized bounding box queries (find all the locations within this rectangle). GeoCouch does not support polygon queries yet, but it will provide radius support in the future. Nathan gave some examples on how to use GeoCouch to solve common questions (where have I been? What photos have I taken in this area?). The most enlightening part about Nathan’s talk was when he began talking about using GeoCouch in more unconventional ways.
As Pike states in Notes on C Programming, “Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.”
GeoCouch essentially provides a 2-dimensional index, so many problems that can project their data in two dimensions* can suddenly be solved in creative ways. Take for instance the question, “When have I flown?” If you store a data set of altitude and time, with altitude on the Y axis and time on the X axis, the data would look like the graph below. We can safely assume that all points with a y-coordinate greater than say … 10,000 ft represent times you’ve flown on a plane. We then simply draw a bounding box that captures all these points, and we can now use GeoCouch to quickly answer the question.
Another example that Nathan gave was a storing a photograph’s camera model (we can define an enum to map each model to a numerical value) and rating (1-5 stars). By then drawing a bounding box around say, all Canon photos with a rating lower than 2, we can quickly locate all our bad photos.
While I found the entire Couchbase Conference to be incredibly educational and helpful, I especially enjoyed Nathan’s talk because it contained a creative and novel problem solving approach that is not specific to just Couchbase and GeoCouch.
*Projection is a mathematical term. The simplest example is a map: A topographical map is a 2 dimensional projection of a 3 dimensional space.