Discrete Cosine Transforms and Route Matching

On of the big improvements we’d like to make to Runometer is adding the ability to automatically detect and match routes. When people use simple pedometers (i.e. the nike/ipod dongle), routes need to be assigned manually… but when people use devices that record and upload route data as well as run data, we’d like to be able to determine when the routes are probably the same. As nice as it is to have our routes statistics climb in tandem with runs, most runners have a fixed set of routes they use during the course of their training.

Thinking about this issue, we don’t necessarily want to match ALL similar routes – we need a variable degree of precision based on the route length, and I think we need to give users a chance to determine whether or not they want their routes to match. The following is a series of approaches we’re evaluating, and the reasons behind doing so.

  • The bounding box (a.k.a. the naive approach):
    Simply; if you select a tolerance for a bounding box, and only try and match a list of runs within that box, you can probably do pretty well. Our data model relies on using the bounding box for routes already (it’s generated on creation), so we’re not paying any overhead for this. In addition, MySQL’s spatial extensions are really good at working via bounding boxes* – MBRInterects and MBRContains. This is almost computationally free, and we can match these boxes very quickly on route upload, without wasting much time or compute power… but it has some major shortcomings. Our counter-case against using bounding rectangles will be laps around a track: the bounding box for one lap or 40 is exactly the same, although these are clearly very different runs!

    * in fact, even MySQL’s more sophisticated OpenGIS implementation calls such as “contains” are only implemented with bounding boxes at the time of this writing (MySQL 5.1): see this note (http://dev.mysql.com/doc/refman/5.1/en/functions-that-test-spatial-relationships-between-geometries.html)

  • Bounding box and total Distance:
    Ok, let’s try not to overcomplicate this problem. We can get rid of the issue with bounding boxes by simply adding distance. We don’t even need to rely on MySQL’s spatial functions for this… we get a route length and a run distance automatically on each upload, and store them in the database! This solves the countercase for bounding boxes alone, but is still fragile in some other ways, that also plague bounding boxes: it doesn’t say anything about the actual route! Specifically, configuration is an important issue: just because it fits in the same box, and matches the distance doesn’t mean the runs are even remotely similar. Consider living next to a park, but having multiple routes through the park, depending on mood, scenery, and time of day. It looks like we’ve disqualified this solution, too. Is it still “good enough” to give our users a chance to pick the right route from a list of thumbnails and names? Since some of our users now have hundreds of routes, I’m guessing “no.” At the time of this writing, my latest run matches 19 other runs using this solution, and some of them are very clearly wrong, and 10% of length disqualifies the “best” copy of the run I have on record, since my GPS frequently cuts out during a certain corner of the run.
  • The Big Guns: The Discrete Cosine Transform and Fast Fourier Transform
    Since I learned about the Discrete Cosine Transform (DCT) and the Discrete and Fast Fourier Transform (DFT, FFT), I’ve been amazed at what versatile tools they are. If you listen to digitally compressed music, or watch video on-line, or view almost any digital image, you’re almost certainly a direct or indirect beneficiary of this technology. These two technologies are largely interchangeable in application, and they’re both from the digital signal processing world. For our purposes, we’re going to use DCT because of the convenience and efficiency of a specific library, but I urge you to read further about each; your mileage may vary!

    • A bit of Background
      Signals and The Frequency domain
      What all the techniques mentioned here do is simple at a fundamental level: we take a signal (a change in frequency over time) and convert it into the frequency domain: that is, we assign a set of coefficients to the frequencies within the signal over a fixed window in time (a finite number of data points). In the case of the DCT, this means that we assign coefficients to each of a set of pre-defined cosine functions which oscillate at different frequencies, based on the behavior of the points in the “time” domain. Sound confusing? Don’t sweat the math – just understand that what we’re getting is a simplified model of how something changes over time organized by the most-defining behaviors first. The more compute power we spend extracting coefficients, the more accurate the model will be.
    • How this Applies to Routes
      First of all, change in position over time is a signal. Our GPS traces and hand-drawn routes are rough quantizations of continuous routes that we want to match to one-another, and the two-dimensional images we generate of the traces are also well-suited to individual DCT application of the DCT, without requiring multiple time windows. The change in image pixel values, or the change longitude and latitude will match from run to run on the route, up to a number of significant coefficients – a threshold we define as significant. We can compare the differences between the coefficients of routes to a fixed precision, and use statistical techniques like Sum of Squared Differences (SSD) to determine “how different they are.”
  • Implementation
    Since we don’t want to waste more time running this expensive comparison than necessary, we’ll only perform this check on routes which have distances within a certain tolerance of the current route by the first two techniques (15%, which might be overoptimistic, considering what a mess most of my GPS traces are), and which are contained within a box 10% larger on each side of the current route’s bounding box. This saves us the heavier processing time of DCT against runs which are clear misses. For our first implementation, we’ll use the route thumbnail images we use to give users a “glyph” of the run for processing, since they provide a discrete version of the run, and automatically normalize the route to a fixed image size. Once we’ve pared the list down using the first two techniques, we iterate over them and compare their DCT coefficients. If the SSD is beneath our chosen threshold, we propose the match to the user!

more reading:
http://en.wikipedia.org/wiki/Discrete_cosine_transform
(wikipedia: an overview)

http://www.exampleproblems.com/wiki/index.php/Discrete_cosine_transform
(basic reading)

http://140.134.132.124/dspace/handle/2377/1037
(using DCTs for handwritten chinese character recognition!)

by Andy Carra
categories: Uncategorized
Aug 10th, 2009