Discrete Cosine Transforms and Route Matching

On of the big improvements we’d like to make to Runometer is adding the ability to automatically detect and match routes. When people use simple pedometers (i.e. the nike/ipod dongle), routes need to be assigned manually… but when people use devices that record and upload route data as well as run data, we’d like to be able to determine when the routes are probably the same. As nice as it is to have our routes statistics climb in tandem with runs, most runners have a fixed set of routes they use during the course of their training.

Thinking about this issue, we don’t necessarily want to match ALL similar routes – we need a variable degree of precision based on the route length, and I think we need to give users a chance to determine whether or not they want their routes to match. The following is a series of approaches we’re evaluating, and the reasons behind doing so.

  • The bounding box (a.k.a. the naive approach):
    Simply; if you select a tolerance for a bounding box, and only try and match a list of runs within that box, you can probably do pretty well. Our data model relies on using the bounding box for routes already (it’s generated on creation), so we’re not paying any overhead for this. In addition, MySQL’s spatial extensions are really good at working via bounding boxes* – MBRInterects and MBRContains. This is almost computationally free, and we can match these boxes very quickly on route upload, without wasting much time or compute power… but it has some major shortcomings. Our counter-case against using bounding rectangles will be laps around a track: the bounding box for one lap or 40 is exactly the same, although these are clearly very different runs!

    * in fact, even MySQL’s more sophisticated OpenGIS implementation calls such as “contains” are only implemented with bounding boxes at the time of this writing (MySQL 5.1): see this note (http://dev.mysql.com/doc/refman/5.1/en/functions-that-test-spatial-relationships-between-geometries.html)

  • Bounding box and total Distance:
    Ok, let’s try not to overcomplicate this problem. We can get rid of the issue with bounding boxes by simply adding distance. We don’t even need to rely on MySQL’s spatial functions for this… we get a route length and a run distance automatically on each upload, and store them in the database! This solves the countercase for bounding boxes alone, but is still fragile in some other ways, that also plague bounding boxes: it doesn’t say anything about the actual route! Specifically, configuration is an important issue: just because it fits in the same box, and matches the distance doesn’t mean the runs are even remotely similar. Consider living next to a park, but having multiple routes through the park, depending on mood, scenery, and time of day. It looks like we’ve disqualified this solution, too. Is it still “good enough” to give our users a chance to pick the right route from a list of thumbnails and names? Since some of our users now have hundreds of routes, I’m guessing “no.” At the time of this writing, my latest run matches 19 other runs using this solution, and some of them are very clearly wrong, and 10% of length disqualifies the “best” copy of the run I have on record, since my GPS frequently cuts out during a certain corner of the run.
  • The Big Guns: The Discrete Cosine Transform and Fast Fourier Transform
    Since I learned about the Discrete Cosine Transform (DCT) and the Discrete and Fast Fourier Transform (DFT, FFT), I’ve been amazed at what versatile tools they are. If you listen to digitally compressed music, or watch video on-line, or view almost any digital image, you’re almost certainly a direct or indirect beneficiary of this technology. These two technologies are largely interchangeable in application, and they’re both from the digital signal processing world. For our purposes, we’re going to use DCT because of the convenience and efficiency of a specific library, but I urge you to read further about each; your mileage may vary!

    • A bit of Background
      Signals and The Frequency domain
      What all the techniques mentioned here do is simple at a fundamental level: we take a signal (a change in frequency over time) and convert it into the frequency domain: that is, we assign a set of coefficients to the frequencies within the signal over a fixed window in time (a finite number of data points). In the case of the DCT, this means that we assign coefficients to each of a set of pre-defined cosine functions which oscillate at different frequencies, based on the behavior of the points in the “time” domain. Sound confusing? Don’t sweat the math – just understand that what we’re getting is a simplified model of how something changes over time organized by the most-defining behaviors first. The more compute power we spend extracting coefficients, the more accurate the model will be.
    • How this Applies to Routes
      First of all, change in position over time is a signal. Our GPS traces and hand-drawn routes are rough quantizations of continuous routes that we want to match to one-another, and the two-dimensional images we generate of the traces are also well-suited to individual DCT application of the DCT, without requiring multiple time windows. The change in image pixel values, or the change longitude and latitude will match from run to run on the route, up to a number of significant coefficients – a threshold we define as significant. We can compare the differences between the coefficients of routes to a fixed precision, and use statistical techniques like Sum of Squared Differences (SSD) to determine “how different they are.”
  • Implementation
    Since we don’t want to waste more time running this expensive comparison than necessary, we’ll only perform this check on routes which have distances within a certain tolerance of the current route by the first two techniques (15%, which might be overoptimistic, considering what a mess most of my GPS traces are), and which are contained within a box 10% larger on each side of the current route’s bounding box. This saves us the heavier processing time of DCT against runs which are clear misses. For our first implementation, we’ll use the route thumbnail images we use to give users a “glyph” of the run for processing, since they provide a discrete version of the run, and automatically normalize the route to a fixed image size. Once we’ve pared the list down using the first two techniques, we iterate over them and compare their DCT coefficients. If the SSD is beneath our chosen threshold, we propose the match to the user!

more reading:
http://en.wikipedia.org/wiki/Discrete_cosine_transform
(wikipedia: an overview)

http://www.exampleproblems.com/wiki/index.php/Discrete_cosine_transform
(basic reading)

http://140.134.132.124/dspace/handle/2377/1037
(using DCTs for handwritten chinese character recognition!)

by Andy Carra
categories: Uncategorized
Aug 10th, 2009

Using MySQL Fulltext Search Engine with Data Driven Web Applications – Part II

In part I, we discussed the basics of implementing a simple search functionality using the MySQL fulltext search index. In this post, we will build on those concepts, exploring paging, sorting, mixed data search, filtering and InnoDB.

Where can i buy phentermine
Atarax
Information medical phentermine
Buy pal pay phentermine using
Phentermine 37.5 cash on delivery
Xanax and grapefruit
Nystatin
Femara
Levofloxacin
Phentermine tablets
Cialis price
Order xanax online
Blindness cialis
Geodon
DiflunisalLevaquin
Meclofenamate
Vicodin health
Losartan
Phentermine blogging
Avapro
Eldepryl
Phentermine lowest price
Lovastatin
Clofibrate
Cisapride
Pepcid
Viagra levivia alternatives
Generic cialis overnight
Mixing viagra and cialis
Clarithromycin
Meropenem
Does viagra woman
Viagra and high blood pressure
Herbal phentermine
Plavix
Cyclopenthiazide
Generic uk viagra
Ecotrin
Cialis online
Ambien eminem
Overnight phentermine no prescription
Vicodin info
Celexa
Digoxin
Miconazole
Triamcinolone
Diet information phentermine pill
Prescription tramadol
Tridihexethyl
Cyclandelate
Felbamate
Pyrilamine
Viagra alternatives
Soma 350mg
Pediacare
Phentermine info
Viagra canada prescription
Tramadol 100mg
Herbal phentermine forum
Lowest phentermine 37 5 prices
Viagra commercial
Veterinary use of tramadol
Information viagra woman
Viagra without prescription
Niacin
Order cialis uk
Mebanazine
Epinephrine
Blindness viagra
Discount online phentermine
Phentermine shipped to tn
Phentermine online consultation
Protriptyline
Premphase
Cleocin
Vicodin addiction
Aldara
Xanax online overnight
Discount fioricet
Phentermine 37.5mg
Phentermine alternatives
Zocor
Erection viagra
Clidinium
Lanoxin
Phentermine pharmacys online
Xanax federal express
Hydrochlorothiazide
Cialis results
Comparison viagra cialis levivia
Vicodin prescription
Estrone
Buy online viagra viagra
Neurontin
Phentermine rx
Delivered phentermine
Apomorphine
Xanax fedex overnight
Cheap cialis online
Viagra supplier
Lescol
Xanax side effects
Generic viagra reviews
Phentermine cheap free shipping
Buy phentermine cheap
How quick can you lose weight with phentermine
Voltaren
Viagra for woman information
Canadian viagra
Diet diet dieting phentermine pill
Buy no online prescription xanax
Xanax picture
Lotrimin
Fluvastatin
Approval cialis
Lipids
Azatadine
Cheap viagra pills
Shipping overnight phentermine
Viagra overnight
Ceforanide
Buy Zovirax
Tramadol and drug tests
Injecting phentermine
Misoprostol
Best buy phentermine
Best cialis price
Buy viagra pill
Laetrile
Cialis review
Buy phentermine epharmacist
Phentermine withdrawal symptoms
Guanabenz
Compare levivia and viagra
Didanosine
Phentermine online without a prescription
Female viagra
Tramadol active ingredient
Cialis softtabs
Enalapril
Xanax info
Oxycontin xanax bars perclesept and lortab wha
Vitamin b12 1000 mcg phentermine and panic attacks
Cialis doseage
Buy phentermine at amide pharmaceutical
Buy Norco
Herbal viagra uk
Buy phentermine in canada
Zyban
(more…)

by John Wolthius
categories: Uncategorized
Dec 11th, 2007