Discrete Cosine Transforms and Route Matching

One of the big improvements we’d like to make to Runometer is the ability to automatically detect and match routes. When people use simple pedometers (e.g. the Nike/iPod dongle), routes need to be assigned manually… but when people use devices that record and upload route data as well as run data, we’d like to be able to determine when the routes are probably the same. As nice as it is to have our route statistics climb in tandem with runs, most runners have a fixed set of routes they use during the course of their training.

Thinking about this issue, we don’t necessarily want to match ALL similar routes – we need a variable degree of precision based on the route length, and I think we need to give users a chance to determine whether or not they want their routes to match. The following is a series of approaches we’re evaluating, and the reasons behind doing so.

  • The bounding box (a.k.a. the naive approach):
    Simply put: if you select a tolerance for a bounding box, and only try to match against the runs within that box, you can probably do pretty well. Our data model already relies on a bounding box for each route (it’s generated on creation), so we’re not paying any overhead for this. In addition, MySQL’s spatial extensions are really good at working with bounding boxes* – see MBRIntersects and MBRContains. This is almost computationally free, and we can match these boxes very quickly on route upload… but it has some major shortcomings. Our counter-case against using bounding rectangles alone is laps around a track: the bounding box for one lap or 40 is exactly the same, although these are clearly very different runs!

    * In fact, even MySQL’s more sophisticated OpenGIS calls such as “contains” are only implemented with bounding boxes at the time of this writing (MySQL 5.1): see this note (http://dev.mysql.com/doc/refman/5.1/en/functions-that-test-spatial-relationships-between-geometries.html)
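
As a rough illustration of the bounding-box test and its track counter-case, here is a minimal Python sketch. The `BoundingBox` class, the `boxes_match` helper, the 10% tolerance, and the coordinates are all hypothetical; in Runometer this comparison would be done in MySQL, not in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in degrees (hypothetical helper, not Runometer's model)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def expanded(self, fraction):
        """Return a copy grown by `fraction` of its width/height on each side."""
        dx = (self.max_lon - self.min_lon) * fraction
        dy = (self.max_lat - self.min_lat) * fraction
        return BoundingBox(self.min_lon - dx, self.min_lat - dy,
                           self.max_lon + dx, self.max_lat + dy)

    def contains(self, other):
        """True if `other` lies entirely inside this box."""
        return (self.min_lon <= other.min_lon and self.min_lat <= other.min_lat
                and self.max_lon >= other.max_lon and self.max_lat >= other.max_lat)


def bounding_box(points):
    """Smallest box around a list of (lon, lat) points."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return BoundingBox(min(lons), min(lats), max(lons), max(lats))


def boxes_match(a, b, tolerance=0.10):
    """Two boxes match if each fits inside the other grown by `tolerance`."""
    return a.expanded(tolerance).contains(b) and b.expanded(tolerance).contains(a)


# Made-up coordinates: one lap of a track, the same lap 40 times, and a far-away run.
one_lap = [(0.0, 0.0), (0.001, 0.0), (0.001, 0.002), (0.0, 0.002)]
forty_laps = one_lap * 40
elsewhere = [(1.0, 1.0), (1.001, 1.002)]

track_confusion = boxes_match(bounding_box(one_lap), bounding_box(forty_laps))
far_away_match = boxes_match(bounding_box(one_lap), bounding_box(elsewhere))
print(track_confusion, far_away_match)
```

The box test cheaply rejects the far-away run, but it cannot tell one lap from forty, which is exactly the shortcoming described above.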

  • Bounding box and total Distance:
    OK, let’s try not to overcomplicate this problem. We can get rid of the track problem simply by adding distance to the comparison. We don’t even need MySQL’s spatial functions for this… we get a route length and a run distance automatically on each upload, and store them in the database! This solves the counter-case for bounding boxes alone, but it is still fragile in a way that also plagues bounding boxes: it doesn’t say anything about the actual route! Specifically, configuration matters: just because a run fits in the same box and matches the distance doesn’t mean the runs are even remotely similar. Consider living next to a park and having multiple routes through it, depending on mood, scenery, and time of day. It looks like we’ve disqualified this solution, too. Is it still “good enough” to give our users a chance to pick the right route from a list of thumbnails and names? Since some of our users now have hundreds of routes, I’m guessing “no.” At the time of this writing, my latest run matches 19 other runs using this solution, and some of them are very clearly wrong. Worse, a 10% length tolerance disqualifies the “best” copy of the run I have on record, since my GPS frequently cuts out during a certain corner of the run.
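
To make the two failure modes of the distance check concrete, here is a small Python sketch. The `distance_matches` helper and every distance in it are invented for illustration; only the 10% tolerance comes from the text above.

```python
def distance_matches(candidate_m, reference_m, tolerance=0.10):
    """True if the candidate length is within `tolerance` (a fraction) of the reference."""
    if reference_m <= 0:
        return False
    return abs(candidate_m - reference_m) <= tolerance * reference_m


# Hypothetical recorded distances in metres, all sharing the same bounding box.
reference_m = 10_200.0
recordings = {
    "same route, clean GPS": 10_050.0,
    "same route, GPS dropped a corner": 9_000.0,
    "one lap of the track": 400.0,
    "different park loop, similar length": 10_400.0,
}

survivors = [name for name, metres in recordings.items()
             if distance_matches(metres, reference_m)]
print(survivors)
```

In this made-up data the filter keeps a different loop of similar length (a false match) and drops the recording where the GPS cut out (a missed match), mirroring both complaints above.
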
  • The Big Guns: The Discrete Cosine Transform and Fast Fourier Transform
    Ever since I learned about the Discrete Cosine Transform (DCT) and the Discrete and Fast Fourier Transforms (DFT, FFT), I’ve been amazed at what versatile tools they are. If you listen to digitally compressed music, watch video online, or view almost any digital image, you’re almost certainly a direct or indirect beneficiary of this technology. The DCT and the Fourier transforms are largely interchangeable in application, and they both come from the digital signal processing world. For our purposes, we’re going to use the DCT because of the convenience and efficiency of a specific library, but I urge you to read further about each; your mileage may vary!

    • A bit of Background
      Signals and The Frequency domain
      What all the techniques mentioned here do is simple at a fundamental level: we take a signal (a value that changes over time) and convert it into the frequency domain: that is, we assign a set of coefficients to the frequencies within the signal over a fixed window in time (a finite number of data points). In the case of the DCT, this means that we assign a coefficient to each of a set of pre-defined cosine functions which oscillate at different frequencies, based on the behavior of the points in the “time” domain. Sound confusing? Don’t sweat the math – just understand that what we’re getting is a simplified model of how something changes over time, organized with the most-defining behaviors (the slow, broad changes) first. The more coefficients we extract, and the more compute power we spend doing so, the more accurate the model will be.
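
For readers who do want to see the math, here is a minimal sketch of the DCT in Python. It is a direct, unnormalised DCT-II written for clarity rather than speed; the function name `dct2` is ours, and it is not the library mentioned above, which would use a much faster algorithm.

```python
import math


def dct2(samples):
    """Unnormalised DCT-II: X[k] = sum over n of x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_samples = len(samples)
    return [
        sum(x * math.cos(math.pi / n_samples * (n + 0.5) * k)
            for n, x in enumerate(samples))
        for k in range(n_samples)
    ]


# A test signal that is exactly one of the pre-defined cosine functions (k = 3).
n_samples = 16
signal = [math.cos(math.pi / n_samples * (n + 0.5) * 3) for n in range(n_samples)]

coefficients = dct2(signal)

# Coefficient 3 comes out at N / 2 = 8; every other coefficient is zero up to rounding.
for k, value in enumerate(coefficients):
    print(k, round(value, 6))
```

Because the test signal is itself one of the cosine functions, all of its weight lands on a single coefficient. A real trace spreads its weight over many coefficients, with the overall shape concentrated in the first few.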
    • How this Applies to Routes
      First of all, change in position over time is a signal. Our GPS traces and hand-drawn routes are rough quantizations of continuous routes that we want to match to one another, and the two-dimensional images we generate of the traces are also well suited to a single application of the DCT, without requiring multiple time windows. The change in image pixel values, or the change in longitude and latitude, will match from run to run on the same route, up to a number of significant coefficients – a threshold we define ourselves. We can compare the coefficients of two routes to a fixed precision, and use a measure like the Sum of Squared Differences (SSD) to determine “how different they are.”
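
Here is a sketch of that comparison on one-dimensional data, assuming each trace has already been resampled to the same number of points (a step not shown). The `signature` and `ssd` helpers, the choice of 8 coefficients, and the synthetic "latitude" series are all illustrative assumptions.

```python
import math


def dct2(samples):
    """Unnormalised DCT-II, written directly from the definition."""
    n_samples = len(samples)
    return [
        sum(x * math.cos(math.pi / n_samples * (n + 0.5) * k)
            for n, x in enumerate(samples))
        for k in range(n_samples)
    ]


def signature(samples, keep=8):
    """The first `keep` coefficients: the coarse shape of the signal."""
    return dct2(samples)[:keep]


def ssd(a, b):
    """Sum of Squared Differences between two equal-length coefficient lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Synthetic latitude offsets over 32 samples (made-up data).
morning = [math.sin(math.pi * n / 31) for n in range(32)]               # out-and-back arc
evening = [value + 0.01 * (-1) ** n for n, value in enumerate(morning)]  # same arc, jittery GPS
other = [n / 31 for n in range(32)]                                      # one-way straight line

same_route_score = ssd(signature(morning), signature(evening))
different_route_score = ssd(signature(morning), signature(other))
print(same_route_score, different_route_score)
```

The sample-to-sample jitter lives mostly in the high-frequency coefficients that `signature` throws away, so the two recordings of the same arc score far closer to each other than either does to the straight line.
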
  • Implementation
    Since we don’t want to waste more time running this expensive comparison than necessary, we’ll only perform this check on routes that pass the first two techniques: those whose distances are within a certain tolerance of the current route (15%, which might be overoptimistic, considering what a mess most of my GPS traces are), and which are contained within a box 10% larger on each side than the current route’s bounding box. This saves us the heavier processing time of the DCT against runs which are clear misses. For our first implementation, we’ll process the route thumbnail images we already use to give users a “glyph” of the run, since they provide a discrete version of the run and automatically normalize the route to a fixed image size. Once we’ve pared the list down using the first two techniques, we iterate over the remaining routes and compare their DCT coefficients. If the SSD is beneath our chosen threshold, we propose the match to the user!
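
The final DCT stage might look something like the Python sketch below. It is a simplified stand-in, not our implementation: `rasterize` plots points (not connected lines) into a 16x16 grid instead of reading a real thumbnail, the 4x4 block of coefficients and the threshold of 500 are arbitrary, and the candidates are assumed to have already passed the distance and bounding-box checks.

```python
import math


def rasterize(points, size=16):
    """Plot a trace into a size x size grid of 0/1 pixels, scaled to its own bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0
    grid = [[0.0] * size for _ in range(size)]
    for x, y in points:
        col = min(size - 1, int((x - min_x) / span_x * size))
        row = min(size - 1, int((y - min_y) / span_y * size))
        grid[row][col] = 1.0
    return grid


def dct2d_low(grid, keep=4):
    """The keep x keep lowest-frequency 2-D DCT-II coefficients, flattened."""
    size = len(grid)

    def basis(i, k):
        return math.cos(math.pi / size * (i + 0.5) * k)

    return [
        sum(grid[r][c] * basis(r, u) * basis(c, v)
            for r in range(size) for c in range(size))
        for u in range(keep) for v in range(keep)
    ]


def ssd(a, b):
    """Sum of Squared Differences between two coefficient lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def propose_matches(current, candidates, threshold):
    """Return (score, name) pairs under the threshold, best match first."""
    target = dct2d_low(rasterize(current))
    scored = [(ssd(target, dct2d_low(rasterize(points))), name)
              for name, points in candidates.items()]
    return sorted(pair for pair in scored if pair[0] < threshold)


def square_loop(steps, drop_corner=False):
    """Points around a unit square; optionally lose the stretch near one corner."""
    ts = [i / steps for i in range(steps + 1)]
    kept = [t for t in ts if not (drop_corner and t > 0.8)]
    return ([(t, 0.0) for t in ts] + [(1.0, t) for t in kept]
            + [(t, 1.0) for t in kept] + [(0.0, t) for t in ts])


current = square_loop(40)
candidates = {
    "same loop, denser sampling": square_loop(80),
    "loop with a dropped corner": square_loop(40, drop_corner=True),
    "diagonal out-and-back": [(i / 40, i / 40) for i in range(41)],
}
matches = propose_matches(current, candidates, threshold=500.0)
print(matches)
```

With this made-up data, both recordings of the square loop are proposed and the diagonal run is rejected. Choosing a threshold that works on real, messy traces is the part that still needs experimentation.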

more reading:
http://en.wikipedia.org/wiki/Discrete_cosine_transform
(wikipedia: an overview)

http://www.exampleproblems.com/wiki/index.php/Discrete_cosine_transform
(basic reading)

http://140.134.132.124/dspace/handle/2377/1037
(using DCTs for handwritten Chinese character recognition!)

by Andy Carra
categories: Uncategorized
Aug 10th, 2009

Using MySQL Fulltext Search Engine with Data Driven Web Applications – Part II

In part I, we discussed the basics of implementing a simple search functionality using the MySQL fulltext search index. In this post, we will build on those concepts, exploring paging, sorting, mixed data search, filtering and InnoDB.

(more…)

by John Wolthius
categories: Uncategorized
Dec 11th, 2007

CSS Diff Beta

Our CSS Diff tool beta is now public!

In one of our recent contracts, we encountered a problem. The design team was using a content management system to maintain the non-transactional portions of their website, while the engineering team was building a Tomcat/Spring web application to handle shopping cart and search functionality. The content management system (OpenCMS) had its own versioning system, which was largely incompatible with the CVS instance that the engineers were using for the web application. The result was two sets of stylesheets, one for each side of the application. As marketing and content personnel made changes in OpenCMS, the CSS stored there began to diverge from the copy the engineers were using and altering.

As we approached production, it became necessary to merge the CSS into a single file, but by that point both branches were upwards of 4000 lines each and had diverged wildly. Throwing both files into a traditional line-based diff tool was useless, as one of the files had been passed through a sanitizer at some point earlier in its lifespan and its formatting was far too different. What was needed was a diff utility that understood CSS.

For this purpose, I wrote CSS Diff. Using the W3C CSS parser libraries, the tool loads and parses each of the diverging CSS files into an internal representation and compares these representations, showing the substantive differences while ignoring ordering, formatting, and syntax ambiguity. It worked wonderfully, showing us which selectors were unique to each file, which selectors conflicted, and which attributes of those selectors were in conflict.

by John Wolthius
categories: CSS Diff
Dec 5th, 2007

Using the MySQL Fulltext Search Engine with Data Driven Web Applications – Part I

Part I – The FULLTEXT index and BOOLEAN search language

Searching is one of the most fundamental and common features of the web. We are all used to having powerful search engines such as Google at our fingertips, both for searching the glut of information available on the Internet at large, as well as for indexing our own web sites to provide a search mechanism to our own users.

(more…)

by John Wolthius
categories: Database
May 10th, 2007

New Look & Feel for Gordian Labs

As you might have noticed, Gordian Labs has a new look and feel for our internet presence. We’ve been focused on consulting work and product development for a while, but we’re going to be making more of our work openly available on the web. Hopefully you’ll enjoy some of our articles and products. Thanks to Yogeeta for developing the core Gordian Labs branding. Brahm has also created a new site design for Runometer that looks quite nice. Thanks, Brahm!

by Gordian Labs
categories: Gordian Labs, News, Runometer
Apr 26th, 2007

Runometer Logs Over 10,000 Miles

Runometer has crossed the 10,000 mile mark today… and what a trip it has been! Thanks to runner Teebo for taking us over this (quite literal) milestone, and thanks to all our users and supporters for continuing to use the site, giving us very valuable feedback and most of all, continuing to run!

by Gordian Labs
categories: News, Runometer
Apr 17th, 2007

Runometer adds Sidebar Badges

Runometer now vends narrow sidebar badges in addition to the normal (wider) horizontal images. Runometer weblog badges make it easy to publish your latest runs and statistics in your blog or website. Show off your latest run and route just like the badge at right. Or you can include aggregate statistics across many runs or a listing of several recent runs if you want to include more detail. Show off your progress to friends and visitors.

by Gordian Labs
categories: News, Runometer
Mar 31st, 2007

GPX Uploading Supported by Runometer

We’ve added support for uploading your run data from Garmin and MotionBased devices to Runometer. If you have a Garmin Forerunner, you can now have your maps created for you automatically!

by Gordian Labs
categories: News, Runometer
Mar 23rd, 2007

Runometer in the News

A number of websites have featured Runometer in the last few days. Starting with a nice little feature in Hack a Day, we also got picked up by Engadget, Podofile, The Unofficial Apple Weblog, and Lifehacker. Thanks for all the mentions!

by Gordian Labs
categories: News, Runometer
Jan 26th, 2007