phpsqlmysqlgeolocationpostal-code

MySQL query to match UK postcodes regardless of the number of spaces


I have the simplest table in the world, for looking up lat/lng values for a UK postcode (loaded with full UK postcode data):

CREATE TABLE postcodes (
  postcode char(7) NOT NULL,
  lat double(10,6) NOT NULL,
  lng double(10,6) NOT NULL,
  KEY postcode (postcode)
)

Postcodes in 'postcode' field either have 2 digits at the end of the first half, or one and then a space. I think the space is important for the integrity of how they are matched (??), and besides I don't want to remove the spaces in the table, as I'm also pulling out the postcodes for display purposes (and I don't want a duplicate field, because I'm fussy!). Examples:

'LE115AF', 'BS6 5EE', 'W1A 1AA', 'BS216RS', 'M3 1NH'

So, some have spaces, some don't. Most are 7 chars overall, some only 6.

Anyway, point being is I want users to be able to enter postcode queries, including partial postcodes, with or without spaces, and always find a match if their input string is valid (i.e. they don't enter a full or partial postcode which doesn't exist in the table).

This is how I've done it so far (with some help from PHP):

{...} WHERE `postcode` LIKE '" . str_replace(' ','%',$query) . "%' LIMIT 1

This is good for:

But doesn't work for these queries:

I'm guessing I need to somehow do some MySQL string function magic to work out if there's a space in the row's postcode field, and adjust my WHERE clause logic accordingly? Anyone got any advice on the best approach? I ideally want to also:


Solution

  • Create a new column which is just the postcode field with the spaces stripped, and create a unique index on it. You shouldn't find any duplicates. That should reassure you that the space really isn't important :)

    Then use that for the lookup, after stripping spaces on your input postcodes.

    Bear in mind that solutions that involve applying string functions to the postcode column of the table may stop MySQL from using any indexes on that column. (The index is based on the exact data in the column, so if you start applying functions to that data, the optimiser will generally decide that the index is useless.)

    If you do feel the need to reformat things, the easiest option is to work from the knowledge that while the "outbound" part of the postcode -- the part before the space -- varies in format a little, the "inbound" part -- the part after the space -- is always a single digit followed by two letters.

    Probably the best resource on the format I've ever found is the Wikipedia entry, by the way.