The concepts
Every time I try to do something with postcodes, I find myself trying to remember the different possible formats and googling for effective regular expressions (the one I want is never in the first five I try).
I decided to record what I found to help out future me (and others). This post will:
- Point you to a much better article: Programmer’s guide to UK postcodes
- Give a simple Python example for using regular expressions to extract parts of a UK postcode
Firstly, this page from the GetTheData peeps is brill:
Programmer’s guide to UK postcodes
It provides some really useful context (including a clear graphic describing how UK postcodes are constructed) and has some sample regular expressions. This basically unblocked me the last time I went round this loop.
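Before reaching for a regex at all, the basic anatomy can be sketched in a few lines. This is a minimal sketch that assumes the input is already a valid UK postcode and does no validation: it relies on the inward code (the part after the space) always being three characters, a sector digit followed by two unit letters, so everything in front of it is the outcode. `split_postcode` is a hypothetical helper name of my own.

```python
from typing import Tuple


def split_postcode(raw: str) -> Tuple[str, str]:
    """Split a UK postcode into (outcode, incode).

    Assumes a well-formed postcode; nothing is validated. The incode is
    always the last three characters: a sector digit plus two unit letters.
    """
    # Drop all whitespace and upper-case, e.g. "wc2b 3dx" -> "WC2B3DX"
    compact = "".join(raw.split()).upper()
    return compact[:-3], compact[-3:]


print(split_postcode("wc2b 3dx"))  # ('WC2B', '3DX')
print(split_postcode("ab3 4fd"))   # ('AB3', '4FD')
```

Because it trusts the input completely, this only really works once you already know you have a postcode; the regex below is what tells you that.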
Getting the different parts of a postcode
I used the regex from getthedata.com to easily extract the Outcode from a full postcode string.
import re

source_string = "wc2b 3dx"
string_to_process = source_string.replace(" ", "").upper()  # WC2B3DX
# Groups, in order: full postcode, outcode, area, district, incode, sector, unit
matches = re.findall(r'^((([A-Z][A-Z]{0,1})([0-9][A-Z0-9]{0,2})) {0,}(([0-9])([A-Z]{2})))', string_to_process)
# [('WC2B3DX', 'WC2B', 'WC', '2B', '3DX', '3', 'DX')]
The re.findall method returns a list of matches, and because the pattern contains capture groups, each match is a tuple of what those groups captured. The regex is anchored with ^, so it only matches a postcode at the very start of the string, and I know I’m only ever passing in a single postcode, so I take the first match from the list to get a tuple containing the various postcode parts.
postcode_parts = matches[0]  # ('WC2B3DX', 'WC2B', 'WC', '2B', '3DX', '3', 'DX')
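If you’d rather not rely on tuple positions, or need to cope with input that isn’t a postcode at all (matches[0] raises an IndexError when findall returns an empty list), here’s a sketch of a variant. It is the same pattern rewritten with named groups and applied with re.fullmatch; the group names and the parse_postcode helper are hypothetical labels of my own. Note that fullmatch is stricter than the ^-anchored findall: it also rejects strings with extra characters after the postcode.

```python
import re
from typing import Dict, Optional

# Same pattern as above, but with named groups instead of positional ones.
# The group names are illustrative labels, not part of the original regex.
POSTCODE_RE = re.compile(
    r"(?P<outcode>(?P<area>[A-Z][A-Z]?)(?P<district>[0-9][A-Z0-9]{0,2}))"
    r" *"
    r"(?P<incode>(?P<sector>[0-9])(?P<unit>[A-Z]{2}))"
)


def parse_postcode(source_string: str) -> Optional[Dict[str, str]]:
    """Return the postcode parts as a dict, or None if the string doesn't match."""
    # Same normalisation as before: strip spaces and upper-case.
    string_to_process = source_string.replace(" ", "").upper()
    # fullmatch has to consume the whole string, so no ^ or $ is needed.
    match = POSTCODE_RE.fullmatch(string_to_process)
    if match is None:
        return None
    return match.groupdict()


parts = parse_postcode("wc2b 3dx")
if parts is not None:
    print(parts["outcode"], parts["incode"])  # WC2B 3DX
print(parse_postcode("not a postcode"))  # None
```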
I can then use whichever part of the postcode I need. In my case, I wanted to look for a complete match in a big list first and, if there’s no match, look for a match on just the Outcode.
First I get the full postcode and outcode to work with:
postcode = postcode_parts[0]  # 'WC2B3DX'
outcode = postcode_parts[1]   # 'WC2B'
Then I can try and find them in the list:
list_of_postcodes = ['WC2B3DF', 'WC2D5FD', 'WC2B', 'AB34FD', 'AB3']

if postcode in list_of_postcodes:
    print('Found the full postcode!')
elif outcode in list_of_postcodes:
    print('Found the outcode!')
else:
    # Double quotes, so the apostrophe doesn't end the string early
    print("Didn't find anything :(")

# Found the outcode!
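If the list really is big, every `in` check against a Python list scans it from the start, whereas a set lookup is constant time on average. Here’s a sketch of the same full-postcode-then-outcode fallback, assuming the known postcodes are stored without spaces as above; `find_best_match` is a hypothetical helper, not from any library.

```python
from typing import AbstractSet, Optional


def find_best_match(postcode: str, outcode: str, known: AbstractSet[str]) -> Optional[str]:
    """Return the full postcode if it's known, else the outcode if that's known, else None."""
    if postcode in known:
        return postcode
    if outcode in known:
        return outcode
    return None


# Build the set once, then reuse it for every lookup.
known_postcodes = {'WC2B3DF', 'WC2D5FD', 'WC2B', 'AB34FD', 'AB3'}

print(find_best_match('WC2B3DX', 'WC2B', known_postcodes))  # WC2B
print(find_best_match('AB34FD', 'AB3', known_postcodes))    # AB34FD
print(find_best_match('EC1A1BB', 'EC1A', known_postcodes))  # None
```

Returning the matched string (rather than printing) makes it easy to tell which kind of match you got and to use it for a further lookup.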
Hopefully this might help save others a few minutes by avoiding the Google / Stack Overflow loop!