python regex text matching

Unstructured Text/Number merge


I am trying to match fields in two separate datasets. Both are address fields. One dataset may contain something like "532 Sheffield Dr" while the other contains only "Sheffield Dr". Other examples are "US21 Ramp and Hays RD" with "US 21", and "N 25th St and Danville RD" with "25th St". In short, all the text and numbers in a value from the second dataset should be found in the matching value from the first dataset, even though the first dataset's value may contain extra text or numbers. I have been trying to use regex but haven't been able to figure out the appropriate code. How do I go about this?


Solution

  • Based on your examples and what I understood, the easiest way is a plain substring check, tried once as-is and once with the spaces removed:

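    # s1: longer addresses (first dataset); s2: shorter addresses (second dataset)
    s1 = ["532 Sheffield Dr",  "US21 Ramp and Hays RD",  "N 25th St and Danville RD"]
    s2 = ["Sheffield Dr",  "US 21", "25th St"]
    
    for item2 in s2:
        for item1 in s1:
            # Try the raw substring first, then with spaces removed so "US 21" finds "US21"
            if item2 in item1 or item2.replace(' ', '') in item1:
                print('%s in %s' % (item2, item1))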
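
    The check above is case-sensitive and only handles spaces missing on one side. If your real data also differs in case or punctuation, one possible variation is to normalize both strings before comparing. This is only a sketch: the helper names `normalize` and `find_matches` are made up for illustration, and the inputs "sheffield dr" and "25th St." are invented variants of your examples to show the effect.

```python
import re

def normalize(address):
    """Lowercase the string and drop everything except letters and digits."""
    return re.sub(r'[^a-z0-9]', '', address.lower())

def find_matches(short_addresses, full_addresses):
    """Return (short, full) pairs where the normalized short form
    occurs inside the normalized full form."""
    matches = []
    for short in short_addresses:
        key = normalize(short)
        if not key:
            # Skip empty values, which would otherwise match everything
            continue
        for full in full_addresses:
            if key in normalize(full):
                matches.append((short, full))
    return matches

s1 = ["532 Sheffield Dr", "US21 Ramp and Hays RD", "N 25th St and Danville RD"]
# Hypothetical variants with different case and trailing punctuation
s2 = ["sheffield dr", "US 21", "25th St."]

for short, full in find_matches(s2, s1):
    print('%s in %s' % (short, full))
```

    Keep in mind that any substring approach, this one included, can over-match: "5th St" would also be found inside "25th St". If that matters for your data, you would need a stricter comparison, for example on whole words.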