python, duplicates, python-dedupe

How do you make a gazetteer for Dedupe when individuals have multiple addresses?


According to the DataMade Dedupe documentation, a gazetteer seems to need clean, distinct individual-level data.

What do you do if the individual has moved, changed jobs, etc. several times? Should you include multiple observations per individual, with the blanks intelligently filled in?


Solution

  • If you know that a person has had multiple addresses, then I would create a 'gazetteer' like this, with one row per known address and a shared Person_ID:

    Address                Name      Person_ID
    123 Main St.           John Doe  1
    100 High St.           John Doe  1
    1600 Pennsylvania Ave  John Doe  1
    

    When you match against this, you will then have a second resolution step where you merge the matches by Person_ID.
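The second resolution step could look roughly like the sketch below. It is plain Python and does not call Dedupe itself: it assumes you have already reshaped the matcher's output into (messy record ID, gazetteer row ID, score) tuples. The row IDs ("g1", "m1", ...), the scores, and the function name resolve_to_person are hypothetical and only there for illustration.

```python
from collections import defaultdict

# Gazetteer keyed by a row ID; several rows may share one Person_ID.
# The row IDs "g1".."g3" are made up for this example.
gazetteer = {
    "g1": {"Address": "123 Main St.", "Name": "John Doe", "Person_ID": 1},
    "g2": {"Address": "100 High St.", "Name": "John Doe", "Person_ID": 1},
    "g3": {"Address": "1600 Pennsylvania Ave", "Name": "John Doe", "Person_ID": 1},
}


def resolve_to_person(matches, gazetteer):
    """Collapse row-level matches to one best-scoring Person_ID per messy record.

    `matches` is assumed to be an iterable of
    (messy_id, gazetteer_row_id, score) tuples.
    """
    best = defaultdict(dict)  # messy_id -> {Person_ID: best score seen}
    for messy_id, gaz_row_id, score in matches:
        person_id = gazetteer[gaz_row_id]["Person_ID"]
        if score > best[messy_id].get(person_id, float("-inf")):
            best[messy_id][person_id] = score
    # Keep the Person_ID with the highest score for each messy record.
    return {
        messy_id: max(scores, key=scores.get)
        for messy_id, scores in best.items()
    }


# Hypothetical matcher output: record "m1" matched two of John Doe's rows.
matches = [("m1", "g1", 0.62), ("m1", "g3", 0.91)]
resolved = resolve_to_person(matches, gazetteer)
print(resolved)
```

Because every address row for a person carries the same Person_ID, a messy record that matches any of those rows ends up attached to the same individual.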