[SOLVED] Can I get Python to compare a list of nicknames with a list of full names?

So first off I have a character data frame that has a column called name and contains the full name for 100+ people.

E.g., Name: Johnathan Jay Smith, Harold Robert Doe, Katie Holt.

Then I have a list of unique nicknames, e.g., [Mr. Doe, Aunt Katie, John].

It's important to note that they are not in the same order, and that not everyone with a nickname is in the full name list, and not everyone in the full name list is in the nickname list. I will be removing rows that don't have matching values at the end.

My question: is there a way I can get Python to read through these two lists item by item and match John with Johnathan Jay Smith, and so on for everyone who has a match? Basically, if the nickname appears as part of the full name, can I add a nickname column to my existing character data frame without doing this manually for over 100 people?

Thank you in advance, I don't even know where to start with this one!

Solution

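This is a very straightforward approach: it splits every full name and every nickname into words and reports a match whenever any word of a nickname occurs inside a word of the full name. The comparison is case-sensitive and does not take spelling variants into account.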

from itertools import product

names = ['Johnathan Jay Smith', 'Harold Robert Doe', 'Katie Holt']
nicknames = ["Mr. Doe", "Aunt Katie", "John"]

def match_nicknames(names, nicknames):
    # Split every full name and every nickname into its words.
    split_names = [n.split(' ') for n in names]
    split_nicknames = [n.split(' ') for n in nicknames]
    matches = []
    for name in split_names:
        # Pair each word of the full name with each nickname (as a word list).
        for name_part, nn in product(name, split_nicknames):
            # Match if any nickname word is a substring of this name word.
            if any(word in name_part for word in nn):
                matches.append((" ".join(name), " ".join(nn)))
    return matches

print(match_nicknames(names, nicknames))
# [('Johnathan Jay Smith', 'John'),
#  ('Harold Robert Doe', 'Mr. Doe'),
#  ('Katie Holt', 'Aunt Katie')]
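
To get from a list of matches to the nickname column the question asks for, one option is to build a full name to nickname lookup first. The sketch below is only an illustration under a few assumptions: build_nickname_map is a hypothetical helper (not part of the answer above), it compares case-insensitively, it treats a nickname word as matching when it is a prefix of a word in the full name (rather than any substring), and it keeps only the first nickname that matches each name. 'Mary Ann Lee' and 'Uncle Bob' are made-up entries to show what happens when there is no match.

```python
def build_nickname_map(names, nicknames):
    """Map each full name to the first nickname that shares a word with it.

    Hypothetical helper for illustration. A nickname word matches when it
    is a case-insensitive prefix of a word in the full name.
    """
    mapping = {}
    for name in names:
        name_words = [w.lower() for w in name.split()]
        for nickname in nicknames:
            nick_words = [w.lower() for w in nickname.split()]
            if any(nw.startswith(kw) for nw in name_words for kw in nick_words):
                mapping[name] = nickname
                break  # keep only the first matching nickname
    return mapping


# Example data; the last name and the last nickname have no counterpart.
names = ['Johnathan Jay Smith', 'Harold Robert Doe', 'Katie Holt', 'Mary Ann Lee']
nicknames = ["Mr. Doe", "Aunt Katie", "John", "Uncle Bob"]

mapping = build_nickname_map(names, nicknames)
print(mapping)
```

If the data frame is a pandas DataFrame with a column called name, something like df['nickname'] = df['name'].map(mapping) followed by df = df.dropna(subset=['nickname']) should add the column and drop the rows without a match. Prefix matching is a design choice here and can still give false positives for very short nicknames, so it is worth checking the result by eye.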