Tags: python, pandas, string, match, exact-match

pandas: exact match does not work in an if AND condition


I have two dataframes as follows:

import pandas as pd

data = {'First': [['First', 'value'], ['second', 'value'], ['third', 'value', 'is'], ['fourth', 'value', 'is']],
        'Second': ['noun', 'not noun', 'noun', 'not noun']}

df = pd.DataFrame(data, columns=['First', 'Second'])

and

data2 = {'example': ['First value is important', 'second value is important too', 'it us good to know',
                     'Firstap is also good', 'aplsecond is very good']}

df2 = pd.DataFrame(data2, columns=['example'])

I have written the following code, which should select a sentence from df2 only if its first word matches the first word of a row in df and the Second column of that row is exactly 'noun'. So there are two conditions, and both should hold.

def checker():
    result =[]
    for l in df2.example:
        df['first_unlist'] = [','.join(map(str, l)) for l in df.First]
        if df.first_unlist.str.match(pat=l.split(' ', 1)[0]).any() and df.Second.str.match('noun').any():
            result.append(l)
    return result

However, when I run the function I get ['First value is important', 'second value is important too'] as the output, which shows that the second condition (the 'noun' filter) has no effect. My desired output is ['First value is important']. I have also tried .str.contains() and .eq(), but I still got the same output.


Solution

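  • The two .any() calls are evaluated independently over the whole column, so df.Second.str.match('noun').any() is True for every sentence, regardless of which row matched the first word. I would suggest filtering df down to the rows where Second is 'noun' before trying to match: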

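    def checker():
        result = []
        # First words of the rows whose Second value is exactly 'noun'
        first_unlist = [x[0] for x in df.loc[df.Second == 'noun', 'First']]
        for l in df2.example:
            # Keep the sentence only if its first word is one of those words
            if l.split(' ')[0] in first_unlist:
                result.append(l)
        return result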
    
    checker()
    # ['First value is important']
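As a possible alternative, the same filtering can be sketched without an explicit loop. This is only a sketch under the assumption that df and df2 are built exactly as in the question; the names always_true, noun_first_words and first_words are introduced here for illustration and do not appear in the original post. It also shows why the original condition never filtered anything.

```python
import pandas as pd

df = pd.DataFrame({
    'First': [['First', 'value'], ['second', 'value'],
              ['third', 'value', 'is'], ['fourth', 'value', 'is']],
    'Second': ['noun', 'not noun', 'noun', 'not noun'],
})
df2 = pd.DataFrame({
    'example': ['First value is important', 'second value is important too',
                'it us good to know', 'Firstap is also good',
                'aplsecond is very good'],
})

# Why the original test fails: this value is computed over the whole column,
# so it is the same (True) for every sentence, whichever row matched.
always_true = df.Second.str.match('noun').any()

# First words taken only from rows where Second is exactly 'noun'
# (hypothetical helper name, not from the original post).
noun_first_words = {words[0] for words in df.loc[df.Second == 'noun', 'First']}

# First word of each sentence, followed by an exact membership test.
first_words = df2.example.str.split().str[0]
result = df2.loc[first_words.isin(noun_first_words), 'example'].tolist()

print(result)  # ['First value is important']
```

Because isin compares whole strings, 'Firstap' and 'aplsecond' are not treated as matches for 'First' or 'second', which is the exact-match behaviour asked for.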