I have two dataframes as follows:
import pandas as pd

data = {'First': [['First', 'value'], ['second', 'value'], ['third', 'value', 'is'], ['fourth', 'value', 'is']],
        'Second': ['noun', 'not noun', 'noun', 'not noun']}
df = pd.DataFrame(data, columns=['First', 'Second'])
and
data2 = {'example': ['First value is important', 'second value is important too', 'it us good to know',
                     'Firstap is also good', 'aplsecond is very good']}
df2 = pd.DataFrame(data2, columns=['example'])
I have written the following code to select the sentences from df2 whose first word matches the first word of an entry in the First column of df, but only if the Second column of that entry is 'noun'. So there are two conditions that must both hold.
def checker():
    result = []
    for l in df2.example:
        df['first_unlist'] = [','.join(map(str, l)) for l in df.First]
        if df.first_unlist.str.match(pat=l.split(' ', 1)[0]).any() and df.Second.str.match('noun').any():
            result.append(l)
    return result
However, when I run the function I get ['First value is important', 'second value is important too'] as the output, which shows that the second condition (the 'noun' filter) does not work. My desired output is ['First value is important']. I have also tried .str.contains() and .eq(), but I still get the same output.
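The two conditions in your if statement are evaluated independently: each .any() is True as soon as any row of df satisfies it, so the 'noun' check never has to hold for the same row that matched the first word. I would suggest filtering df down to the rows where Second is 'noun' before trying to match: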
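def checker():
    # First word of each list, taken only from rows where Second is exactly 'noun'.
    first_unlist = [x[0] for x in df.loc[df.Second == 'noun', 'First']]
    result = []
    for l in df2.example:
        # Keep the sentence if its first word is one of the filtered words.
        if l.split(' ')[0] in first_unlist:
            result.append(l)
    return result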
checker()
['First value is important']
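If you would rather avoid the explicit loop, the same filter can probably be written with a boolean mask. This is only a sketch built on the sample data from the question; the names noun_first_words and sentence_first_words are made up for illustration, and it assumes every list in First is non-empty and that Second must equal 'noun' exactly.

```python
import pandas as pd

df = pd.DataFrame({
    'First': [['First', 'value'], ['second', 'value'],
              ['third', 'value', 'is'], ['fourth', 'value', 'is']],
    'Second': ['noun', 'not noun', 'noun', 'not noun'],
})
df2 = pd.DataFrame({'example': [
    'First value is important', 'second value is important too',
    'it us good to know', 'Firstap is also good', 'aplsecond is very good',
]})

# First word of each list, restricted to rows tagged exactly 'noun'
# (assumes no list in 'First' is empty).
noun_first_words = {words[0] for words in df.loc[df['Second'] == 'noun', 'First']}

# First whitespace-separated word of every sentence in df2.
sentence_first_words = df2['example'].str.split().str[0]

# Keep the sentences whose first word is in the filtered set.
result = df2.loc[sentence_first_words.isin(noun_first_words), 'example'].tolist()
print(result)
```

Because the comparison is an exact word match against a set, 'Firstap is also good' is not selected, which a prefix match with .str.match() would have let through.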