import pandas as pd
df = pd.DataFrame({'Environment': [['AppleOS X','postgres','Apache','tomcat']], 'Description': [['Apache', 'Commons', 'Base32', 'decoding', 'invalid', 'rejecting', '.', 'via','valid', '.']] })
Environment Description
0 [AppleOS X, postgres, Apache, tomcat] [Apache, Commons, Base32, decoding, invalid, rejecting, ., via, valid, .]
I am new to Pandas and dataframes, and I have a question about finding the intersection of the two columns shown above.
Objective:
Environment and Description are two columns in a dataframe. The objective is to create a new column with the intersection of strings present in the first two columns.
Existing Implementation:
def f(param):
    return set.intersection(set(param['Environment']), set(param['Description']))
df['unique_words'] = df.apply(f, axis=1)
print(df['unique_words'])
I took this set intersection syntax from https://www.kite.com/python/answers/how-to-find-the-intersection-of-two-lists-in-python
Problem:
I am not sure how the above syntax works, but it returns {}.
Expected Output:
Since 'Apache' is present in both columns, ['Apache'] should be the value in the new column created in the dataframe.
If anyone has written a similar function, please let me know. Any help is appreciated.
Use set.intersection and map lowercase to the values in the list.
# assumes only the two columns in the dataframe
# .iloc is used because plain integer keys such as x[0] on a label-indexed row are deprecated in recent pandas
df['common_words'] = df.apply(lambda x: list(set(map(str.lower, x.iloc[0])).intersection(map(str.lower, x.iloc[1]))), axis=1)
# if there are many columns, specify the two desired columns to compare
df['common_words'] = df[['Environment', 'Description']].apply(lambda x: list(set(map(str.lower, x['Environment'])).intersection(map(str.lower, x['Description']))), axis=1)
# display(df)
Environment Description common_words
0 [AppleOS X, postgres, Apache, tomcat] [Apache, Commons, Base32, decoding, invalid, rejecting, ., via, valid, .] [apache]
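The same idea can also be sketched as a named helper, which may be easier to read and test than a one-line lambda. This is only a sketch: `common_words` is a hypothetical function name, and the sample Description list below is shortened and deliberately uses lowercase 'apache' (an assumption, not the original data) to show that lowercasing both sides lets differently cased words match.

```python
import pandas as pd

def common_words(left, right):
    """Return the lowercased words that appear in both lists, sorted."""
    # set.intersection accepts any iterable, so the second map needs no set() call
    return sorted(set(map(str.lower, left)).intersection(map(str.lower, right)))

# Assumed sample data: 'Apache' vs 'apache' differ only in case
df = pd.DataFrame({
    'Environment': [['AppleOS X', 'postgres', 'Apache', 'tomcat']],
    'Description': [['apache', 'Commons', 'Base32', 'decoding']],
})

# Iterating over the zipped columns avoids a row-wise apply
df['common_words'] = [
    common_words(env, desc)
    for env, desc in zip(df['Environment'], df['Description'])
]
print(df['common_words'].tolist())  # [['apache']]
```

Without the `str.lower` mapping, 'Apache' and 'apache' would be treated as different strings and the intersection for this sample row would be empty.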