I have a DataFrame with three columns: word1, word2 and Tag. I have grouped it by two of the three columns. Now I need to find only those rows where word1 and word2 are the same but the third column, Tag, is different.
In other words, I need the rows where the same (word1, word2) pair carries different labels. However, I am not able to filter the DataFrame using the groupby construct shown below:
newComps.groupby(['word1','word2']).count()
It would be helpful to see only the entries with the same word1 and word2 but a different Tag, rather than all the entries. I tried putting the above code inside [], as one normally does to filter data, but to no avail.
Ideally I should see only:
A,gawam,A1
A,gawam,BS1
A,gawaH,T1
A,gawaH,T2
If you want to keep and identify the duplicates, use DataFrame.duplicated: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html
If you would rather drop them, use DataFrame.drop_duplicates: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html
Look at the subset and keep options of both methods.
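As a rough sketch of how subset and keep could be combined for this question: the sample rows below are made up for illustration (only the column names word1, word2 and Tag and the frame name newComps come from the question), and the names unique_rows, conflicts and alt are hypothetical. It assumes that fully identical rows should not count as a conflict, so they are dropped first.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
newComps = pd.DataFrame({
    "word1": ["A", "A", "A", "A", "B", "B"],
    "word2": ["gawam", "gawam", "gawaH", "gawaH", "kim", "kim"],
    "Tag":   ["A1", "BS1", "T1", "T2", "X1", "X1"],
})

# Drop rows that are identical in all three columns, so a repeated
# (word1, word2, Tag) triple is not mistaken for a conflicting label.
unique_rows = newComps.drop_duplicates()

# keep=False marks every row whose (word1, word2) pair occurs more than once.
mask = unique_rows.duplicated(subset=["word1", "word2"], keep=False)
conflicts = unique_rows[mask]
print(conflicts)

# Alternative that stays with groupby: keep rows whose group has more
# than one distinct Tag.
tag_counts = newComps.groupby(["word1", "word2"])["Tag"].transform("nunique")
alt = newComps[tag_counts > 1]
print(alt)
```

The two variants differ in one respect: the groupby version keeps repeated identical rows inside a conflicting group, while the duplicated version has already removed them.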