
How to find rows that differ by only one column in pandas?


I have a DataFrame with three columns, and I have grouped it on two of them. Now I need to find only those rows where the two columns word1 and word2 are the same but the third column, Tag, is different.

In other words, I need to find the rows where the same word1 and word2 have different labels. But I am not able to filter the DataFrame using the groupby construct shown below:

newComps.groupby(['word1','word2']).count()
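A groupby can be filtered instead of only counted. The sketch below is minimal and makes some assumptions. It uses the column names from the question. It builds a small hypothetical newComps from the desired output, plus a few made-up rows that should not match. groupby(...).filter keeps the original rows of every (word1, word2) group whose Tag column has more than one distinct value.

```python
import pandas as pd

# Hypothetical sample data: the first four rows mirror the desired output in
# the question; the last three are made-up rows that should be filtered out.
newComps = pd.DataFrame({
    "word1": ["A", "A", "A", "A", "B", "C", "C"],
    "word2": ["gawam", "gawam", "gawaH", "gawaH", "kitab", "qalam", "qalam"],
    "Tag":   ["A1", "BS1", "T1", "T2", "N1", "X1", "X1"],
})

# filter() calls the lambda once per (word1, word2) group and returns the
# original rows of the groups for which it is True, in their original order.
conflicts = newComps.groupby(["word1", "word2"]).filter(
    lambda g: g["Tag"].nunique() > 1
)
print(conflicts)
```

The (C, qalam) rows are excluded even though the pair occurs twice. Both of those rows carry the same Tag.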


It would be helpful to see only the entries that share word1 and word2 but have a different Tag, rather than all the entries. I have tried wrapping the above code in [], as we usually do to filter data, but to no avail.
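Wrapping groupby(...).count() in [] likely fails because the count has one row per group, indexed by (word1, word2). It does not line up with the rows of the original frame, so it cannot serve as a row mask. A possible fix uses transform, which broadcasts a per-group value back onto every original row. The result is a boolean Series aligned with newComps that does work inside []. The sketch below assumes the same hypothetical sample data as a stand-in for the real frame.

```python
import pandas as pd

# Hypothetical sample data (the real newComps is not shown in the question).
newComps = pd.DataFrame({
    "word1": ["A", "A", "A", "A", "B", "C", "C"],
    "word2": ["gawam", "gawam", "gawaH", "gawaH", "kitab", "qalam", "qalam"],
    "Tag":   ["A1", "BS1", "T1", "T2", "N1", "X1", "X1"],
})

# One value per original row: the number of distinct Tag values in that
# row's (word1, word2) group.
n_tags = newComps.groupby(["word1", "word2"])["Tag"].transform("nunique")

# n_tags > 1 is a boolean Series with the same index as newComps,
# so it can be used directly as a filter inside [].
conflicts = newComps[n_tags > 1]
print(conflicts)
```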

Ideally I should see only:

A, gawam, A1
A, gawam, BS1
A, gawaH, T1
A, gawaH, T2

Solution

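  • If you want to keep and identify the duplicates, use http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html

    If you would rather drop them, use http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html

    In both cases, look at the subset and keep options. subset=['word1', 'word2'] restricts the comparison to those two columns. keep=False marks (or drops) every row of a duplicated group instead of sparing the first occurrence.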
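The duplicated approach above can be sketched as follows. This is a minimal version that assumes hypothetical sample data in place of the real newComps. One caveat matters here: duplicated(subset=['word1', 'word2']) alone would also flag pairs whose rows are exact copies with the same Tag. For that reason, this sketch first removes exact duplicates across all three columns.

```python
import pandas as pd

# Hypothetical sample data; (C, qalam, X1) is an exact duplicate row.
newComps = pd.DataFrame({
    "word1": ["A", "A", "A", "A", "B", "C", "C"],
    "word2": ["gawam", "gawam", "gawaH", "gawaH", "kitab", "qalam", "qalam"],
    "Tag":   ["A1", "BS1", "T1", "T2", "N1", "X1", "X1"],
})

# Drop rows that repeat the same (word1, word2, Tag) combination, so a pair
# only remains duplicated if its rows carry different Tags.
deduped = newComps.drop_duplicates(subset=["word1", "word2", "Tag"])

# keep=False marks every row whose (word1, word2) pair occurs more than once,
# including the first occurrence.
mask = deduped.duplicated(subset=["word1", "word2"], keep=False)
conflicts = deduped[mask]
print(conflicts)
```

This variant returns each distinct (word1, word2, Tag) combination only once. If the original data had repeated rows within a conflicting group, only one copy of each would appear in the result.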