python pandas dataframe duplicates drop

How to remove duplicates from this nested DataFrame


I have a DataFrame like the one below, and I want to remove the duplicate rows to get the output shown further down. I tried a few things, but nothing worked as expected. I'm new to pandas.

import pandas as pd
# Sample DataFrame
data = {
"some_id": "xxx",
"some_email": "abc.xyz@somedomain.com",
"This is Sample": [
  {
   "a": "22",
   "b": "Y",
   "c": "33",
   "d": "x"
  },
  {
   "a": "44",
   "b": "N",
   "c": "55",
   "d": "Y"
  },
  {
   "a": "22",
   "b": "Y",
   "c": "33",
   "d": "x"
  },
  {
   "a": "44",
   "b": "N",
   "c": "55",
   "d": "Y"
  },
  {
   "a": "22",
   "b": "Y",
   "c": "33",
   "d": "x"
  },
  {
   "a": "44",
   "b": "N",
   "c": "55",
   "d": "Y"
  }
]
}

df = pd.DataFrame(data)
print(df)

The output is:
  some_id              some_email                              This is Sample
0     xxx  abc.xyz@somedomain.com  {'a': '22', 'b': 'Y', 'c': '33', 'd': 'x'}
1     xxx  abc.xyz@somedomain.com  {'a': '44', 'b': 'N', 'c': '55', 'd': 'Y'}
2     xxx  abc.xyz@somedomain.com  {'a': '22', 'b': 'Y', 'c': '33', 'd': 'x'}
3     xxx  abc.xyz@somedomain.com  {'a': '44', 'b': 'N', 'c': '55', 'd': 'Y'}
4     xxx  abc.xyz@somedomain.com  {'a': '22', 'b': 'Y', 'c': '33', 'd': 'x'}
5     xxx  abc.xyz@somedomain.com  {'a': '44', 'b': 'N', 'c': '55', 'd': 'Y'}

After removing the duplicates, the output should look like:
  some_id              some_email                              This is Sample
0     xxx  abc.xyz@somedomain.com  {'a': '22', 'b': 'Y', 'c': '33', 'd': 'x'}
1     xxx  abc.xyz@somedomain.com  {'a': '44', 'b': 'N', 'c': '55', 'd': 'Y'}

How can this be achieved? I tried several approaches, but some of them fail with an "unhashable type: 'dict'" error. My real DataFrame is a fairly large nested one like this. I'm using pandas with Python and am new to both.
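A minimal reproduction of that error, as a sketch: it uses a trimmed-down frame with the same shape as the data above (one column holding dicts), not the full DataFrame.

```python
import pandas as pd

# A tiny frame shaped like the one above: the last column holds dicts.
df = pd.DataFrame({
    "some_id": ["xxx", "xxx"],
    "This is Sample": [{"a": "22"}, {"a": "22"}],
})

error_message = None
try:
    df.drop_duplicates()
except TypeError as exc:
    # drop_duplicates() must hash every value it compares, and dicts are unhashable.
    error_message = str(exc)

print(error_message)
```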


Solution

  • The error you're encountering (TypeError: unhashable type: 'dict') happens because dictionaries are mutable and therefore unhashable. drop_duplicates() has to hash the values it compares, so it can't work directly on a column of dicts.

    To deduplicate rows where one of the columns contains dictionaries, you can:

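    1. Convert the dictionaries to strings (for example with json.dumps(d, sort_keys=True), so that dicts with the same contents but a different key order become the same string) and call drop_duplicates() on the result.

    2. Convert the strings back to dictionaries (if needed).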

    Here’s a clean and simple way to achieve your desired output:

    https://code.livegap.com/?st=a50pbcrjkjk
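Since the linked code isn't reproduced here, this is a minimal sketch of the two steps above, assuming the column name "This is Sample" from the question. It uses json.dumps(..., sort_keys=True) as the string form and json.loads to convert back; that choice is mine and not necessarily what the link does. The helper `canonical` and the `_key` column in the alternative are hypothetical names used only for illustration.

```python
import json

import pandas as pd

# Sample data in the shape used in the question.
row_a = {"a": "22", "b": "Y", "c": "33", "d": "x"}
row_b = {"a": "44", "b": "N", "c": "55", "d": "Y"}
df = pd.DataFrame({
    "some_id": "xxx",
    "some_email": "abc.xyz@somedomain.com",
    # Fresh dict copies in the order a, b, a, b, a, b.
    "This is Sample": [dict(r) for r in [row_a, row_b] * 3],
})

col = "This is Sample"


def canonical(d):
    """Serialize a dict to a JSON string with sorted keys (hashable, key-order independent)."""
    return json.dumps(d, sort_keys=True)


# Step 1: replace the dicts with strings, then drop duplicate rows.
as_text = df.copy()
as_text[col] = as_text[col].map(canonical)
deduped = as_text.drop_duplicates().reset_index(drop=True)

# Step 2: convert the strings back to dicts.
deduped[col] = deduped[col].map(json.loads)
print(deduped)

# Alternative: compare on a hashable helper key and filter the original frame,
# so the dict objects themselves are kept as-is in the result.
mask = ~df.drop(columns=col).assign(_key=df[col].map(canonical)).duplicated()
deduped_alt = df[mask].reset_index(drop=True)
print(deduped_alt)
```

The mask-based variant keeps the original dict objects instead of rebuilding them from JSON, which avoids the round trip entirely (JSON would, for example, turn non-string dict keys into strings).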