I have a DataFrame as shown below. I want to remove the duplicate rows and get the output shown further down. I have tried a few things, but none of them worked as expected. I am new to pandas.
import pandas as pd
# Sample DataFrame
data = {
    "some_id": "xxx",
    "some_email": "abc.xyz@somedomain.com",
    "This is Sample": [
        {
            "a": "22",
            "b": "Y",
            "c": "33",
            "d": "x"
        },
        {
            "a": "44",
            "b": "N",
            "c": "55",
            "d": "Y"
        },
        {
            "a": "22",
            "b": "Y",
            "c": "33",
            "d": "x"
        },
        {
            "a": "44",
            "b": "N",
            "c": "55",
            "d": "Y"
        },
        {
            "a": "22",
            "b": "Y",
            "c": "33",
            "d": "x"
        },
        {
            "a": "44",
            "b": "N",
            "c": "55",
            "d": "Y"
        }
    ]
}
df = pd.DataFrame(data)
print(df)
The output is:
   some_id              some_email                              This is Sample
0      xxx  abc.xyz@somedomain.com  {'a': '22', 'b': 'Y', 'c': '33', 'd': 'x'}
1      xxx  abc.xyz@somedomain.com  {'a': '44', 'b': 'N', 'c': '55', 'd': 'Y'}
2      xxx  abc.xyz@somedomain.com  {'a': '22', 'b': 'Y', 'c': '33', 'd': 'x'}
3      xxx  abc.xyz@somedomain.com  {'a': '44', 'b': 'N', 'c': '55', 'd': 'Y'}
4      xxx  abc.xyz@somedomain.com  {'a': '22', 'b': 'Y', 'c': '33', 'd': 'x'}
5      xxx  abc.xyz@somedomain.com  {'a': '44', 'b': 'N', 'c': '55', 'd': 'Y'}
I want to remove the duplicates, so the output should look like:
   some_id              some_email                              This is Sample
0      xxx  abc.xyz@somedomain.com  {'a': '22', 'b': 'Y', 'c': '33', 'd': 'x'}
1      xxx  abc.xyz@somedomain.com  {'a': '44', 'b': 'N', 'c': '55', 'd': 'Y'}
How can this be achieved? I tried several approaches, and some of them fail with "unhashable type: 'dict'". My real DataFrame is fairly large and nested like this one. I am using pandas with Python and am new to both.
The error you're encountering (unhashable type: 'dict') happens because drop_duplicates() hashes the values in each row to compare them, and dictionaries are mutable and therefore unhashable, so it cannot work directly on a column of dicts.
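To see the failure in isolation, here is a small sketch that reproduces it. The two-row frame and the names `small` and `raised` are made up for illustration; only the column of dicts matters.

```python
import pandas as pd

# Hypothetical two-row frame with identical dicts in one column.
small = pd.DataFrame({
    "some_id": ["xxx", "xxx"],
    "This is Sample": [{"a": "22"}, {"a": "22"}],
})

raised = False
try:
    # drop_duplicates() needs to hash every cell, which fails for dict values.
    small.drop_duplicates()
except TypeError as exc:
    raised = True
    print("drop_duplicates failed:", exc)
```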
To deduplicate rows where one of the columns contains dictionaries, you can:
1. Convert the dictionaries to strings.
2. Use drop_duplicates().
3. Convert the strings back to dictionaries (if needed).
Here’s a clean and simple way to achieve your desired output:
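A minimal sketch of those three steps, assuming every dictionary in the column is JSON-serializable (plain strings, numbers, lists and nested dicts). The variable names `col` and `out` are just for illustration, and the sample data is shortened to the two distinct dicts repeated three times.

```python
import json

import pandas as pd

data = {
    "some_id": "xxx",
    "some_email": "abc.xyz@somedomain.com",
    "This is Sample": [
        {"a": "22", "b": "Y", "c": "33", "d": "x"},
        {"a": "44", "b": "N", "c": "55", "d": "Y"},
    ] * 3,  # same six rows as in the question
}
df = pd.DataFrame(data)

col = "This is Sample"
out = df.copy()

# 1. Serialize each dict to a string; sort_keys=True makes dicts that
#    differ only in key order produce the same string.
out[col] = out[col].map(lambda d: json.dumps(d, sort_keys=True))

# 2. Every column is hashable now, so drop_duplicates() works.
out = out.drop_duplicates().reset_index(drop=True)

# 3. Turn the strings back into dictionaries.
out[col] = out[col].map(json.loads)

print(out)
```

If the dicts can contain values that json cannot serialize (dates, sets, custom objects), a different hashable representation would be needed, for example a tuple of sorted items for flat dicts.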