I have a dataframe with 3 columns and datatypes: Datetime (Datetime dtype), Area (string), Value (float).
I want to pivot it so I get separate columns for each unique entry in Area.
df_pivot = pd.pivot(df,index='Datetime',columns='Area','values='Value')
Now I get the following error
ValueError: Index contains duplicate entries, cannot reshape.
I drop duplicates without subsetting and try again but get the same result.
I try to find the offending entries with no success:
duplicates = df[df.duplicated(subset=['Datetime','Area','Value'], keep=False)]
This returns an empty dataframe
So I try resetting the index and repeating subsetting with the nex index:
duplicates = df[df.duplicated(subset=['Datetime','Area','Value','index'], keep=False)]
I get the same result, an empty dataframe and an unpivottable df.
Is there some other way to find the offending entries?
You need find duplicates per Datetime
and Area
, not for all 3 columns, because new index and new column is created by Datetime
and Area
columns:
df = pd.DataFrame(
{
"Datetime" : [1,1,2,3,3],
"Value" : [1,2,3,4,5],
"Area" : ["D", "D", "D", "E", "E"]
}
)
Get duplicates per both columns:
duplicates = df[df.duplicated(subset=['Datetime','Area'], keep=False)]
print (duplicates)
Datetime Value Area
0 1 1 D
1 1 2 D
3 3 4 E
4 3 5 E
Rmeove duplicates per both columns, keep first values is possible by DataFrame.drop_duplicates
:
print (df.drop_duplicates(subset=['Datetime','Area']))
Datetime Value Area
0 1 1 D
2 2 3 D
3 3 4 E
All toghether:
df_pivot = (df.drop_duplicates(subset=['Datetime','Area'])
.pivot(index='Datetime',columns='Area',values='Value'))
print (df_pivot)
Area D E
Datetime
1 1.0 NaN
2 3.0 NaN
3 NaN 4.0
Another idea is aggregate duplicated values, e.g. by sum
:
df_pivot = df.pivot_table(index='Datetime',columns='Area',values='Value', aggfunc='sum')
print (df_pivot)
Area D E
Datetime
1 3.0 NaN
2 3.0 NaN
3 NaN 9.0