Finding offending entries when pivot fails with a ValueError: Index contains duplicate entries, cannot reshape

I have a dataframe with 3 columns and datatypes: Datetime (Datetime dtype), Area (string), Value (float).

I want to pivot it so I get separate columns for each unique entry in Area.

df_pivot = pd.pivot(df,index='Datetime',columns='Area','values='Value')

Now I get the following error

ValueError: Index contains duplicate entries, cannot reshape.

I drop duplicates without subsetting and try again but get the same result.

I try to find the offending entries with no success:

duplicates = df[df.duplicated(subset=['Datetime','Area','Value'], keep=False)]

This returns an empty dataframe

So I try resetting the index and repeating subsetting with the nex index:

duplicates = df[df.duplicated(subset=['Datetime','Area','Value','index'], keep=False)]

I get the same result, an empty dataframe and an unpivottable df.

Is there some other way to find the offending entries?

Solution

You need find duplicates per Datetime and Area, not for all 3 columns, because new index and new column is created by Datetime and Area columns:

df = pd.DataFrame(
    {
    "Datetime" : [1,1,2,3,3],
    "Value" : [1,2,3,4,5],
    "Area" : ["D", "D", "D", "E", "E"]
    }
 )

Get duplicates per both columns:

duplicates = df[df.duplicated(subset=['Datetime','Area'], keep=False)]
print (duplicates)
   Datetime  Value Area
0         1      1    D
1         1      2    D
3         3      4    E
4         3      5    E

Rmeove duplicates per both columns, keep first values is possible by DataFrame.drop_duplicates:

print (df.drop_duplicates(subset=['Datetime','Area']))
   Datetime  Value Area
0         1      1    D
2         2      3    D
3         3      4    E

All toghether:

df_pivot = (df.drop_duplicates(subset=['Datetime','Area'])
              .pivot(index='Datetime',columns='Area',values='Value'))
print (df_pivot)
Area        D    E
Datetime          
1         1.0  NaN
2         3.0  NaN
3         NaN  4.0

Another idea is aggregate duplicated values, e.g. by sum:

df_pivot = df.pivot_table(index='Datetime',columns='Area',values='Value', aggfunc='sum')
print (df_pivot)
Area        D    E
Datetime          
1         3.0  NaN
2         3.0  NaN
3         NaN  9.0