pythonpandas

How can I change the shape of the dataframe to have two headers when I have duplicated values?


this is my df:

       symbol  year_bin        metric  value  row
0  USA500.IDX    2025-1  total_trades  32.00    0
1      GBPUSD    2025-1  total_trades  11.00    0
2      GBPUSD    2025-1  total_trades  14.00    1
3      AUDUSD    2025-1  total_trades   5.00    0
4      AUDUSD    2025-1  total_trades  43.00    1
5  USA500.IDX    2025-1      win_rate  31.25    0
6      GBPUSD    2025-1      win_rate  27.27    0
7      GBPUSD    2025-1      win_rate  21.43    1
8      AUDUSD    2025-1      win_rate   0.00    0
9      AUDUSD    2025-1      win_rate  25.58    1

I want to change the shape to this

                    2025-1
symbol         total_trades           win_rate
Usa500.idx           32                   31.25 . . .

this is what I have tried

df_pivot = df_melted.pivot(
    index=['symbol', 'year_bin'],
    columns='metric',
    values='value'
)

df_pivot = df_pivot.reset_index()

but it gives me an error:

ValueError: Index contains duplicate entries, cannot reshape

Solution

  • IIRC, I assume you meant to have a MultiIndex column, which is what you wanted (i.e., two header rows). Use pivot_table() instead of pivot()because unlike pivot(), which expects unique combinations of index + columns, pivot_table() allows aggregation (e.g., sum, mean, etc.).

    import pandas as pd
    
    # Sample data
    data = {
        'symbol': ['USA500.IDX', 'GBPUSD', 'GBPUSD', 'AUDUSD', 'AUDUSD',
                   'USA500.IDX', 'GBPUSD', 'GBPUSD', 'AUDUSD', 'AUDUSD'],
        'year_bin': ['2025-1'] * 10,
        'metric': ['total_trades'] * 5 + ['win_rate'] * 5,
        'value': [32, 11, 14, 5, 43, 31.25, 27.27, 21.43, 0.0, 25.58],
        'row': [0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
    }
    
    # Create DataFrame
    df = pd.DataFrame(data)
    
    # Pivot table with aggregation (sum in this case)
    df_pivot = df.pivot_table(
        index='symbol',
        columns=['year_bin', 'metric'],
        values='value',
        aggfunc='sum'  # or 'mean', 'first', etc., depending on your need
    )
    
    # Optional: sort columns for cleaner view
    df_pivot = df_pivot.sort_index(axis=1, level=0)
    
    # Show the result
    print(df_pivot.to_markdown(tablefmt="grid"))
    
    +------------+------------------------------+--------------------------+
    | symbol     |   ('2025-1', 'total_trades') |   ('2025-1', 'win_rate') |
    +============+==============================+==========================+
    | AUDUSD     |                           48 |                    25.58 |
    +------------+------------------------------+--------------------------+
    | GBPUSD     |                           25 |                    48.7  |
    +------------+------------------------------+--------------------------+
    | USA500.IDX |                           32 |                    31.25 |
    +------------+------------------------------+--------------------------+