pythonpandaspivot

How do you create a pivot table group by Date and perfom calculation on 2 values in pandas


I have this data frame:

df

Node      Interface    Speed   Band_In  carrier     Date
Server1   wan1         100     80       ATT         2024-06-01
Server1   wan2         100     60       Sprint      2024-06-01
Server1   wan3         100     96       Verizon     2024-06-01
Server2   wan1         100     80       ATT         2024-06-01
Server2   wan2         100     60       ATT         2024-06-01
Server2   wan3         100     96       ATT         2024-06-01
Server3   wan1         100     80       ATT         2024-06-01
Server3   wan2         100     60       ATT         2024-06-01
Server3   wan3         100     96       ATT         2024-06-01
Server4   wan1         100     80       ATT         2024-06-01
Server4   wan2         100     60       ATT         2024-06-01
Server5   Int3         100     96       Verizon     2024-06-01
Server1   wan1         100     30       ATT         2024-06-10
Server1   wan2         100     30       Sprint      2024-06-10
Server1   wan3         100     15       Verizon     2024-06-10
Server2   wan1         100     80       ATT         2024-06-10
Server2   wan2         100     60       ATT         2024-06-10
Server2   wan3         100     96       ATT         2024-06-10
Server3   wan1         100     80       ATT         2024-06-10
Server3   wan2         100     60       ATT         2024-06-10
Server3   wan3         100     96       ATT         2024-06-10
Server4   wan1         100     80       ATT         2024-06-10
Server4   wan2         100     60       ATT         2024-06-10
Server5   Int3         100     96       Verizon     2024-06-10

I need this data frame to be re-org where each uniq date needs to on its own column and used interface speed (Band_In/Speed)*100 calculated by each Node, Interface, Carier and Date (Date needs to be Month Name-Day: e.g. June-01, June-10)

In needs to be something like this:

df1:

Node      Interface    Speed   Band_In  carrier     1-Jun            10-Jun
Server1   wan1         100     80       ATT         80               30
Server1   wan2         100     60       Sprint      60               30
Server1   wan3         100     96       Verizon     96               15

I tried this:

 df1=df.pivot(index=['Node', 'Interface', 'Speed', 'Band_In', 'carrier'], columns='Date', values='Speed'/'Interface'*100).fillna('').reset_index()

I get length of passed values i 11,456, index implies 5.

Any ideas what I am doing wrong here?


Solution

  • You can calculate used interface speed before pivoting the table, then use pivot_table:

    df['is'] = df['Band_In'] / df['Speed'] * 100
    df = df.pivot_table(index=['Node', 'Interface', 'carrier'], columns='Date', values='is').reset_index()
    
    Date     Node Interface  carrier  2024-06-01  2024-06-10
    0     Server1      wan1      ATT        80.0        30.0
    1     Server1      wan2   Sprint        60.0        30.0
    2     Server1      wan3  Verizon        96.0        15.0
    3     Server2      wan1      ATT        80.0        80.0
    4     Server2      wan2      ATT        60.0        60.0
    5     Server2      wan3      ATT        96.0        96.0
    6     Server3      wan1      ATT        80.0        80.0
    7     Server3      wan2      ATT        60.0        60.0
    8     Server3      wan3      ATT        96.0        96.0
    9     Server4      wan1      ATT        80.0        80.0
    10    Server4      wan2      ATT        60.0        60.0
    11    Server5      Int3  Verizon        96.0        96.0