pythonnumpysortinggroup-bysorteddictionary

How to sort multiple columns values from min to max, and pivot them using python?


I have a dataframe with datetime objects in columns 'start' . I want to sort these dates into new columns : evry time ID start to a new location with order

df = pd.DataFrame(data={'ID':['a1','a2','a1','a1','a2','a2'],
                       'location':['bali','mosta','road','joha','alabama','vinice'], 
                       'start':[pd.to_datetime('2022-11-18 16:28:35'),
                                pd.to_datetime('2022-11-18 17:28:35'),
                                pd.to_datetime('2022-11-19 16:28:35'),
                                pd.to_datetime('2022-11-19 17:28:35'),
                                pd.to_datetime('2022-11-19 17:18:35'),
                                pd.to_datetime('2022-11-19 17:18:35')
                               ]}) 

ID location               start
0  a1     bali 2022-11-18 16:28:35
1  a2    mosta 2022-11-18 17:28:35
2  a1     road 2022-11-19 16:28:35
3  a1     joha 2022-11-19 17:28:35
4  a2  alabama 2022-11-19 17:18:35
5  a2   vinice 2022-11-19 17:18:35

expected result :

new_data = pd.DataFrame(data={'ID':['a1','a2',],
      'location1':['bali','mosta'],
       'start1':[pd.to_datetime('2022-11-18 16:28:35'),pd.to_datetime('2022-11-18 17:28:35') ],
        'location2':['road','alabama'],
        'start2': [pd.to_datetime('2022-11-19 16:28:35'),pd.to_datetime('2022-11-19 17:18:35')],
      'location3':['joha','vinice'],
     'start3': [pd.to_datetime('2022-11-19 17:28:35'),pd.to_datetime('2022-11-19 17:18:35')],
                          }) 
ID location_1             start_1 location_2             start_2 location_3             start_3
0  a1       bali 2022-11-18 16:28:35       road 2022-11-19 16:28:35       joha 2022-11-19 17:28:35
1  a2      mosta 2022-11-18 17:28:35    alabama 2022-11-19 17:18:35     vinice 2022-11-19 17:18:35

Solution

  • Try:

    df["tmp"] = df.groupby("ID").cumcount() + 1
    df = df.pivot(index="ID", columns="tmp")
    df.columns = [f"{t}_{n}" for t, n in df.columns]
    df = df[sorted(df, key=lambda k: ((int((i := k.split("_"))[1])), i[0]))]
    print(df.reset_index())
    

    Prints:

       ID location_1             start_1 location_2             start_2 location_3             start_3
    0  a1       bali 2022-11-18 16:28:35       road 2022-11-19 16:28:35       joha 2022-11-19 17:28:35
    1  a2      mosta 2022-11-18 17:28:35    alabama 2022-11-19 17:18:35     vinice 2022-11-19 17:18:35