pythonpandasdataframesplitgrouping

Separating a Dataframe Column by Multiple Delimiters Into New Columns Using Pandas


I have a table similar to the one below where I have a summary column with groups and an associated value separated by a colon where each separate group with its value is separated by a comma. (group1:value1,group2:value2...).

  Area       DTTM ID ID2                        Summary
0    M   1/1/1960  A   A                         1:2980
1    M   3/8/2021  A   B                         1:2980
2    M  5/12/2024  A   B                         1:8732
3    M   1/1/1960  A   C                         1:8732
4    H   3/8/2021  B   C                 1:53174,9:2332
5    H  5/12/2024  B   D                  1:22017,9:118
6    H   1/1/1960  B   A  1:4239, 6:184, 9:1243, 14:482

I am trying to separate the summary column into separate columns for each group (the number before the colons) and include the value (number after the colons) in each column for each row. How can I go about this? A sample output is show below:

  Area       DTTM ID ID2   Bin1  Bin6  Bin9  Bin14
0    M   1/1/1960  A   A   2980     0     0      0
1    M   3/8/2021  A   B   2980     0     0      0
2    M  5/12/2024  A   B   8732     0     0      0
3    M   1/1/1960  A   C   8732     0     0      0
4    H   3/8/2021  B   C  53174     0  2332      0
5    H  5/12/2024  B   D  22017     0   118      0
6    H   1/1/1960  B   A   4239   184  1243    482

Solution

  • IIUC, extractall and pivot, then join:

    out = df.join(df['Summary'].str.extractall(r'(\d+):(\d+)')
                  .astype(int)
                  .droplevel(1).pivot(columns=0, values=1)
                  .fillna(0).astype(int)
                  .add_prefix('Bin')
                 ).drop(columns='Summary')
    

    Output:

      Area       DTTM ID ID2   Bin1  Bin6  Bin9  Bin14
    0    M   1/1/1960  A   A   2980     0     0      0
    1    M   3/8/2021  A   B   2980     0     0      0
    2    M  5/12/2024  A   B   8732     0     0      0
    3    M   1/1/1960  A   C   8732     0     0      0
    4    H   3/8/2021  B   C  53174     0  2332      0
    5    H  5/12/2024  B   D  22017     0   118      0
    6    H   1/1/1960  B   A   4239   184  1243    482