I have a table similar to the one below where I have a summary column with groups and an associated value separated by a colon where each separate group with its value is separated by a comma. (group1:value1,group2:value2...).
Area DTTM ID ID2 Summary
0 M 1/1/1960 A A 1:2980
1 M 3/8/2021 A B 1:2980
2 M 5/12/2024 A B 1:8732
3 M 1/1/1960 A C 1:8732
4 H 3/8/2021 B C 1:53174,9:2332
5 H 5/12/2024 B D 1:22017,9:118
6 H 1/1/1960 B A 1:4239, 6:184, 9:1243, 14:482
I am trying to separate the summary column into separate columns for each group (the number before the colons) and include the value (number after the colons) in each column for each row. How can I go about this? A sample output is show below:
Area DTTM ID ID2 Bin1 Bin6 Bin9 Bin14
0 M 1/1/1960 A A 2980 0 0 0
1 M 3/8/2021 A B 2980 0 0 0
2 M 5/12/2024 A B 8732 0 0 0
3 M 1/1/1960 A C 8732 0 0 0
4 H 3/8/2021 B C 53174 0 2332 0
5 H 5/12/2024 B D 22017 0 118 0
6 H 1/1/1960 B A 4239 184 1243 482
IIUC, extractall
and pivot
, then join
:
out = df.join(df['Summary'].str.extractall(r'(\d+):(\d+)')
.astype(int)
.droplevel(1).pivot(columns=0, values=1)
.fillna(0).astype(int)
.add_prefix('Bin')
).drop(columns='Summary')
Output:
Area DTTM ID ID2 Bin1 Bin6 Bin9 Bin14
0 M 1/1/1960 A A 2980 0 0 0
1 M 3/8/2021 A B 2980 0 0 0
2 M 5/12/2024 A B 8732 0 0 0
3 M 1/1/1960 A C 8732 0 0 0
4 H 3/8/2021 B C 53174 0 2332 0
5 H 5/12/2024 B D 22017 0 118 0
6 H 1/1/1960 B A 4239 184 1243 482