How do I change this:
Date | URL | Description | Category |
2022-06-17 14:24:52 | /XYBkLO | public | A |
2022-06-17 14:24:52 | /XYBkLO | public | B |
2022-06-17 14:24:52 | /XYBkLO | public | C |
2022-06-17 14:25:05 | /ZWrTVu | public | A |
2022-06-17 14:25:05 | /ZWrTVu | public | B |
2022-06-17 14:25:05 | /ZWrTVu | public | C |
To this:
Date | URL | Description | Category | Date | URL | Description | Category | Date | URL | Description | Category |
2022-06-17 14:24:52 | /XYBkLO | public | A | 2022-06-17 14:24:52 | /XYBkLO | public | B | 2022-06-17 14:24:52 | /XYBkLO | public | C |
2022-06-17 14:25:05 | /ZWrTVu | public | A | 2022-06-17 14:25:05 | /ZWrTVu | public | B | 2022-06-17 14:25:05 | /ZWrTVu | public | C |
I would like to keep everything with the same URL on one row, but I don't know how to implement this using pandas. Is there another way, or another library I should use? I could really use some help.
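For reproducibility, the input table above could be built roughly like this. This is only a sketch: the name `df` and the choice to keep every value (including `Date`) as a plain string are assumptions, not something stated in the question.

```python
import pandas as pd

# Sample input from the table above; every value is kept as a plain string
# (assumption: the Date column is not parsed into datetimes).
df = pd.DataFrame(
    {
        "Date": ["2022-06-17 14:24:52"] * 3 + ["2022-06-17 14:25:05"] * 3,
        "URL": ["/XYBkLO"] * 3 + ["/ZWrTVu"] * 3,
        "Description": ["public"] * 6,
        "Category": ["A", "B", "C"] * 2,
    }
)
print(df)
```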
You can try:
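import pandas as pd
from collections import defaultdict
from itertools import count, cycle, islice

def fn(x):
    # One counter per column name, so repeated names get suffixes .1, .2, ...
    d = defaultdict(lambda: count(1))
    names = cycle(x.columns)
    # Flatten all rows of the group into a single row, left to right
    vals = x.values.ravel()
    return pd.DataFrame(
        [vals],
        columns=[f"{n}.{next(d[n])}" for n in islice(names, len(vals))],
    )

# df is the DataFrame from the question
x = df.groupby("URL").apply(fn).reset_index(drop=True)
print(x)

Prints: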
                Date.1    URL.1  Description.1  Category.1               Date.2    URL.2  Description.2  Category.2               Date.3    URL.3  Description.3  Category.3
0  2022-06-17 14:24:52  /XYBkLO         public           A  2022-06-17 14:24:52  /XYBkLO         public           B  2022-06-17 14:24:52  /XYBkLO         public           C
1  2022-06-17 14:25:05  /ZWrTVu         public           A  2022-06-17 14:25:05  /ZWrTVu         public           B  2022-06-17 14:25:05  /ZWrTVu         public           C
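As an alternative, here is a minimal sketch that iterates over the groups instead of calling `groupby.apply`. That may be more robust, since pandas 2.2 and later warn when `apply` operates on the grouping column. The sample `df`, the name `wide`, and the assumption that every URL has the same number of rows are mine, not part of the question.

```python
import pandas as pd

# Sample data mirroring the question (assumption: all values are strings).
df = pd.DataFrame(
    {
        "Date": ["2022-06-17 14:24:52"] * 3 + ["2022-06-17 14:25:05"] * 3,
        "URL": ["/XYBkLO"] * 3 + ["/ZWrTVu"] * 3,
        "Description": ["public"] * 6,
        "Category": ["A", "B", "C"] * 2,
    }
)

# Flatten each URL group into one long row, keeping first-seen group order.
# Iterating a groupby yields (key, sub-frame); the sub-frame keeps all columns.
rows = [g.to_numpy().ravel().tolist() for _, g in df.groupby("URL", sort=False)]

# Assumption: every group has the same number of rows, so one header fits all.
repeats = len(rows[0]) // len(df.columns)
columns = [f"{c}.{i}" for i in range(1, repeats + 1) for c in df.columns]

wide = pd.DataFrame(rows, columns=columns)
print(wide.shape)
```

If the groups can differ in size, the shorter rows would need padding before building the frame; this sketch does not handle that case.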