
Can you combine multiple rows into a single row by column value with Python using pandas?


How do I change this:

Date                 URL      Description  Category
2022-06-17 14:24:52  /XYBkLO  public       A
2022-06-17 14:24:52  /XYBkLO  public       B
2022-06-17 14:24:52  /XYBkLO  public       C
2022-06-17 14:25:05  /ZWrTVu  public       A
2022-06-17 14:25:05  /ZWrTVu  public       B
2022-06-17 14:25:05  /ZWrTVu  public       C

To this:

Date                 URL      Description  Category  Date                 URL      Description  Category  Date                 URL      Description  Category
2022-06-17 14:24:52  /XYBkLO  public       A         2022-06-17 14:24:52  /XYBkLO  public       B         2022-06-17 14:24:52  /XYBkLO  public       C
2022-06-17 14:25:05  /ZWrTVu  public       A         2022-06-17 14:25:05  /ZWrTVu  public       B         2022-06-17 14:25:05  /ZWrTVu  public       C

I would like to keep everything with the same URL on one row, but I don't know how to implement this using pandas. Is there another way, or another library I should use? I could really use some help.


Solution

  • You can try:

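    from collections import defaultdict
    from itertools import count, cycle, islice
    
    import pandas as pd
    
    
    def fn(x):
        # One independent counter per column name: Date.1, Date.2, ...
        d = defaultdict(lambda: count(1))
        names = cycle(x.columns)
        # Flatten the group's rows into a single 1-D array, row by row.
        vals = x.values.ravel()
    
        return pd.DataFrame(
            [vals],
            columns=[f"{n}.{next(d[n])}" for n in islice(names, len(vals))],
        )
    
    
    # df is the DataFrame from the question. Iterating over the groups keeps
    # the URL column in each group; recent pandas versions may drop it from
    # the frames passed to groupby(...).apply(fn).
    x = pd.concat([fn(g) for _, g in df.groupby("URL")], ignore_index=True)
    print(x)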
    

    Prints:

                    Date.1    URL.1 Description.1 Category.1               Date.2    URL.2 Description.2 Category.2               Date.3    URL.3 Description.3 Category.3
    0  2022-06-17 14:24:52  /XYBkLO        public          A  2022-06-17 14:24:52  /XYBkLO        public          B  2022-06-17 14:24:52  /XYBkLO        public          C
    1  2022-06-17 14:25:05  /ZWrTVu        public          A  2022-06-17 14:25:05  /ZWrTVu        public          B  2022-06-17 14:25:05  /ZWrTVu        public          C
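
As a possible alternative, the same wide layout can probably be reached without a helper function by numbering the rows inside each URL group and unstacking that number into the columns. This is only a sketch: the sample frame is rebuilt by hand from the question's data, and the names `pos`, `key`, `order`, and `wide` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Date": ["2022-06-17 14:24:52"] * 3 + ["2022-06-17 14:25:05"] * 3,
        "URL": ["/XYBkLO"] * 3 + ["/ZWrTVu"] * 3,
        "Description": ["public"] * 6,
        "Category": ["A", "B", "C"] * 2,
    }
)

# Position of each row inside its URL group: 1, 2, 3, ...
pos = (df.groupby("URL").cumcount() + 1).rename("pos")
n = int(pos.max())

# Index by (URL, position) without dropping the URL column, then move the
# position level into the columns. Columns become (name, position) pairs.
wide = df.set_index([df["URL"].rename("key"), pos]).unstack("pos")

# Reorder so that all columns of position 1 come first, then position 2, ...
order = pd.MultiIndex.from_tuples(
    [(col, i) for i in range(1, n + 1) for col in df.columns]
)
wide = wide.reindex(columns=order)

# Flatten the two-level column labels into "Date.1", "URL.1", ...
wide.columns = [f"{col}.{i}" for col, i in wide.columns]
wide = wide.reset_index(drop=True)

print(wide)
```

If the URL groups have different sizes, the shorter groups end up with NaN in the trailing columns, which may or may not be what you want.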