pythonpandasdataframegroup-bypandas-explode

How to cancat one column values into another column pandas?


I have a dataframe, it has two columns, one is called 'PTM_loc' and another one is called 'PTM_types'

data['PTM_loc']
0        24
1        24;30
2        22
3        19;20;66
4        16;30;50

data['PTM_typs']
0        S
1        S
2        T
3        S;T
4        S;Y;T

I would like to concat (or whatever you call this action) these columns together, and get a new column like this:

data['PTMs']
0        S24
1        S24;S30
2        T22
3        S19;T20;T66
4        S16;Y30;T50

Does anyone have any ideas?


Solution

  • You can use a list comprehension with a double zip/zip_longest:

    from itertools import zip_longest
    
    def combine(a, b):
        a = a.split(';')
        b = b.split(';')
        return ';'.join(map(''.join, zip_longest(a, b, fillvalue=a[-1])))
    
    data['PTMs')] = [combine(a,b) for a,b in zip(data['PTM_types'], data['PTM_loc'])]
    
    # or 
    # from itertools import starmap
    # data['PTMs'] = list(starmap(combine, zip(data['PTM_types'], data['PTM_loc'])))
    

    Altentatively, for a pure pandas variant (just for fun, it's likely less efficient), use ffill to fill the missing combinations:

    # either in rectangular form
    df['PTMs'] = (data['PTM_types']
     .str.split(';', expand=True).ffill(axis=1)
     +data['PTM_loc'].str.split(';', expand=True)
    ).stack().groupby(level=0).agg(';'.join)
    
    # or in long form
    l = data['PTM_loc'].str.extractall('([^;]+)')
    t = data['PTM_typs'].str.extractall('([^;]+)')
    data['PTMs'] = (t.reindex_like(l).groupby(level=0).ffill().add(l)
                     .groupby(level=0).agg(';'.join)
                    )
    

    Output:

        PTM_loc PTM_types         PTMs
    0        24         S          S24
    1     24;30         S      S24;S30
    2        22         S          S22
    3  19;20;66       S;T  S19;T20;T66
    4  16;30;50     S;Y;T  S16;Y30;T50