pandasdataframesplittuplestolist

How to split tuples in all columns of a dataframe


The columns in my dataframe contain tuples and empty cells. The number of columns in the dataframe is dynamic and the columns have no lables.

          **0                1               2**
**0**    name,score,ID    name,score,ID      None
**1**    name,score,ID       None            None
**2**        None            None         name,score,ID

I want to split the tuples of all columns in separate columns, i.e:

      **Name0    Score0    ID0    Name1   Score1    ID1    Name2   Score2   ID2**
**0**    name    score     ID     name    score     ID      None    None    None
**1**    name    score     ID     None    None     None     None    None    None
**2**    None    None     None    None    None     None     name    score     ID

I found the following:

df1[['Name', 'Score', "ID"]] = pd.DataFrame(df1[0].tolist(), index=df1.index)

which works basically, however it just splits the first columns tuples into separate columns (--> df1[0]). I cant find how to apply this to all columns tuples.

Any help appreciated!


Solution

  • Considering the following toy dataframe:

    import pandas as pd
    
    df = pd.DataFrame(
        {
            0: {
                0: None,
                1: None,
                2: None,
                3: ("bartenbach gmbh rinner strasse 14 aldrans", 96, 1050),
                4: (
                    "ait austrian institute of technology gmbh giefinggasse 4 wien",
                    70,
                    537,
                ),
            },
            1: {0: None, 1: None, 2: None, 3: None, 4: None},
            2: {0: None, 1: None, 2: None, 3: None, 4: None},
        }
    )
    
    print(df)
    # Outputs
                                                       0     1     2
    0                                               None  None  None
    1                                               None  None  None
    2                                               None  None  None
    3  (bartenbach gmbh rinner strasse 14 aldrans, 96...  None  None
    4  (ait austrian institute of technology gmbh gie...  None  None
    

    You could iterate on each column, then each value, split the string and populate a new dataframe, like this:

    new_df = pd.DataFrame()
    
    for col_num, series in df.iteritems():
        for i, value in enumerate(series.values):
            try:
                name, score, id_num = value
                new_df.loc[i, f"Name{col_num}"] = name
                new_df.loc[i, f"Score{col_num}"] = score
                new_df.loc[i, f"ID{col_num}"] = id_num
            except TypeError:
                continue
    new_df = new_df.reset_index(drop=True)
    
    print(new_df)
    # Outputs
                                                   Name0  Score0     ID0
    0          bartenbach gmbh rinner strasse 14 aldrans    96.0  1050.0
    1  ait austrian institute of technology gmbh gief...    70.0   537.0