Exploding nested lists using Pandas Series keeps failing


I have not used pandas explode before. I get the gist of DataFrame.explode, but for data where only some columns contain nested lists, I heard that pd.Series.explode is useful. However, I keep getting: KeyError: "None of ['city'] are in the columns". Yet 'city' is defined in keys:

keys = ["city", "temp"]
values = [["chicago","london","berlin"], [[32,30,28],[39,40,25],[33,34,35]]]
df = pd.DataFrame({"keys":keys,"values":values})
df2 = df.set_index(['city']).apply(pd.Series.explode).reset_index()

The desired output is:

city / temp
chicago / 32
chicago / 30
chicago / 28

etc.

I would appreciate an expert weighing in as to why this throws an error, and a fix, thank you.


Solution

  • The problem comes from how you define df:

    df = pd.DataFrame({"keys":keys,"values":values})
    

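    The dict keys "keys" and "values" become the column names, so there is no 'city' column for set_index to find. This actually gives you the following dataframe: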

       keys                                      values
    0  city                   [chicago, london, berlin]
    1  temp  [[32, 30, 28], [39, 40, 25], [33, 34, 35]]
    

    You probably meant:

    df = pd.DataFrame(dict(zip(keys, values)))
    

    Which gives you:

          city          temp
    0  chicago  [32, 30, 28]
    1   london  [39, 40, 25]
    2   berlin  [33, 34, 35]
    

    You can then use explode:

    print(df.explode('temp'))
    

    Output:

          city temp
    0  chicago   32
    0  chicago   30
    0  chicago   28
    1   london   39
    1   london   40
    1   london   25
    2   berlin   33
    2   berlin   34
    2   berlin   35
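
As a self-contained sketch of the steps above (not necessarily the only way to do this), the following rebuilds the frame, explodes it, and also shows that the set_index / pd.Series.explode pattern from the question works once 'city' is a real column. The names wrong, flat, and alt are illustrative and not from the original post; ignore_index=True assumes pandas 1.1 or later, and the astype(int) cast is an optional extra because exploded values keep object dtype.

```python
import pandas as pd

keys = ["city", "temp"]
values = [["chicago", "london", "berlin"],
          [[32, 30, 28], [39, 40, 25], [33, 34, 35]]]

# Literal dict keys become the column names, so this frame has no "city" column.
wrong = pd.DataFrame({"keys": keys, "values": values})
print(list(wrong.columns))  # ['keys', 'values']

# Pair each name with its list of values so the names become the columns.
df = pd.DataFrame(dict(zip(keys, values)))

# Explode the list column; ignore_index=True renumbers the rows 0..n-1
# instead of repeating the original row labels (assumes pandas >= 1.1).
flat = df.explode("temp", ignore_index=True)

# Exploded values keep object dtype, so cast if numeric operations are needed.
flat["temp"] = flat["temp"].astype(int)

# The pattern from the question also works once "city" is a real column.
alt = df.set_index(["city"]).apply(pd.Series.explode).reset_index()
alt["temp"] = alt["temp"].astype(int)

print(flat)
```

If several columns hold lists of equal length per row, the set_index / apply(pd.Series.explode) form explodes all of them together, which is probably why it was recommended for frames where only some columns are nested.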