Tags: python, pandas, dataframe, python-itertools, pandas-explode

itertools.product in dataframe


Inputs:

arr1  = ["A","B"]
arr2   = [[1,2],[3,4,5]]

Expected output:

short_list long_list
0 A 1
1 A 2
2 B 3
3 B 4
4 B 5

Current output:

short_list long_list
0 A [1, 2]
1 A [3, 4, 5]
2 B [1, 2]
3 B [3, 4, 5]

Current Code (using itertools):

import pandas as pd
from itertools import product

arr1 = ["A", "B"]
arr2 = [[1, 2], [3, 4, 5]]

# product() pairs every element of arr1 with every element of arr2,
# so each inner list is treated as a single value.
df2 = pd.DataFrame(data=product(arr1, arr2), columns=["short_list", "long_list"])

Alternative code using nested list comprehensions to get the desired output:

import pandas as pd

def custom_product(arr1, arr2):
    # Repeat each arr1 element once per item in the matching arr2 sublist.
    expand_short_list = [[a1] * len(a2) for a1, a2 in zip(arr1, arr2)]
    # Flatten both nested lists with sum(..., []) and pair them up row by row.
    return [[a1, a2] for a1, a2 in zip(sum(expand_short_list, []), sum(arr2, []))]

arr1 = ["A", "B"]
arr2 = [[1, 2], [3, 4, 5]]

df1 = pd.DataFrame(data=custom_product(arr1, arr2), columns=["short_list", "long_list"])

Question:

How can I achieve the desired output using itertools?


Solution

  • IIUC, use the DataFrame constructor with DataFrame.explode:

    arr1  = ["A","B"]
    arr2   = [[1,2],[3,4,5]]
    
    df = (pd.DataFrame({'short_list':arr1, 'long_list':arr2})
            .explode('long_list')
            .reset_index(drop=True))
    print(df)
      short_list long_list
    0          A         1
    1          A         2
    2          B         3
    3          B         4
    4          B         5
    

    Another idea is to flatten the zipped arrays into a list of tuples and pass it to the DataFrame constructor:

    df = pd.DataFrame([(a, x) for a, b in zip(arr1, arr2) for x in b],
                      columns=['short_list','long_list'])
    print(df)
      short_list  long_list
    0          A          1
    1          A          2
    2          B          3
    3          B          4
    4          B          5
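
Since the question asks specifically about itertools, here is a minimal sketch of one way it might be done (it is not part of the answer above). The idea is to take the product of each single label with its own sublist, rather than the product of the two whole lists, and then chain the pieces together. The variable name rows is only illustrative.

```python
from itertools import chain, product

arr1 = ["A", "B"]
arr2 = [[1, 2], [3, 4, 5]]

# zip() pairs each label with its own sublist; product([a], b) then
# combines that one label with every item of the sublist, and
# chain.from_iterable() joins the per-pair results into one sequence.
rows = list(chain.from_iterable(product([a], b) for a, b in zip(arr1, arr2)))

print(rows)
# [('A', 1), ('A', 2), ('B', 3), ('B', 4), ('B', 5)]
```

The resulting rows can be passed to pd.DataFrame(rows, columns=['short_list', 'long_list']) in the same way as the list of tuples above, which should give the same five-row frame.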