Inputs:
arr1 = ["A","B"]
arr2 = [[1,2],[3,4,5]]
Expected output:
| | short_list | long_list |
|---|---|---|
| 0 | A | 1 |
| 1 | A | 2 |
| 2 | B | 3 |
| 3 | B | 4 |
| 4 | B | 5 |
Current output:
| | short_list | long_list |
|---|---|---|
| 0 | A | [1, 2] |
| 1 | A | [3, 4, 5] |
| 2 | B | [1, 2] |
| 3 | B | [3, 4, 5] |
Current code (using itertools):
import pandas as pd
from itertools import product

arr1 = ["A","B"]
arr2 = [[1,2],[3,4,5]]
# product() pairs every item of arr1 with every sublist of arr2
df2 = pd.DataFrame(data = product(arr1,arr2),columns=["short_list", "long_list"])
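The itertools version gives the current output because `product(arr1, arr2)` is a Cartesian product. It pairs every element of `arr1` with every element of `arr2`, and those elements are whole sublists, so nothing gets flattened. A quick check:

```python
from itertools import product

arr1 = ["A", "B"]
arr2 = [[1, 2], [3, 4, 5]]

# product() pairs each item of arr1 with each *item* of arr2;
# the items of arr2 are the sublists themselves, so they stay unflattened.
pairs = list(product(arr1, arr2))
print(pairs)
# [('A', [1, 2]), ('A', [3, 4, 5]), ('B', [1, 2]), ('B', [3, 4, 5])]
```

That gives 2 x 2 = 4 rows instead of the 5 expected ones.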
Alternative code using list comprehensions that produces the desired output:
import pandas as pd

def custom_product(arr1, arr2):
    # repeat each short_list item once per element of its matching sublist
    expand_short_list = [[a1]*len(a2) for a1, a2 in zip(arr1,arr2)]
    # flatten both nested lists with sum(..., []) and pair them element-wise
    return [[a1,a2] for a1, a2 in zip(sum(expand_short_list,[]),sum(arr2,[]))]

arr1 = ["A","B"]
arr2 = [[1,2],[3,4,5]]
df1 = pd.DataFrame(data = custom_product(arr1, arr2),columns=["short_list", "long_list"])
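In `custom_product`, `sum(..., [])` flattens by repeated list concatenation, which is quadratic in the total length. The flattening step can be done with itertools instead: `itertools.repeat` for the expanded short list and `itertools.chain.from_iterable` to flatten. The sketch below assumes that; the name `custom_product_chain` is hypothetical.

```python
from itertools import chain, repeat

import pandas as pd


def custom_product_chain(arr1, arr2):
    # repeat(a1, len(a2)) yields a1 once per element of its sublist a2
    flat_short = chain.from_iterable(repeat(a1, len(a2)) for a1, a2 in zip(arr1, arr2))
    # chain.from_iterable lazily flattens [[1, 2], [3, 4, 5]] -> 1, 2, 3, 4, 5
    flat_long = chain.from_iterable(arr2)
    return [[a1, a2] for a1, a2 in zip(flat_short, flat_long)]


arr1 = ["A", "B"]
arr2 = [[1, 2], [3, 4, 5]]
df1 = pd.DataFrame(custom_product_chain(arr1, arr2), columns=["short_list", "long_list"])
print(df1)
```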
Question:
I'm wondering how I could achieve the desired output using itertools?
IIUC, use the DataFrame constructor with DataFrame.explode:
arr1 = ["A","B"]
arr2 = [[1,2],[3,4,5]]
df = (pd.DataFrame({'short_list':arr1, 'long_list':arr2})
        .explode('long_list')
        .reset_index(drop=True))
print (df)
short_list long_list
0 A 1
1 A 2
2 B 3
3 B 4
4 B 5
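Note that `explode` leaves the exploded column with `object` dtype, because the column originally held lists. If a numeric column is needed downstream, you can cast it afterwards. This sketch assumes every sublist contains only integers.

```python
import pandas as pd

arr1 = ["A", "B"]
arr2 = [[1, 2], [3, 4, 5]]

df = (pd.DataFrame({'short_list': arr1, 'long_list': arr2})
        .explode('long_list')
        .reset_index(drop=True))
print(df.dtypes)  # long_list is object after explode

# Assumption: all sublists hold integers, so casting to int is safe
df['long_list'] = df['long_list'].astype(int)
print(df.dtypes)
```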
Another idea is to flatten the zipped arrays into a list of tuples and pass it to the DataFrame constructor:
df = pd.DataFrame([(a, x) for a, b in zip(arr1, arr2) for x in b],
                  columns=['short_list','long_list'])
print (df)
short_list long_list
0 A 1
1 A 2
2 B 3
3 B 4
4 B 5
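Since the question asks about itertools specifically, one way to express the same pairing is to take the product per `zip` pair instead of across the whole lists, then flatten the results with `chain.from_iterable`. This is a sketch and is not part of the answer above:

```python
from itertools import chain, product

import pandas as pd

arr1 = ["A", "B"]
arr2 = [[1, 2], [3, 4, 5]]

# For each (a, b) from zip, product([a], b) yields (a, x) for every x in b;
# chain.from_iterable concatenates these per-pair products into one stream.
rows = chain.from_iterable(product([a], b) for a, b in zip(arr1, arr2))
df = pd.DataFrame(list(rows), columns=['short_list', 'long_list'])
print(df)
```

This produces the same rows as the list comprehension version. `zip` stops at the shorter input, so `arr1` and `arr2` are assumed to have the same length.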