Can I use the split() function to transform this Series:
s = pd.Series(['adh', 'bei', np.nan, 'cfj'])
into either one of these outputs:
s = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    'B': ['d', 'e', 'f'],
    'C': ['h', 'i', 'j']
})
or:
s = pd.DataFrame({
    'A': ['a', 'b', np.nan, 'c'],
    'B': ['d', 'e', np.nan, 'f'],
    'C': ['h', 'i', np.nan, 'j']
})
You can use str.split with a regex that matches the empty string between two characters, i.e. a position with a character both before and after it:
s.str.split('(?<=.)(?=.)', expand=True)
Output:
     0    1    2
0    a    d    h
1    b    e    i
2  NaN  NaN  NaN
3    c    f    j
To avoid the NaN row, call dropna before splitting:
s.dropna().str.split('(?<=.)(?=.)', expand=True)
Output:
   0  1  2
0  a  d  h
1  b  e  i
3  c  f  j
How the regex works:
(?<=.)  # lookbehind: a character must precede the split point
        # the match itself is the empty string
(?=.)   # lookahead: a character must follow the split point
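To see the pattern in isolation, here is a minimal sketch using the standard-library re module on a single string. It assumes Python 3.7 or later, where re.split splits on empty matches; the variable names are only illustrative.

```python
import re

# The lookbehind requires a character before the position and the
# lookahead requires one after it, so the pattern matches the empty
# string between every pair of adjacent characters.
pattern = r'(?<=.)(?=.)'

parts = re.split(pattern, 'adh')
print(parts)  # ['a', 'd', 'h']
```

The start and end of the string are not split points, because one of the two lookarounds fails there, so no empty strings appear in the result.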
Alternatively, convert each string to a list and pass the result to the DataFrame constructor:
out = pd.DataFrame(map(list, s.dropna()))
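Building on the constructor approach, the following sketch produces both outputs from the question with the columns labeled A, B and C. The names compact and with_nan are hypothetical, and the use of reindex to restore the missing row is one possible choice, not the only one.

```python
import numpy as np
import pandas as pd

s = pd.Series(['adh', 'bei', np.nan, 'cfj'])

# Split each non-missing string into characters, keeping the
# original index labels (0, 1, 3) so rows can be aligned later.
valid = s.dropna()
out = pd.DataFrame(
    [list(x) for x in valid],
    columns=['A', 'B', 'C'],
    index=valid.index,
)

# First requested output: three rows, renumbered 0..2.
compact = out.reset_index(drop=True)

# Second requested output: reindex on the original Series index,
# which reinserts row 2 filled with missing values.
with_nan = out.reindex(s.index)

print(compact)
print(with_nan)
```

Passing index=valid.index is what makes the reindex step work; without it the constructor numbers the rows 0 to 2 and the position of the missing entry is lost.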