
Split String in Pandas Series


Can I use the split() function to transform this series:

import numpy as np
import pandas as pd

s = pd.Series(['adh', 'bei', np.nan, 'cfj'])

into either of these outputs:

s = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    'B': ['d', 'e', 'f'],
    'C': ['h', 'i', 'j']
})

or:

s = pd.DataFrame({
    'A': ['a', 'b', np.nan, 'c'],
    'B': ['d', 'e', np.nan, 'f'],
    'C': ['h', 'i', np.nan, 'j']
})

Solution

  • You can use str.split with a regex that matches the empty position between two characters (a one-character lookbehind followed by a one-character lookahead):

    s.str.split('(?<=.)(?=.)', expand=True)
    

    Output:

         0    1    2
    0    a    d    h
    1    b    e    i
    2  NaN  NaN  NaN
    3    c    f    j
    

    To drop the missing row instead, call dropna() before splitting:

    s.dropna().str.split('(?<=.)(?=.)', expand=True)
    

    Output:

       0  1  2
    0  a  d  h
    1  b  e  i
    3  c  f  j
    
    Regex


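    (?<=.)  # lookbehind: the position must be preceded by any character
            # nothing is consumed, so each match is the empty string
    (?=.)   # lookahead: the position must be followed by any character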
    

    Alternatively, convert each string to a list of characters and pass the result to the DataFrame constructor (this also renumbers the index as 0, 1, 2):

    out = pd.DataFrame(map(list, s.dropna()))
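As a sketch, the two approaches can be checked side by side. This assumes Python 3.7+ (where re.split accepts a pattern that matches the empty string) and a pandas version in which str.split treats a multi-character pattern as a regular expression, which is the default behaviour. The names kept, split_df, list_df and aligned are illustrative only.

```python
import re

import numpy as np
import pandas as pd

s = pd.Series(['adh', 'bei', np.nan, 'cfj'])
pattern = '(?<=.)(?=.)'  # zero-width: matches the position between two characters

# Plain-Python view of the same pattern (empty matches need Python 3.7+).
print(re.split(pattern, 'adh'))  # ['a', 'd', 'h']

kept = s.dropna()  # drop the NaN row; remaining labels are 0, 1, 3

# Approach 1: regex split keeps the original index labels.
split_df = kept.str.split(pattern, expand=True)
print(split_df.index.tolist())  # [0, 1, 3]

# Approach 2: the list constructor builds a fresh 0..n-1 index.
list_df = pd.DataFrame(map(list, kept))
print(list_df.index.tolist())  # [0, 1, 2]

# Passing index= keeps the original labels with the constructor as well.
aligned = pd.DataFrame(map(list, kept), index=kept.index)
print(aligned.index.tolist())  # [0, 1, 3]

# The cell values agree either way.
print(split_df.to_numpy().tolist() == list_df.to_numpy().tolist())  # True
```

The only practical difference is the index: use the regex split (or pass index= to the constructor) if the rows must stay aligned with the original series.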