I have the following data frame:
     a            b   x
0  id1   abc 123 tr   2
1  id2  abd1 124 tr   6
2  id3  abce 126 af   9
3  id4   abe 128 nm  12
For each item in column b, I need to extract the substring before the first space. That is, I need the following result:
list_of_strings = [abc, abd1, abce, abe]
Please advise
Use a regex with ^\S+ (non-space characters anchored to the start of the string) and str.extract:
df['b'].str.extract(r'^(\S+)', expand=False)
Output:
0     abc
1    abd1
2    abce
3     abe
Name: b, dtype: object
For a list:
list_of_strings = df['b'].str.extract(r'^(\S+)', expand=False).tolist()
# ['abc', 'abd1', 'abce', 'abe']
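For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the frame looks the way the question shows it (column b holding the space-separated strings); the variable names via_extract and via_split are just illustrative. It also shows a regex-free alternative, splitting once on whitespace and keeping the first piece, which should give the same list for this data.

```python
import pandas as pd

# Rebuild the frame from the question (values copied from the post).
df = pd.DataFrame({
    "a": ["id1", "id2", "id3", "id4"],
    "b": ["abc 123 tr", "abd1 124 tr", "abce 126 af", "abe 128 nm"],
    "x": [2, 6, 9, 12],
})

# Regex approach: capture the leading run of non-space characters.
via_extract = df["b"].str.extract(r"^(\S+)", expand=False).tolist()

# Alternative without a regex: split once on whitespace, keep the first piece.
via_split = df["b"].str.split(n=1).str[0].tolist()

print(via_extract)
print(via_split)
```

The two approaches can differ on edge cases: if a value starts with whitespace, the anchored regex finds no match and yields NaN, while the split version skips the leading whitespace and returns the first token. Pick whichever behaviour suits your data.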