I have imported a CSV into a pandas data frame; however, the column I need to use is freeform and in bad shape.
I need to extract the first series of numbers after the word NBU or the first series of numbers in the string. See some examples below:-
nbu 123456
NBU-123456
nbu/ 123456 blah12
123456
123456_123
All of the above should be cleaned to produce 123456. Note that the number of integers returned is based on how many are in a continual sequence; i.e., nbu12 3455 should only return 12.
I will then use something like this to fix the data:-
df['col'] = df['col'].str.
A possible solution, which works as follows: It uses str.extract
method to extract digits from the col
column of df
, where in r'(\d+)'
the parentheses denote a capturing group, with \d+
meaning one or more digits.
df['col'].str.extract(r'(\d+)')
Output:
0
0 123456
1 123456
2 123456
3 123456
4 123456