I have a column in a Pandas dataframe that contains values as follows:
111042345--
111042345
110374217dclid=CA-R3K
109202817lciz@MM10082IA
I need to extract just the first sequence of digits in each row - not all of the digits in the row. So the output would be like this:
111042345
111042345
110374217
109202817
I thought the best way to achieve that would be to split the strings by digits and return the result, but that would also give me the unwanted digits that come after the non-digit characters.
Use str.extract with a regex: \d matches a digit, {,5} means up to 5 digits, and + means one or more digits, so \d+ captures the whole first run of digits:
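# Raw strings keep \d intact; expand=False returns a Series for a single capture group
df['first_5_digits'] = df['Col'].str.extract(r'(\d{,5})', expand=False)
df['all_digits'] = df['Col'].str.extract(r'(\d+)', expand=False)
print(df)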
                       Col first_5_digits all_digits
0              111042345--          11104  111042345
1                111042345          11104  111042345
2    110374217dclid=CA-R3K          11037  110374217
3  109202817lciz@MM10082IA          10920  109202817
As @Jon Clements pointed out, it is also possible to take the first N digits by slicing the extracted string:
df['first_5_digits'] = df['Col'].str.extract(r'(\d+)', expand=False).str[:5]
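For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample column from the question (the DataFrame construction is my assumption about how the data is loaded) and uses expand=False so that str.extract returns a Series, which is what allows the .str slice to be chained.

```python
import pandas as pd

# Sample data copied from the question; the column name 'Col' follows the answer above.
df = pd.DataFrame({
    "Col": [
        "111042345--",
        "111042345",
        "110374217dclid=CA-R3K",
        "109202817lciz@MM10082IA",
    ]
})

# \d+ captures the first run of digits found in each string.
# expand=False returns a Series because the pattern has a single capture group.
df["all_digits"] = df["Col"].str.extract(r"(\d+)", expand=False)

# Slice the extracted string to keep only its first 5 characters.
df["first_5_digits"] = df["all_digits"].str[:5]

print(df)
```

Rows that contain no digits at all would come back as NaN in both new columns, so a dropna or fillna step may be needed depending on the data.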