I am trying to extract the number from each string, but only when the string ends with an integer or a decimal:
import pandas as pd

df = pd.DataFrame({'Names': ["Absolute Neutrophil Count", "Absolute Lymphocyte Count 2.9",
                            "Absolute Neutrophil Count 10.2", "ESR (Modified Westergren) 8",
                            "Free Triiodothyronine (FT3) 3.59",
                            "Free Triiodothyronine FT4 4.53"]})
df
Names
0 Absolute Neutrophil Count
1 Absolute Lymphocyte Count 2.9
2 Absolute Neutrophil Count 10.2
3 ESR (Modified Westergren) 8
4 Free Triiodothyronine (FT3) 3.59
5 Free Triiodothyronine FT4 4.53
Desired extraction results:
0 Missing/None
1 2.9
2 10.2
3 8
4 3.59
5 4.53
I tried the code below, but it does not give the desired results:
df.iloc[:,0].str.extract(r'^(.*?)\s*(\d\.?\d*)?$') # '\d+\.\d+'
0 1
0 Absolute Neutrophil Count NaN
1 Absolute Lymphocyte Count 2.9
2 Absolute Neutrophil Count 1 0.2
3 ESR (Modified Westergren) 8
4 Free Triiodothyronine (FT3) 3.59
5 Free Triiodothyronine FT4 4.53
Please answer using the DataFrame and .str.extract. Answers here that use re on plain strings sometimes work, but when I apply them to the DataFrame the results come out differently.
The following regex pattern seems to work here:
(\d+(?:\.\d+)?)$
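This matches an integer or decimal at the very end of the string. In the pandas version below, the extra ? makes the whole group optional; rows without a trailing number come back as NaN either way, because .str.extract returns NaN for an unmatched group or a failed match. The updated pandas code would look like: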
df.iloc[:,0].str.extract(r'(\d+(?:\.\d+)?)?$')
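A minimal self-contained sketch of the same approach, rebuilding the df from the question. Passing expand=False makes .str.extract return a Series (there is only one capture group) instead of a one-column DataFrame. The column name Value and the final pd.to_numeric conversion are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Names': ["Absolute Neutrophil Count",
                            "Absolute Lymphocyte Count 2.9",
                            "Absolute Neutrophil Count 10.2",
                            "ESR (Modified Westergren) 8",
                            "Free Triiodothyronine (FT3) 3.59",
                            "Free Triiodothyronine FT4 4.53"]})

# One capture group anchored with $: only a number at the very end is taken.
# Rows with no trailing number leave the group unmatched, which gives NaN.
# expand=False returns a Series because there is a single group.
df['Value'] = df['Names'].str.extract(r'(\d+(?:\.\d+)?)?$', expand=False)

# The extracted values are strings; convert them if you need arithmetic.
df['Value'] = pd.to_numeric(df['Value'])
print(df)
```

Row 0 ends up as NaN and the remaining rows hold 2.9, 10.2, 8.0, 3.59 and 4.53 as floats.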
Here is an explanation of the regex pattern being used:
(             capture
\d+           an integer
(?:\.\d+)?    followed by an optional decimal component
)             close capture
$             end of the string
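For comparison, the two-group pattern from the question splits row 2 into "Absolute Neutrophil Count 1" and "0.2" because \d matches exactly one digit. With only one digit allowed before the optional dot, "10.2" cannot be matched in one piece, so the lazy (.*?) keeps growing until it has swallowed the "1" and the rest ("0.2") can reach $. A minimal sketch that keeps the question's two-column layout just allows one or more digits with \d+; the column names Label and Value are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Names': ["Absolute Neutrophil Count",
                            "Absolute Lymphocyte Count 2.9",
                            "Absolute Neutrophil Count 10.2",
                            "ESR (Modified Westergren) 8",
                            "Free Triiodothyronine (FT3) 3.59",
                            "Free Triiodothyronine FT4 4.53"]})

# Same pattern as the question, but \d+ instead of \d, so a multi-digit
# integer part such as "10" stays inside the second group.
parts = df['Names'].str.extract(r'^(.*?)\s*(\d+\.?\d*)?$')
parts.columns = ['Label', 'Value']  # hypothetical column names
print(parts)
```

Digits inside the label, such as the "3" in "(FT3)", are left alone because they are not followed by the end of the string.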