[SOLVED] How to correctly extract numbers from a string using regex in Python?

I am trying to extract the number from each string, but only when the string ends with an integer or a decimal.

import pandas as pd

df = pd.DataFrame({'Names': ["Absolute Neutrophil Count",
                             "Absolute Lymphocyte Count 2.9",
                             "Absolute Neutrophil Count 10.2",
                             "ESR (Modified Westergren) 8",
                             "Free Triiodothyronine (FT3) 3.59",
                             "Free Triiodothyronine FT4 4.53"]})

df


                         Names
0   Absolute Neutrophil Count
1   Absolute Lymphocyte Count 2.9
2   Absolute Neutrophil Count 10.2
3   ESR (Modified Westergren) 8
4   Free Triiodothyronine (FT3) 3.59
5   Free Triiodothyronine FT4 4.53

Desired Extraction Results:

0  Missing/None
1  2.9
2  10.2
3  8
4  3.59
5  4.53

I tried the code below, but it does not give the desired results:

df.iloc[:,0].str.extract(r'^(.*?)\s*(\d\.?\d*)?$') #  '\d+\.\d+'

                           0      1
0   Absolute Neutrophil Count     NaN
1   Absolute Lymphocyte Count     2.9
2   Absolute Neutrophil Count 1   0.2
3   ESR (Modified Westergren)     8
4   Free Triiodothyronine (FT3)   3.59
5   Free Triiodothyronine FT4     4.53
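A quick check with Python's standard re module suggests why row 2 is split into "1" and "0.2": the group (\d\.?\d*) allows only one digit before the optional dot, and because \s* may match zero spaces, the lazy (.*?) keeps growing until it has swallowed the "1". The "tweaked" pattern below is only an illustrative change (\d+ instead of \d) that keeps the question's two-column layout; it is not the accepted answer's approach.

```python
import re

# Pattern from the question: only ONE digit is allowed before the optional dot.
attempt = re.compile(r'^(.*?)\s*(\d\.?\d*)?$')
print(attempt.search("Absolute Neutrophil Count 10.2").groups())
# ('Absolute Neutrophil Count 1', '0.2')

# Illustrative tweak: \d+ lets the integer part have any number of digits.
tweaked = re.compile(r'^(.*?)\s*(\d+\.?\d*)?$')
print(tweaked.search("Absolute Neutrophil Count 10.2").groups())
# ('Absolute Neutrophil Count', '10.2')
```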

Please answer using the DataFrame structure and .str.extract. Answers here that use plain re on individual strings sometimes work, but when I apply them to the DataFrame the results come out differently.
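One likely source of that mismatch (an assumption, since the exact failing answers aren't shown) is the choice of re function: .str.extract takes the first match anywhere in each string, which corresponds to re.search, whereas re.match only matches at the start of the string. This sketch applies the same pattern both ways to a small sample Series; the names pattern, names, via_pandas and via_re are illustrative.

```python
import re
import pandas as pd

pattern = r'(\d+(?:\.\d+)?)$'
names = pd.Series(["Absolute Neutrophil Count", "Free Triiodothyronine FT4 4.53"])

# pandas: first match of the pattern in each element; no match -> NaN.
# expand=False returns a Series because there is a single capture group.
via_pandas = names.str.extract(pattern, expand=False)

# Plain-Python equivalent: re.search (not re.match) on each string.
via_re = []
for text in names:
    m = re.search(pattern, text)
    via_re.append(m.group(1) if m else None)

print(via_pandas.tolist())
print(via_re)  # [None, '4.53']
```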

Solution

The following regex pattern appears to work here:

(\d+(?:\.\d+)?)$

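This matches an integer or decimal at the very end of the string; rows with no trailing number yield NaN. The updated pandas code would look like this (the extra ? after the group makes the capture optional but gives the same result):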

df.iloc[:,0].str.extract(r'(\d+(?:\.\d+)?)?$')
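For completeness, here is a self-contained sketch that runs the same pattern on the question's DataFrame. Storing the result in a new column named Value, using expand=False to get a Series, and converting to floats with pd.to_numeric are choices made for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Names': ["Absolute Neutrophil Count",
                             "Absolute Lymphocyte Count 2.9",
                             "Absolute Neutrophil Count 10.2",
                             "ESR (Modified Westergren) 8",
                             "Free Triiodothyronine (FT3) 3.59",
                             "Free Triiodothyronine FT4 4.53"]})

# Same pattern as the answer; expand=False gives a Series for the single group.
df['Value'] = df['Names'].str.extract(r'(\d+(?:\.\d+)?)?$', expand=False)

# Optional: turn the extracted strings into floats (missing values stay NaN).
df['Value'] = pd.to_numeric(df['Value'])

print(df)
```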

Here is an explanation of the regex pattern being used:

( start capture group
- \d+ one or more digits (the integer part)
- (?:\.\d+)? optionally followed by a non-capturing decimal part: a dot and one or more digits
) end capture group
$ end of the string
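One caveat worth checking: because nothing is required before the digits, a name that itself ends in a digit but has no value would still match. The row "Free Triiodothyronine FT4" below is hypothetical (it is not in the question's data). A possible variant, shown here as an assumption rather than part of the answer, requires whitespace or the start of the string before the number via the non-capturing group (?:^|\s).

```python
import pandas as pd

# Hypothetical first row: no value, but the name ends in a digit ("FT4").
s = pd.Series(["Free Triiodothyronine FT4", "Free Triiodothyronine FT4 4.53"])

# Answer's pattern: the "4" at the end of "FT4" is picked up as a value.
plain = s.str.extract(r'(\d+(?:\.\d+)?)$', expand=False)

# Variant: the number must follow whitespace (or start the string).
spaced = s.str.extract(r'(?:^|\s)(\d+(?:\.\d+)?)$', expand=False)

print(plain.tolist())   # ['4', '4.53']
print(spaced.tolist())
```

On the question's six rows both patterns give the same values, since every trailing number there is preceded by a space.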