python pandas dataframe

Pandas - How to replace strings with zeros in a DataFrame series?


I'm importing some CSV data into a Pandas DataFrame (in Python). One series is meant to be all numerical values. However, it also contains some spurious "$-" elements represented as strings, left over from previous formatting. If I just import the series, Pandas reports its dtype as 'object'.

What's the best way to replace these "$-" strings with zeros? Or more generally, how can I replace all the strings in a series (which is predominantly numerical), with a numerical value, and convert the series to a floating point type?


Solution

  • Use Series.str.replace and Series.astype

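    import pandas as pd

    s = pd.Series(['2$-32$-4', '123$-12', '00123', '44'])
    # regex=False treats '$-' as a literal string, so no escaping is needed
    s.str.replace('$-', '0', regex=False).astype(float)
    
    0    203204.0
    1    123012.0
    2       123.0
    3        44.0
    dtype: float64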
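
The answer above replaces "$-" wherever it occurs inside a value. For the more general case in the question, where whole entries are non-numeric strings, one possible approach is `pd.to_numeric` with `errors='coerce'` followed by `fillna(0)`. This is a minimal sketch rather than part of the original answer, and the sample values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample column: mostly numeric strings plus stray "$-" entries.
s = pd.Series(['1.5', '$-', '42', '$-', '7'])

# errors='coerce' turns anything that cannot be parsed as a number into NaN;
# fillna(0) then replaces those NaN values with zero.
cleaned = pd.to_numeric(s, errors='coerce').fillna(0)

print(cleaned)
print(cleaned.dtype)
```

Note that this zeroes every unparseable entry, not just "$-", and also any values that were already missing, so it is worth checking which rows were coerced (for example with `pd.to_numeric(s, errors='coerce').isna()`) before filling.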