python dataframe csv dtype

How to load csv column values correctly in Python?


I want to load a csv file as a dataframe in Python with pd.read_table. This file contains data from 60 parameters at different time points. The ‘Time’ column looks like this in Excel:

[Image: the Time column as displayed in Excel]

The time should actually be 39.7318, 39.75808 etc. However, when I load the file into Python, the values in the Time column are read as strings. I tried adding dtype=np.float so that the values would be read as floats:

df = pd.read_table(path, delimiter=";", dtype=np.float)

But this results in ValueError: could not convert string to float: '3.975.808'.

Is it possible to define that this column contains values with 5 decimals?


Solution

  • For a file table.txt that looks like

    Time;B
    397.318;abc
    3.975.808;efg
    

    you could try

    df = pd.read_table("table.txt", sep=";")
    df["Time"] = (
        df["Time"].str.replace(".", "", regex=False).astype("float").div(100_000)
    )
    

    to get

           Time    B
    0   3.97318  abc
    1  39.75808  efg
    

    You could also use a converter

    def to_float(string):
        return float(string.replace(".", "")) / 100_000
    
    df = pd.read_table("table.txt", sep=";", converters={"Time": to_float})
    

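    but I think the first solution is more efficient, since it works on the whole column at once instead of calling a Python function for every value. Both variants assume that each value has exactly five digits after the decimal point, which is why 397.318 comes out as 3.97318 above rather than the 39.7318 mentioned in the question.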
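
If trailing zeros were dropped before the dots were inserted, so that 397.318 really stands for 39.7318 as the question suggests, a fixed divisor cannot recover the value. Below is a minimal sketch of one alternative. It rests on an assumption that the question does not confirm: every time value has the same number of digits before the decimal point (two here). The name to_float_fixed_int and its int_digits parameter are made up for illustration.

```python
def to_float_fixed_int(text, int_digits=2):
    """Parse a string such as '3.975.808' into a float.

    Assumption (not from the question): the true value always has
    `int_digits` digits before the decimal point.
    """
    digits = text.replace(".", "")  # drop the misplaced separators
    # Re-insert a single decimal point after the integer digits.
    return float(f"{digits[:int_digits]}.{digits[int_digits:]}")


print(to_float_fixed_int("397.318"))    # 39.7318
print(to_float_fixed_int("3.975.808"))  # 39.75808
```

Such a function could be passed to read_table in the same way as to_float above, i.e. converters={"Time": to_float_fixed_int}. It raises ValueError on empty strings, and a column that mixes values below 10 or from 100 upwards with two-digit ones would need a different rule.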