[SOLVED] How to load csv column values correctly in Python?

I want to load a CSV file as a DataFrame in Python with pd.read_table. This file contains data from 60 parameters at different time points. In Excel, the ‘Time’ column shows values with misplaced separators, such as 3.975.808.

The times should actually be 39.7318, 39.75808, etc. However, when I load the file into Python, the values in the Time column are read as strings. I tried adding dtype=np.float so that the values would be read as numbers:

df = pd.read_table(path, delimiter=";", dtype=np.float)

But this results in ValueError: could not convert string to float: '3.975.808'.

Is it possible to specify that this column contains values with 5 decimals?

Solution

For a file table.txt that looks like

Time;B
397.318;abc
3.975.808;efg

you could try

import pandas as pd

df = pd.read_table("table.txt", sep=";")
# Drop the misplaced "." separators, then shift the decimal point five places.
df["Time"] = (
    df["Time"].str.replace(".", "", regex=False).astype("float").div(100_000)
)

to get

       Time    B
0   3.97318  abc
1  39.75808  efg

You could also use a converter

def to_float(string):
    return float(string.replace(".", "")) / 100_000

df = pd.read_table("table.txt", sep=";", converters={"Time": to_float})

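but I think the first solution is more efficient, since the conversion and division run on the whole column at once rather than through a Python function call per value. Both versions assume that every value has exactly five decimals; a value with fewer digits, such as 397.318, comes out as 3.97318 rather than the 39.7318 mentioned in the question.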
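Since the converter is a plain function, it can be tried on a few sample strings before it is wired into read_table. The following is only a sketch: the name to_float_checked and the decimals parameter are made up for illustration, and it assumes that "." appears in the column only as a misplaced separator.

```python
def to_float_checked(string, decimals=5):
    """Turn a string like '3.975.808' into 39.75808.

    Assumption: every "." is a misplaced separator and the value was
    originally written with `decimals` digits after the decimal point.
    """
    digits = string.strip().replace(".", "")
    # int() raises ValueError for anything that is not a whole number,
    # so stray text in the column is reported instead of silently converted.
    return int(digits) / 10**decimals


print(to_float_checked("3.975.808"))  # 39.75808
print(to_float_checked("397.318"))    # 3.97318
```

It could then be passed in the same way as the converter above, i.e. converters={"Time": to_float_checked}.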