csv to dataframe with several delimiters and software bug

I would like to read in a csv-file. Sadly this is created by a buggy software: My header row contains multiple separators ";" and ",". Which wouldn't be a problem if my dataset wouldn't have a decimal ",".

I would like to automatically read in the files without changing the header by hand but I did not find something according to in the Pandas documentation.

The example data:

some file header row number 1
some file header row number two
some more information about data in that file
column1;column2;column3,column4,column5
1,234;2,665;0,888;3,891;3,762
2,232;1,233;0,888;3,789;3,524

Can I specify a different separator for the header than for the dataset itself? My code:

df = pd.read_csv(
    "mypath\test.csv", 
    skiprows = 3;header= 0, delimiter=';,', decimal = ",")

which results in reading everything into one column

    column1;column2;column3,column4,column5
0   1,234;2,665;0,888;3,891;3,762
1   2,232;1,233;0,888;3,789;3,524

But the result should be five columns:

    column1     column2     column3     column4     column5
0   1.234   2.665   0.888   3.891   3.762
1   2.232   1.233   0.888   3.789   3.524

Solution

Assuming your headers start with a letter from [A-Za-z] you can pass a regex with a positive lookahead for the comma-option to sep inside pd.read_csv:

import pandas as pd
from io import StringIO

csv = """some file header row number 1
some file header row number two
some more information about data in that file
column1;column2;column3,column4,column5
1,234;2,665;0,888;3,891;3,762
2,232;1,233;0,888;3,789;3,524"""

df = pd.read_csv(StringIO(csv), 
                 sep=r';|,(?=[A-Za-z])', 
                 skiprows=3, 
                 engine='python', 
                 decimal=',')

Output:

   column1  column2  column3  column4  column5
0    1.234    2.665    0.888    3.891    3.762
1    2.232    1.233    0.888    3.789    3.524

Regex explanation. Or use: r';|,(?=[^\d])' (not followed by a digit).

You can of course adjust the regex if you need a more strict solution to ensure that the option for the comma won't match for the subsequent rows.

Edit: to clarify, the equivalent for your file path is:

pd.read_csv("mypath\test.csv", sep=r';|,(?=[A-Za-z])', ...)

# replacing `StringIO(csv)` with `"mypath\test.csv"`

Not:

pd.read_csv(StringIO("mypath\test.csv"), sep=r';|,(?=[A-Za-z])', ...)

which generates the error mentioned in the comment:

EmptyDataError: No columns to parse from file