
Read a CSV file in an unknown encoding


I am trying to read a CSV file in Python (Google Colab). The file is available at this link: https://github.com/LeGentilHomme/CSV-FILE/blob/4b69985482e59906b64a540b9c0d0a7fce31a37e/exportIndicateurs(7).csv It contains French time-series data separated by semicolons, with the variables in the rows and the dates in the columns. After several attempts, I always get a table where all the values, semicolons included, end up in a single column and the remaining cells show NaN. I have tried the following code:

import pandas as pd

df = pd.read_csv("exportIndicateurs(7).csv", delimiter=';', header=1, encoding='latin-1')
df

but I get the result shown in the attached picture.

Thank you in advance for your help.
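Since the title asks about the encoding, here is a small sketch for narrowing it down before calling read_csv: it tries a few candidate codecs on the raw bytes and returns the first one that decodes without error. The candidate list and the helper name guess_encoding are assumptions for illustration, not part of pandas. A clean decode only shows that the bytes are valid in that codec, not that it is the right one, so it is worth checking a few accented words (é, è, à) in the decoded text.

```python
# Candidate codecs, tried in order. This list is an assumption: French
# exports are commonly UTF-8 or Windows-1252 (cp1252).
# - "utf-8-sig" accepts plain UTF-8 and also strips a byte-order mark.
# - "latin-1" goes last because it maps every byte to a character,
#   so it never raises and acts as a fallback.
CANDIDATES = ("utf-8-sig", "cp1252", "latin-1")


def guess_encoding(raw: bytes, candidates=CANDIDATES) -> str:
    """Return the first codec in `candidates` that decodes `raw` cleanly.

    guess_encoding is a hypothetical helper, not a pandas function.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    raise ValueError("none of the candidate encodings could decode the data")


# Example on in-memory bytes; for a real file, pass its bytes instead,
# e.g. open("exportIndicateurs(7).csv", "rb").read().
print(guess_encoding("Libellé;2020\n".encode("cp1252")))  # cp1252
print(guess_encoding("Libellé;2020\n".encode("utf-8")))   # utf-8-sig
```

The returned name can then be passed as encoding=... to pd.read_csv.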


Solution

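  • Use the regular expression -*; (zero or more hyphens followed by a semicolon) as the separator in read_csv. Because this separator is longer than one character, pandas treats it as a regex, which requires engine="python". Also read the raw file (raw.githubusercontent.com) rather than the GitHub "blob" page, which serves HTML, and use header=2 instead of header=1: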

    import pandas as pd
    
    gh_link = "https://raw.githubusercontent.com/LeGentilHomme/CSV-FILE/4b69985482e59906b64a540b9c0d0a7fce31a37e/exportIndicateurs(7).csv"
    
    # "-*;" splits on a semicolon plus any hyphens directly before it
    df = pd.read_csv(gh_link, sep="-*;", header=2, engine="python")
    

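To see what this separator does, here is a self-contained sketch on a made-up sample. The sample text, its two metadata lines and the lone "-" used as a missing-value marker are hypothetical; they are not the contents of the real file. The sketch also transposes the result, since the question has the variables in rows and the dates in columns. The format="%Y" in the last step assumes year-only column headers and would need adjusting to whatever date format the real file uses.

```python
import io

import pandas as pd

# Hypothetical stand-in for the export (NOT its real contents):
# two metadata lines, a header row of dates, then one row per indicator.
sample = (
    "Export indicateurs\n"
    "Source : exemple\n"
    "Libellé;2020;2021;2022\n"
    "Indicateur A;10;-;12\n"
    "Indicateur B;7;8;9\n"
)

# "-*;" = zero or more hyphens followed by a semicolon. A multi-character
# separator is treated as a regex, which only the python engine supports.
# header=2 skips the two metadata lines and uses the date row as header.
df = pd.read_csv(io.StringIO(sample), sep="-*;", header=2, engine="python")

# In row A the "-" is swallowed by the separator, leaving an empty
# field that pandas reads as NaN.
print(df)

# Move the dates into the index, the usual layout for a time series.
ts = df.set_index("Libellé").T
ts.index = pd.to_datetime(ts.index, format="%Y")  # assumes year-only headers
print(ts)
```

If hyphens can also appear inside real values in the file (for example negative numbers written as "-5"), this regex would only strip hyphens that sit directly before a semicolon, so such values should be checked after loading.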