pythonpandasseparatoreuropean-data-formatread-csv

How to get the custom separator Excel uses in pd.read_csv()


I have several .txt files with data I would like to plot. Since there are so many, I prefer using python to plot these rather than Excel. The .txt files have been generated by German machinery, which is the reason for a decimal comma. This can be solved easily enough with the decimal parameter set to ','. It however seem as if the separator used in the .txt file is not a normal tab or space, but some custom separator. I cannot manage to separate the columns in python. Find attached my code. Note that I have tried the normal separators in python, including leaving sep blank so that it can look for its own separator, but to no avail.

If I however use Excel, I can import the .txt file when selecting the ,,custom'' separator. I cannot figure out which separator this is, but would like to duplicate it in python (pandas.read_csv()). Does anyone know how to figure out which separator Excel used, or how to achieve the same result in python? A link to an example of the .txt file can be found here.

Code:

gdh_pms = []
os.chdir(r"C:\Users\pivde\Desktop\Tuks\nagraads\karakterisering\tga\txt")
for leggernaam in glob.glob("*.txt"):
    with open(leggernaam, mode='rb') as fp: data = fp.read()  
    stroom = io.StringIO(data.decode('ansi')) 
    df = pd.read_csv(stroom, skiprows=10, skipfooter=8, engine='python', sep='\t',  decimal=',')
    df['origin'] = leggernaam
    gdh_pms.append(df)
os.chdir(r"C:\Users\pivde\Desktop\Tuks\nagraads\karakterisering\tga")

Result:

    8 31,333 4,63395    origin
0   9 31,500 4,6333 PLA 0_gdh_30_200°C_Air_090823_DSC.txt
1   10 31,667 4,63265   PLA 0_gdh_30_200°C_Air_090823_DSC.txt
2   11 31,833 4,63207   PLA 0_gdh_30_200°C_Air_090823_DSC.txt
3   12 32,000 4,63107   PLA 0_gdh_30_200°C_Air_090823_DSC.txt
4   13 32,167 4,63013   PLA 0_gdh_30_200°C_Air_090823_DSC.txt
... ... ...
998 1007 197,833 4,44944    PLA 0_gdh_30_200°C_Air_090823_DSC.txt
999 1008 198,000 4,44985    PLA 0_gdh_30_200°C_Air_090823_DSC.txt
1000    1009 198,167 4,45055    PLA 0_gdh_30_200°C_Air_090823_DSC.txt
1001    1010 198,333 4,45132    PLA 0_gdh_30_200°C_Air_090823_DSC.txt
1002    1011 198,500 4,45206    PLA 0_gdh_30_200°C_Air_090823_DSC.txt
1003 rows × 2 columns

Solution

  • It seems that the separator consists in a variable number of whitespaces so you can try with:

    df = pd.read_csv(stroom, skiprows=10, skipfooter=8, engine='python', decimal=',', delim_whitespace=True)