I have several .txt files with data I would like to plot. Since there are so many, I prefer using python to plot these rather than Excel. The .txt files have been generated by German machinery, which is the reason for a decimal comma. This can be solved easily enough with the decimal parameter set to ','. It however seem as if the separator used in the .txt file is not a normal tab or space, but some custom separator. I cannot manage to separate the columns in python. Find attached my code. Note that I have tried the normal separators in python, including leaving sep blank so that it can look for its own separator, but to no avail.
If I however use Excel, I can import the .txt file when selecting the ,,custom'' separator. I cannot figure out which separator this is, but would like to duplicate it in python (pandas.read_csv()). Does anyone know how to figure out which separator Excel used, or how to achieve the same result in python? A link to an example of the .txt file can be found here.
Code:
gdh_pms = []
os.chdir(r"C:\Users\pivde\Desktop\Tuks\nagraads\karakterisering\tga\txt")
for leggernaam in glob.glob("*.txt"):
with open(leggernaam, mode='rb') as fp: data = fp.read()
stroom = io.StringIO(data.decode('ansi'))
df = pd.read_csv(stroom, skiprows=10, skipfooter=8, engine='python', sep='\t', decimal=',')
df['origin'] = leggernaam
gdh_pms.append(df)
os.chdir(r"C:\Users\pivde\Desktop\Tuks\nagraads\karakterisering\tga")
Result:
8 31,333 4,63395 origin
0 9 31,500 4,6333 PLA 0_gdh_30_200°C_Air_090823_DSC.txt
1 10 31,667 4,63265 PLA 0_gdh_30_200°C_Air_090823_DSC.txt
2 11 31,833 4,63207 PLA 0_gdh_30_200°C_Air_090823_DSC.txt
3 12 32,000 4,63107 PLA 0_gdh_30_200°C_Air_090823_DSC.txt
4 13 32,167 4,63013 PLA 0_gdh_30_200°C_Air_090823_DSC.txt
... ... ...
998 1007 197,833 4,44944 PLA 0_gdh_30_200°C_Air_090823_DSC.txt
999 1008 198,000 4,44985 PLA 0_gdh_30_200°C_Air_090823_DSC.txt
1000 1009 198,167 4,45055 PLA 0_gdh_30_200°C_Air_090823_DSC.txt
1001 1010 198,333 4,45132 PLA 0_gdh_30_200°C_Air_090823_DSC.txt
1002 1011 198,500 4,45206 PLA 0_gdh_30_200°C_Air_090823_DSC.txt
1003 rows × 2 columns
It seems that the separator consists in a variable number of whitespaces so you can try with:
df = pd.read_csv(stroom, skiprows=10, skipfooter=8, engine='python', decimal=',', delim_whitespace=True)