I have a .txt file of the following form:
header line 1
header line 2
x1 y1 x4 y4 x7 y7
x2 y2 x5 y5 x8 y8
x3 y3 x6 y6 x9 y9
footer line
The x and y values are separated by tabs and, in my case, are numbers of the form "2,9 " (including the trailing space). Example:
header line 1
header line 2
1,0 1,5 4,0 4,5 7,0 7,5
2,0 2,5 5,0 5,5 8,0 8,5
3,0 3,5 6,0 6,5 9,0 9,5
footer line
The file is encoded in latin-1. I'm looking for an easy way to get numpy arrays of my x and y values converted to float, that is:
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
and similarly for y.
I've first created a function to replace "," by "." and remove trailing spaces:
import numpy as np
def ctf(valstr):
    return float(valstr.replace(',','.').replace(" ",""))
then defined a dictionary of variable length to use later:
def dic(length):
    dic = {}
    for i in range(0,length):
        dic[i] = ctf  # assumed: map every column index to the ctf converter
    return dic
I could then "manually" read in the columns and join them together:
It works, but it isn't exactly pretty, especially if I have even more columns. What I would like is a method where I only have to specify the total number of columns (6 in the case above) and the number of arrays I want to get (2 in my example).
Note: I don't think the converter/dictionary part is actually relevant for my problem. I included it because I need any alternative solution to be able to use converters or achieve the same result in some other way.
You can use Pandas to import only the relevant lines from the file, then flatten the columns into numpy arrays:
import pandas as pd
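# skipfooter is only supported by the python engine; the file is latin-1 encoded
df = pd.read_csv("file.txt", header=None, skiprows=2, skipfooter=1, sep=r"\s+", engine="python", encoding="latin-1")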
df = df.replace(",", ".", regex=True).astype(float)
n = 2
arrays = [df.iloc[:, i::n].to_numpy().flatten(order="F") for i in range(n)]
[array([1., 2., 3., 4., 5., 6., 7., 8., 9.]), array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5])]
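If you would rather keep the converter function and avoid pandas, the same column-slicing idea can be sketched with plain file reading and numpy. This is only a minimal sketch under a few assumptions: the cells are separated by single tabs, there are no trailing tabs, and every data row has the same number of columns. The names `split_interleaved`, `n_header` and `n_footer` are made up for this example; they are not part of numpy or pandas.

```python
import numpy as np


def ctf(valstr):
    """Convert a string such as '2,9 ' to the float 2.9 (as in the question)."""
    return float(valstr.replace(",", ".").replace(" ", ""))


def split_interleaved(path, n_arrays, n_header=2, n_footer=1, encoding="latin-1"):
    """Read a tab-separated block and return n_arrays flattened arrays.

    Hypothetical helper for illustration; assumes a rectangular block of
    numbers between n_header header lines and n_footer footer lines.
    """
    with open(path, encoding=encoding) as fh:
        lines = fh.read().splitlines()

    # Drop header and footer lines, and skip any blank lines in between.
    body = lines[n_header:len(lines) - n_footer]
    rows = [[ctf(cell) for cell in line.split("\t")] for line in body if line.strip()]

    data = np.array(rows)  # shape: (n_rows, n_columns)
    if data.shape[1] % n_arrays != 0:
        raise ValueError("column count must be a multiple of n_arrays")

    # data[:, i::n_arrays] picks every n_arrays-th column starting at column i;
    # order="F" then reads those columns top to bottom, one after another.
    return [data[:, i::n_arrays].flatten(order="F") for i in range(n_arrays)]
```

With the example file this would be called as `x, y = split_interleaved("file.txt", n_arrays=2)`. The total number of columns is inferred from the data, so only the number of arrays has to be given.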