
How can I import multiple columns of a file to the same array in Python?


I have a .txt file of the following form:

header line 1
header line 2
x1  y1  x4  y4  x7  y7
x2  y2  x5  y5  x8  y8
x3  y3  x6  y6  x9  y9
footer line

The x and y values are separated by tabs, and in my case they are numbers of the form "2,9 " (including the trailing space). Example:

header line 1
header line 2
1,0     1,5     4,0     4,5     7,0     7,5 
2,0     2,5     5,0     5,5     8,0     8,5 
3,0     3,5     6,0     6,5     9,0     9,5 
footer line

The file is encoded in latin-1. I'm looking for an easy way to get NumPy arrays of my x and y values converted to float, that is:

array([1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0])

and similarly for y.
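The desired order amounts to reading the x columns (0, 2, 4) top to bottom, one after another. A small sketch on a hand-typed array (a stand-in for the parsed example rows, not read from the file) shows that joining those columns is the same as slicing every 2nd column and flattening it in column-major (`order="F"`) order:

```python
import numpy as np

# Toy stand-in for the parsed example rows: columns are x, y, x, y, x, y
data = np.array([
    [1.0, 1.5, 4.0, 4.5, 7.0, 7.5],
    [2.0, 2.5, 5.0, 5.5, 8.0, 8.5],
    [3.0, 3.5, 6.0, 6.5, 9.0, 9.5],
])

# Joining the x columns (0, 2, 4) one after another...
joined_x = np.concatenate((data[:, 0], data[:, 2], data[:, 4]))
# ...is the same as taking every 2nd column from 0 and reading it column by column
sliced_x = data[:, 0::2].ravel(order="F")

print(joined_x)                            # [1. 2. 3. 4. 5. 6. 7. 8. 9.]
print(np.array_equal(joined_x, sliced_x))  # True
```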

First, I created a function to replace "," with "." and remove trailing spaces:

import numpy as np

def ctf(valstr):
    return float(valstr.replace(',','.').replace(" ",""))

Then I defined a function that builds a converters dictionary of variable length, to use later:

def dic(length):
    dic={}
    for i in range(0,length):
        dic[i]=ctf
    return dic
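The same converters dictionary can be built with a dict comprehension. This is a minimal sketch; `make_converters` is a hypothetical name, and `ctf` is repeated unchanged from above so the snippet runs on its own:

```python
def ctf(valstr):
    # Decimal comma -> decimal point, then drop the spaces
    return float(valstr.replace(",", ".").replace(" ", ""))

def make_converters(n_cols):
    # Map every column index 0..n_cols-1 to the same converter function
    return {i: ctf for i in range(n_cols)}

converters = make_converters(6)
print(ctf("2,9 "))         # 2.9
print(sorted(converters))  # [0, 1, 2, 3, 4, 5]
```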

I could then "manually" read in the columns and join them together:

xval1,yval1,xval2,yval2,xval3,yval3=np.genfromtxt("file.txt",delimiter="",unpack=True,skip_header=2,skip_footer=1,encoding="latin-1",converters=dic(6))

xvalues=np.concatenate((xval1,xval2,xval3))
yvalues=np.concatenate((yval1,yval2,yval3))

It works but isn't exactly pretty, especially if I have even more columns. What I would like is a method where I only have to specify the total number of columns (6 in the example above) and the number of arrays I want to get (2 in my example).
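One NumPy-only way to get that interface is to keep `np.genfromtxt` with the converters but not unpack, and then slice every `n_arrays`-th column. This is a sketch, not a tested recipe for every file: `read_interleaved` is a hypothetical helper, it assumes exactly 2 header lines and 1 footer line, and the demo uses an in-memory `io.StringIO` copy of the example instead of `file.txt`:

```python
import io
import numpy as np

def ctf(valstr):
    # Decimal comma -> decimal point, drop the trailing space
    return float(valstr.replace(",", ".").replace(" ", ""))

def read_interleaved(fname, n_cols, n_arrays):
    """Hypothetical helper: read n_cols interleaved columns and return
    n_arrays flat arrays (e.g. x and y for n_arrays=2)."""
    data = np.genfromtxt(
        fname,
        skip_header=2,
        skip_footer=1,
        encoding="latin-1",
        converters={i: ctf for i in range(n_cols)},
    )
    # Columns i, i + n_arrays, i + 2*n_arrays, ... belong to variable i;
    # order="F" reads them column by column, giving x1, x2, x3, x4, ...
    return [data[:, i::n_arrays].ravel(order="F") for i in range(n_arrays)]

# Demo with an in-memory copy of the example file (tabs, trailing spaces)
sample = io.StringIO(
    "header line 1\n"
    "header line 2\n"
    "1,0 \t1,5 \t4,0 \t4,5 \t7,0 \t7,5 \n"
    "2,0 \t2,5 \t5,0 \t5,5 \t8,0 \t8,5 \n"
    "3,0 \t3,5 \t6,0 \t6,5 \t9,0 \t9,5 \n"
    "footer line\n"
)
xvalues, yvalues = read_interleaved(sample, n_cols=6, n_arrays=2)
print(xvalues)  # [1. 2. 3. 4. 5. 6. 7. 8. 9.]
print(yvalues)  # [1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5]
```

With a real file, the call would be `read_interleaved("file.txt", 6, 2)`.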

Note: I don't think the converter/dictionary part is actually relevant to my problem. I included it because any alternative solution must either be able to use converters or achieve the same result in some other way.
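One way to "achieve the same result in some other way" is to fix the decimal commas in the text before parsing, so no converters are needed at all. A minimal sketch, assuming the whole file fits in memory, has exactly 2 header lines, and ends with a single footer line; `read_interleaved_text` is a hypothetical name:

```python
import numpy as np

def read_interleaved_text(text, n_arrays):
    """Hypothetical helper: parse the file contents (already read as str)
    without converters by swapping the decimal commas up front."""
    # Drop the 2 header lines and the last (footer) line, then "," -> "."
    lines = [ln.replace(",", ".") for ln in text.splitlines()[2:-1]]
    # Default delimiter is any whitespace; ndmin=2 keeps a single row 2-D
    data = np.loadtxt(lines, ndmin=2)
    return [data[:, i::n_arrays].ravel(order="F") for i in range(n_arrays)]

# With a real file: text = open("file.txt", encoding="latin-1").read()
text = (
    "header line 1\n"
    "header line 2\n"
    "1,0 \t1,5 \t4,0 \t4,5 \t7,0 \t7,5 \n"
    "2,0 \t2,5 \t5,0 \t5,5 \t8,0 \t8,5 \n"
    "3,0 \t3,5 \t6,0 \t6,5 \t9,0 \t9,5 \n"
    "footer line\n"
)
x, y = read_interleaved_text(text, n_arrays=2)
print(x)  # [1. 2. 3. 4. 5. 6. 7. 8. 9.]
print(y)  # [1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5]
```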


Solution

  • You can use pandas to read only the relevant lines from the file, then flatten every n-th column into a NumPy array:

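    import pandas as pd
    
    # skipfooter is only supported by the Python parser engine
    df = pd.read_csv("file.txt", header=None, skiprows=2, skipfooter=1,
                     sep=r"\s+", engine="python", encoding="latin-1")
    
    # Replace the decimal commas, then convert every column to float
    df = df.replace(",", ".", regex=True).astype(float)
    
    n = 2  # number of output arrays (x and y)
    
    # Columns i, i+n, i+2n, ... belong to variable i; order="F" reads them column by column
    arrays = [df.iloc[:, i::n].to_numpy().flatten(order="F") for i in range(n)]
    print(arrays)
    # [array([1., 2., 3., 4., 5., 6., 7., 8., 9.]), array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5])]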
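If a single 2-D result with one row per variable is more convenient than a list, the slicing can also be done with one reshape and transpose. This is a sketch that assumes the column count is a multiple of `n_arrays`; `split_interleaved` is a hypothetical name, and its input could be `df.to_numpy()` from above or the 2-D array returned by `np.genfromtxt` (a hand-typed array stands in for it here):

```python
import numpy as np

def split_interleaved(data, n_arrays):
    """Return an (n_arrays, n_values) array; row i holds variable i.

    Assumes data.shape[1] is a multiple of n_arrays."""
    n_rows = data.shape[0]
    # (rows, groups, variables): cube[r, g, v] is data[r, g * n_arrays + v]
    cube = data.reshape(n_rows, -1, n_arrays)
    # Reorder to (variables, groups, rows), then flatten groups and rows per variable
    return cube.transpose(2, 1, 0).reshape(n_arrays, -1)

data = np.array([
    [1.0, 1.5, 4.0, 4.5, 7.0, 7.5],
    [2.0, 2.5, 5.0, 5.5, 8.0, 8.5],
    [3.0, 3.5, 6.0, 6.5, 9.0, 9.5],
])
x, y = split_interleaved(data, 2)
print(x)  # [1. 2. 3. 4. 5. 6. 7. 8. 9.]
print(y)  # [1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5]
```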