python, arrays, numpy, data-files

Converting a .data file into numpy arrays


My file.data looks like this:

   "3.0,1.5,0\n
     4.6,0.7,1\n
     5.8,2.7,2"

And I want to load this data into two numpy arrays so that it looks like this in the end:

X = [ [3.0, 1.5],
      [4.6, 0.7],
      [5.8, 2.7] ]

y = [0, 1, 2]

If I do the following...

fname = open("file.data", "r")
for line in fname.readlines():
    print(line)

...I can read line by line as strings, but what would be the best way to separate these values and put them into the two numpy arrays as shown above?

Is there a nice module or function in numpy that does this really efficiently?


Solution

    1. If your data file is a simple text file with a delimiter, as you have shown, then you can use numpy.loadtxt to load the entire file at once:
    import numpy as np
    data = np.loadtxt("file.data", delimiter=',')
    X = data[:, 0:2]
    Y = data[:, 2]
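    
    Note that loadtxt returns a float array, so Y will come back as [0., 1., 2.]. If you want integer labels as in your example, one option (a small sketch, assuming the third column always holds whole numbers) is to cast that column:
    # cast the label column to integers (assumes whole-number labels)
    Y = data[:, 2].astype(int)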
    
    2. In case you want to read line by line, you can try numpy.fromstring, which parses each line string into a numeric array:
    import numpy as np
    data = []
    with open("file.data", "r") as f:
        for line in f:
            data.append(np.fromstring(line, sep=','))
    data_array = np.array(data)
    X = data_array[:, 0:2]
    Y = data_array[:, 2]
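    
    Either way, a quick sanity check that the result matches the layout in the question (a minimal usage sketch; exact print formatting may vary with your numpy version):
    print(X.shape)  # (3, 2)
    print(Y)        # [0. 1. 2.]  (or [0 1 2] if you cast the labels to int)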