[SOLVED] numpy.loadtxt: load a range of columns

I have a .csv file containing both string and integer columns. I need to use numpy.loadtxt to import the matrix formed from a specified range of columns. How can I do that? Right now I am trying the following:

data = np.loadtxt(open(path_to_data, "rb"), delimiter=",", skiprows=1, usecols=[1:])

Basically I am trying to read all columns but the first, but it gives an error:

SyntaxError: invalid syntax

Because such syntax is not allowed: usecols=[1:]

Solution

This is the syntax error:

In [153]: [1:]
  File "<ipython-input-153-4bac19319341>", line 1
    [1:]
      ^
SyntaxError: invalid syntax

It's not specific to loadtxt.
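To illustrate the point, here is a small sketch showing that the colon notation is only accepted inside an indexing expression, never as a bare list element. The column count of 6 is an arbitrary value chosen for the example, and np.r_ is shown only as one possible way to build an index array from slice notation.

```python
import numpy as np

# A bare slice inside a list literal is rejected by the Python parser itself,
# before numpy is ever involved.
try:
    compile("[1:]", "<example>", "eval")
    bare_slice_ok = True
except SyntaxError:
    bare_slice_ok = False

# Inside a subscript, the same notation is valid.
cols = list(range(6))[1:]

# np.r_ converts slice notation into an index array (the stop must be given).
idx = np.r_[1:6]
```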

Use

data = np.loadtxt(open(path_to_data, "rb"), delimiter=",", skiprows=1, usecols=np.arange(1,n))

where n is the total number of columns.
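If n is not known in advance, one option is to read the header line first and count its fields. The following is a minimal sketch, assuming a comma-delimited file with exactly one header row and no quoted delimiters inside fields. The helper name load_all_but_first and the sample data are made up for illustration.

```python
import io
import numpy as np

def load_all_but_first(f, delimiter=","):
    """Load every column except the first from an open text file.

    Hypothetical helper: assumes a single header row and no quoted
    delimiters inside fields.
    """
    header = f.readline()
    n = len(header.split(delimiter))  # total number of columns
    # The header line has already been consumed, so skiprows is not needed.
    return np.loadtxt(f, delimiter=delimiter, usecols=range(1, n))

# In-memory stand-in for a real file, for demonstration only.
sample = io.StringIO("name,a,b,c\nfoo,1,2,3\nbar,4,5,6\n")
data = load_all_but_first(sample)
```

With a real file, the same helper could be called inside a `with open(path_to_data) as f:` block.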

usecols : int or sequence, optional
    Which columns to read, with 0 being the first. For example,
    ``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns.
    The default, None, results in all columns being read.

If you don't know n, and don't want to use a preliminary file read to determine it, genfromtxt might be easier.

data = np.genfromtxt(..., delimiter=',', skip_header=1)

should load all columns, putting nan wherever it can't convert a string into a float (note that genfromtxt takes skip_header rather than skiprows). If those nan values are all in the first column, then

data = data[:,1:]

should give you all but the first column.

genfromtxt is a little more forgiving when it comes to converting strings to floats.
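A small self-contained sketch of the genfromtxt route, using made-up sample data in an in-memory file instead of a real path. It assumes the only non-numeric column is the first one.

```python
import io
import numpy as np

# In-memory stand-in for a real CSV file, for demonstration only.
sample = io.StringIO("name,a,b\nfoo,1,2\nbar,3,4\n")

# genfromtxt uses skip_header (not skiprows) to skip leading lines.
data = np.genfromtxt(sample, delimiter=",", skip_header=1)

# The string column cannot be parsed as float, so it comes back as nan.
first_col_is_nan = bool(np.isnan(data[:, 0]).all())

# Drop the first column, keeping only the numeric ones.
data = data[:, 1:]
```

Checking first_col_is_nan before slicing is a cheap way to confirm that the unparseable values really are confined to the first column.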