
numpy.loadtxt: load a range of columns


I have a .csv file with both string and integer columns. I need to use the numpy.loadtxt method to import the matrix formed by the specified columns. How can I do that? Right now I am trying the following:

data = np.loadtxt(open(path_to_data, "rb"), delimiter=",", skiprows=1, usecols=[1:]) 

Basically, I am trying to read all columns but the first, but it gives an error:

SyntaxError: invalid syntax

This is because such syntax is not allowed: usecols=[1:]


Solution

  • This is the syntax error:

    In [153]: [1:]
      File "<ipython-input-153-4bac19319341>", line 1
        [1:]
          ^
    SyntaxError: invalid syntax
    

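    It's not specific to loadtxt: a slice such as 1: is only valid inside a subscript (e.g. a[1:]), not on its own inside a list literal.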
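A quick way to see this outside of loadtxt is sketched below, assuming plain Python plus NumPy; the column names and the resulting n are made-up example values. It shows that the bare expression is rejected by the parser, that slicing works inside a subscript, and that np.arange builds the same indices as an ordinary value you can pass as an argument.

```python
import numpy as np

# "[1:]" is not a valid expression on its own, so the parser rejects it
# before loadtxt (or any other function) is ever called.
try:
    compile("[1:]", "<example>", "eval")
    bare_slice_ok = True
except SyntaxError:
    bare_slice_ok = False
print(bare_slice_ok)  # False

# Slice syntax is only legal inside a subscript on an existing sequence.
columns = ["name", "a", "b", "c"]  # hypothetical header fields
print(columns[1:])  # ['a', 'b', 'c']

# To pass "every column from index 1 on" as an argument, build the indices.
n = len(columns)  # total number of columns
usecols = np.arange(1, n)
print(usecols)  # [1 2 3]
```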

    Use

    data = np.loadtxt(path_to_data, delimiter=",", skiprows=1, usecols=np.arange(1, n))
    

    where n is the total number of columns, so np.arange(1, n) yields the indices 1 through n-1. From the loadtxt docstring:

    usecols : int or sequence, optional
        Which columns to read, with 0 being the first. For example,
        ``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns.
        The default, None, results in all columns being read.
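The sketch below runs that loadtxt call on a small in-memory CSV. io.StringIO stands in for the real file, and its contents are invented for illustration. It also shows one possible preliminary read for n: count the comma-separated fields in the header line. That count assumes the header has no quoted fields containing commas.

```python
import io
import numpy as np

# Hypothetical file contents: a header row, a string column, then numeric columns.
csv_text = "name,a,b,c\nx,1,2,3\ny,4,5,6\n"

# Preliminary read: count the fields in the header line to find n.
# (With a real file, read the first line of path_to_data instead.)
# Assumes no quoted fields containing commas.
header = io.StringIO(csv_text).readline()
n = len(header.split(","))

# Skip the header row and read columns 1 .. n-1, leaving out the string column 0.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1,
                  usecols=np.arange(1, n))
print(n)           # 4
print(data.shape)  # (2, 3)
```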
    

    If you don't know n and don't want to do a preliminary read of the file to determine it, np.genfromtxt might be easier.

    data = np.genfromtxt(..., delimiter=',', skip_header=1)
    

    should load all columns, putting nan wherever a string can't be converted to a float. If those nan values are all in the first column, then

    data = data[:,1:]
    

    should give you all but the first column.

    genfromtxt is a little more forgiving than loadtxt when it comes to converting strings to floats.
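For comparison, here is a sketch of the genfromtxt route on the same kind of invented in-memory data (io.StringIO again stands in for the file). genfromtxt spells the header-skipping parameter skip_header, needs no column count, and with its default float dtype turns the unconvertible string column into nan. The check before slicing is an added safeguard, not part of the original answer.

```python
import io
import numpy as np

# Hypothetical file contents: a header row, a string column, then numeric columns.
csv_text = "name,a,b,c\nx,1,2,3\ny,4,5,6\n"

# No column count needed; strings that can't be converted to float become nan.
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# Confirm the nan values are confined to column 0 before dropping it.
assert np.isnan(raw[:, 0]).all() and not np.isnan(raw[:, 1:]).any()
data = raw[:, 1:]
print(data.shape)  # (2, 3)
```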