What is the conversion syntax to convert a successfully loaded xlrd excel sheet to a numpy matrix (that represents that sheet)?
Right now I'm trying to take each row of the spreadsheet and add it to the numpy matrix. I can't figure out the syntax for converting a Sheet.row into a numpy.ndarray. Here's what I've tried so far:
    import numpy
    import xlrd

    workbook = xlrd.open_workbook('input.xlsx')
    worksheet = workbook.sheet_by_name('Sheet1')
    num_rows = worksheet.nrows - 1
    num_cells = worksheet.ncols - 1
    inputData = numpy.empty([worksheet.nrows - 1, worksheet.ncols])
    curr_row = -1
    while curr_row < num_rows:  # for each row
        curr_row += 1
        row = worksheet.row(curr_row)
        if curr_row > 0:  # don't want the first row because those are labels
            inputData[curr_row - 1] = numpy.array(row)  # this is the line that fails
I've tried all sorts of things on that last line to try to convert the row to something numpy will accept and add to the inputData matrix. What is the correct conversion syntax?
You are trying to convert row, which is a list of xlrd.sheet.Cell objects, straight into a numpy array. That won't work the way you want it to, because numpy needs the plain value stored in each cell's value attribute, not the Cell object itself. You'll have to do this the long way and go over each of the columns too:
    while curr_row < num_rows:  # for each row
        curr_row += 1
        row = worksheet.row(curr_row)
        if curr_row > 0:  # don't want the first row because those are labels
            for col_ind, el in enumerate(row):
                inputData[curr_row - 1, col_ind] = el.value
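If the nested loop feels heavy, the same idea can probably be written more compactly with Sheet.row_values, which returns the plain cell values of a row instead of Cell objects. The following is only a sketch: the helper name sheet_to_array and its header_rows parameter are made up for illustration, and it assumes every data cell is numeric.

```python
import numpy


def sheet_to_array(sheet, header_rows=1):
    """Return the data rows of a sheet as a 2-D float array.

    `sheet` is assumed to behave like an xlrd Sheet: it needs an `nrows`
    attribute and a `row_values(rowx)` method returning plain cell values.
    `header_rows` (hypothetical parameter) is how many label rows to skip.
    """
    rows = [sheet.row_values(r) for r in range(header_rows, sheet.nrows)]
    # dtype=float raises ValueError if a cell holds text or is empty ('')
    return numpy.array(rows, dtype=float)
```

With the worksheet from the question, this would be called as inputData = sheet_to_array(worksheet).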
There is a function for this in pandas though, as suggested elsewhere on SO: pandas.read_excel reads a sheet into a DataFrame. DataFrames do not inherit from numpy arrays, but they are built on top of them and can be converted to one. Probably best not to reinvent the wheel...
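As a rough sketch of the pandas route: read_excel treats the first row as column labels by default, and DataFrame.to_numpy() gives back a plain array. The small DataFrame below is a stand-in with invented values so the snippet runs without a file; with a real workbook, the commented-out read_excel line would take its place.

```python
import numpy
import pandas

# With a real file this would be (first row becomes the column labels):
# frame = pandas.read_excel('input.xlsx', sheet_name='Sheet1')
# Stand-in data, invented for illustration only:
frame = pandas.DataFrame({"x": [1.0, 3.0], "y": [2.0, 4.0]})

# Convert the DataFrame's values to a numpy.ndarray (labels are dropped)
inputData = frame.to_numpy()
```

Here inputData is an ordinary numpy.ndarray with one row per spreadsheet data row.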