python excel numpy

Load Excel file into numpy 2D array


Is there an easier way to load an Excel file directly into a NumPy array?

I have looked at numpy.genfromtxt in the NumPy documentation, but it doesn't load Excel files directly:

array = np.genfromtxt("Stats.xlsx")
ValueError: Some errors were detected !
Line #3 (got 2 columns instead of 1)
Line #5 (got 5 columns instead of 1)
......

Right now I am using openpyxl.reader.excel to read the Excel file and then appending to NumPy 2D arrays. This seems inefficient. Ideally, I would like to have the Excel file loaded directly into a NumPy 2D array.


Solution

  • Honestly, if you're working with heterogeneous data (as spreadsheets are likely to contain), a pandas.DataFrame is a better choice than using numpy directly.

    While pandas is in some sense just a wrapper around numpy, it handles heterogeneous data very well, along with a ton of other things. For "spreadsheet-like" data, it's the gold standard in the Python world.

    If you decide to go that route, just use pandas.read_excel.
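
If a plain 2D array is still what you need in the end, the pandas route can be sketched as below. This is a minimal sketch, not a definitive implementation: the helper name load_excel_as_array and the temporary demo.xlsx file are made up for illustration, it assumes the sheet has no header row (hence header=None), and it assumes an Excel engine such as openpyxl is installed for .xlsx files.

```python
import os
import tempfile

import pandas as pd


def load_excel_as_array(path, sheet_name=0, header=None):
    """Read one worksheet with pandas and return its cells as a 2D NumPy array.

    Hypothetical helper for illustration; header=None treats every row as data.
    """
    frame = pd.read_excel(path, sheet_name=sheet_name, header=header)
    return frame.to_numpy()


# Demo: write a tiny workbook to a temporary folder, then load it back.
# (Assumes openpyxl is available for reading and writing .xlsx files.)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.xlsx")
    pd.DataFrame([[1.5, 2.5], [3.5, 4.5]]).to_excel(
        demo_path, header=False, index=False
    )
    array = load_excel_as_array(demo_path)

print(array.shape)
print(array)
```

For your own file you would call load_excel_as_array("Stats.xlsx") instead of the demo. Keep in mind that if the sheet mixes numbers and text, the resulting array will have dtype object, which is one more reason to stay with the DataFrame when the data is heterogeneous.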