pythonexcelpandasxlrd

How to obtain sheet names from XLS files without loading the whole file?


I'm currently using pandas to read an Excel file and present its sheet names to the user, so he can select which sheet he would like to use. The problem is that the files are really big (70 columns x 65k rows), taking up to 14s to load on a notebook (the same data in a CSV file is taking 3s).

My code in panda goes like this:

xls = pandas.ExcelFile(path)
sheets = xls.sheet_names

I tried xlrd before, but obtained similar results. This was my code with xlrd:

xls = xlrd.open_workbook(path)
sheets = xls.sheet_names

So, can anybody suggest a faster way to retrieve the sheet names from an Excel file than reading the whole file?


Solution

  • you can use the xlrd library and open the workbook with the "on_demand=True" flag, so that the sheets won't be loaded automaticaly.

    Than you can retrieve the sheet names in a similar way to pandas:

    import xlrd
    xls = xlrd.open_workbook(r'<path_to_your_excel_file>', on_demand=True)
    print xls.sheet_names() # <- remeber: xlrd sheet_names is a function, not a property