pythonexcelpandas

Python Pandas dataframe reading exact specified range in an excel sheet


I have a lot of different table (and other unstructured data in an excel sheet) .. I need to create a dataframe out of range 'A3:D20' from 'Sheet2' of Excel sheet 'data'.

All examples that I come across drilldown up to sheet level, but not how to pick it from an exact range.

import openpyxl
import pandas as pd

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.get_sheet_by_name('Sheet2')
range = ['A3':'D20']   #<-- how to specify this?
spots = pd.DataFrame(sheet.range) #what should be the exact syntax for this?

print (spots)

Once I get this, I plan to look up data in column A and find its corresponding value in column B.

Edit 1: I realised that openpyxl takes too long, and so have changed that to pandas.read_excel('data.xlsx','Sheet2') instead, and it is much faster at that stage at least.

Edit 2: For the time being, I have put my data in just one sheet and:


Solution

  • One way to do this is to use the openpyxl module.

    Here's an example:

    from openpyxl import load_workbook
    
    wb = load_workbook(filename='data.xlsx', 
                       read_only=True)
    
    ws = wb['Sheet2']
    
    # Read the cell values into a list of lists
    data_rows = []
    for row in ws['A3':'D20']:
        data_cols = []
        for cell in row:
            data_cols.append(cell.value)
        data_rows.append(data_cols)
    
    # Transform into dataframe
    import pandas as pd
    df = pd.DataFrame(data_rows)