data-analysisduckdb

Read excel file with Sheets using duckdb


I have excel file which has three sheets, using duckdb how to read all sheets into one dataframe?

Normally i'm using spatial extension to read excel files with one sheet and it works perfect, here my code for reading excel.

import duckdb

import polars as pl

# Create a connection to DuckDB

conn = duckdb.connect()

# Install and load the spatial extension

conn.execute("INSTALL spatial;")

conn.execute("LOAD spatial;")

result = conn.execute("""

SELECT * FROM st_read('AccountNumber.xlsx',open_options = ['HEADERS=FORCE']);

""").pl()

result

Solution

  • If you know sheet names you can pass it as layer parameter and use union all to combine the sheets (I've added sheet name to result for clarity):

    stmt = [f"""
    select '{x}' as Sheet, *
    from st_read(
        'AccountNumber.xlsx',
        open_options = ['HEADERS=FORCE'],
        layer='{x}'
    )""" for x in ["Sheet1","Sheet2"]]
    
    stmt = " union all ".join(stmt)
    
    duckdb.sql(stmt).pl()
    

    As you use it as a Polars dataframe you can also read it with pl.read_excel():

    pl.concat([
        d.with_columns(pl.lit(n).alias("Sheet"))
        for n, d in pl.read_excel('AccountNumber.xlsx', sheet_id=0).items()
    ])