pythonsql-serverpandassqlalchemy

Reading a table from SQL Server into a DataFrame using SQLAlchemy


I have a table named "products" on SQL Server. I would like to read the table into a DataFrame in Python using SQLAlchemy.

The pandas.read_sql function has a "sql" parameter that accepts two types of SQLAlchemy "selectable" objects:

I succeeded in using the text object but failed in using the select object. What corrections are needed?

successful code using text object:

import pandas as pd

from sqlalchemy import text
from sqlalchemy.engine import URL
from sqlalchemy import create_engine


url_object = URL.create(
    "mssql+pyodbc",
    host="abgsql.xx.xx.ac.uk",
    database="ABG",
    query={
        "driver": "ODBC Driver 18 for SQL Server",
        "TrustServerCertificate": "yes",
    },
)
engine = create_engine(url_object)
stmt = text("SELECT * FROM products")
df = pd.read_sql(sql=stmt, con=engine)

not successful code using select object:

import pandas as pd

from sqlalchemy import select
from sqlalchemy.engine import URL
from sqlalchemy import create_engine

from sqlalchemy import Table, MetaData


url_object = URL.create(
    "mssql+pyodbc",
    host="abgsql.xx.xx.ac.uk",
    database="ABG",
    query={
        "driver": "ODBC Driver 18 for SQL Server",
        "TrustServerCertificate": "yes",
    },
)
engine = create_engine(url_object)
products = Table("products", MetaData())
stmt = select(products)
df = pd.read_sql(sql=stmt, con=engine)
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Incorrect syntax near the keyword 'FROM'. (156) (SQLExecDirectW)")
[SQL: SELECT
FROM products]
(Background on this error at: https://sqlalche.me/e/20/f405)

Solution

  • As @AlwaysLearning pointed out, the option to load some or all table information from an existing database is called reflection (https://docs.sqlalchemy.org/en/20/tutorial/metadata.html#table-reflection).

    from sqlalchemy import select, Table, MetaData, create_engine, URL
    import pandas as pd
    
    metadata_obj = MetaData()
    url_object = URL.create(
        "mssql+pyodbc",
        host="abgsql.xx.xx.ac.uk",
        database="ABG",
        query={
            "driver": "ODBC Driver 18 for SQL Server",
            "TrustServerCertificate": "yes",
        },
    )
    engine = create_engine(url_object)
    products = Table("products", metadata_obj, autoload_with=engine)
    stmt = select(products)
    df = pd.read_sql(sql=stmt, con=engine)