pythonsqlpandas

Executing an SQL query on a Pandas dataset


I have a Pandas dataset called df. How can I do:

df.query("select * from df")

Solution

  • This is not what pandas.query is supposed to do. You can look at package pandasql (same like sqldf in R )

    Update: Note pandasql hasn't been maintained since 2017. Use another library from an answer below.

    import pandas as pd
    import pandasql as ps
    
    df = pd.DataFrame([[1234, 'Customer A', '123 Street', np.nan],
                   [1234, 'Customer A', np.nan, '333 Street'],
                   [1233, 'Customer B', '444 Street', '333 Street'],
                  [1233, 'Customer B', '444 Street', '666 Street']], columns=
    ['ID', 'Customer', 'Billing Address', 'Shipping Address'])
    
    q1 = """SELECT ID FROM df """
    
    print(ps.sqldf(q1, locals()))
    
         ID
    0  1234
    1  1234
    2  1233
    3  1233
    

    Update 2020-07-10

    update the pandasql

    ps.sqldf("select * from df")