pythonjupyter-notebook

Python variables in Jupyter SQL cells


I have a dataframe that needs to be joined with the result set from a query. The query uses a column from a dataframe to filter the data in the database.

data_list = list(df['needed_column'])

I would like to use the variable in an sql query executed in a Jupyter sql cell.

%%sql
SELECT
    column_1,
    column_2,
    column_3
FROM my_database.my_table
WHERE
    column_1 IN data_list

Is there anyway that this can be done?


Solution

  • An workaround would be to execute the query inline as a variable.

    data_list = str(list(df['needed_column']).replace('[', '(').replace(']', ')')
    
    query_string = f"""
    SELECT
        column_1,
        column_2,
        column_3
    FROM my_database.my_table
    WHERE
        column_1 IN {data_list}
    """
    
    result_set = %sql $query_string