I have a dataframe that needs to be joined with the result set from a query. The query uses a column from a dataframe to filter the data in the database.
data_list = list(df['needed_column'])
I would like to use the variable in an sql query executed in a Jupyter sql cell.
%%sql
SELECT
column_1,
column_2,
column_3
FROM my_database.my_table
WHERE
column_1 IN data_list
Is there anyway that this can be done?
An workaround would be to execute the query inline as a variable.
data_list = str(list(df['needed_column']).replace('[', '(').replace(']', ')')
query_string = f"""
SELECT
column_1,
column_2,
column_3
FROM my_database.my_table
WHERE
column_1 IN {data_list}
"""
result_set = %sql $query_string