Adding elements from a list to a spark.sql() statement


My list:

pylist = ['A', 'B', 'C', 'D']

My spark.sql() line:

query = spark.sql(
    """
    SELECT col1, col2, col3
    FROM database.table
    WHERE col3 IN ('A', 'B', 'C', 'D')
    """
)

I want to replace the hard-coded list of elements in the spark.sql() statement with the Python list, so that the last line of the SQL becomes:

...
AND col3 IN pylist

I am aware of {} placeholders and str.format, but I don't understand whether that's the right option or how it works.


Solution

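  • I think the solution is .format(tuple(pylist)). str.format replaces the {} placeholder with the string form of the tuple, ('A', 'B', 'C', 'D'), which also happens to be a valid SQL IN list. Be aware that a one-element list breaks this, since tuple(['A']) prints as ('A',) with a trailing comma, which is not valid SQL. For the list above: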

    pylist = ['A', 'B', 'C', 'D']
    
    s = """
        SELECT col1, col2, col3
        FROM database.table
        WHERE col3 IN {}
        """.format(tuple(pylist))
    
    query = spark.sql(s)
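
If the list can ever hold a single element, one option is to build the IN list by hand instead of relying on the tuple's string form. The following is only a minimal sketch: sql_in_list is a hypothetical helper name (not part of PySpark), it assumes the values are trusted strings, and it rejects values containing quotes or backslashes rather than trying to escape them.

```python
def sql_in_list(values):
    """Render a list of strings as a SQL IN list, e.g. ('A', 'B').

    Hypothetical helper for illustration; not part of PySpark.
    """
    if not values:
        # IN () is not valid SQL, so the empty case must be handled by the caller.
        raise ValueError("cannot build an IN list from an empty sequence")
    for v in values:
        # Assumption: values are plain, trusted strings. Anything containing
        # a quote or backslash is refused rather than escaped.
        if "'" in str(v) or "\\" in str(v):
            raise ValueError("unsupported character in value: {!r}".format(v))
    quoted = ["'{}'".format(v) for v in values]
    return "(" + ", ".join(quoted) + ")"


pylist = ['A', 'B', 'C', 'D']

s = """
    SELECT col1, col2, col3
    FROM database.table
    WHERE col3 IN {}
    """.format(sql_in_list(pylist))

# query = spark.sql(s)  # assumes an active SparkSession named `spark`
```

With a one-element list, sql_in_list(['A']) gives ('A') with no trailing comma, so the resulting SQL still parses.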