Tags: apache-spark, pyspark, apache-spark-sql

PySpark: Dynamically prepare a pyspark-sql query using parameters


What are the different ways to dynamically bind parameters and prepare a pyspark-sql statement?

Example:

Dynamic Query

query = '''SELECT column1, column2
           FROM ${db_name}.${table_name}
           WHERE column1 = ${filter_value}'''

The dynamic query above contains the variables ${db_name}, ${table_name} and ${filter_value}. These variables receive their values from runtime parameters.

Parameter Details:

db_name = 'your_db_name'
table_name = 'your_table_name'
filter_value = 'some_value'

Expected Query after Binding Parameters in Dynamic Query

SELECT column1, column2
FROM your_db_name.your_table_name
WHERE column1 = some_value  

Solution

  • Here are a few options for preparing a pyspark-sql statement by binding parameters.

    Option#1 - Using String Interpolation / f-Strings (Python 3.6+)

    db_name = 'your_db_name'
    table_name = 'your_table_name'
    filter_value = 'some_value'
    
    query = f'''SELECT column1, column2
               FROM {db_name}.{table_name}
               WHERE column1 = {filter_value}'''
    

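    Once the string is built, it can be passed to `spark.sql`. One caveat worth hedging on: if `column1` is a string column, the literal usually needs quotes, otherwise Spark parses `some_value` as a column name. A minimal sketch (the `spark` session in the final comment is assumed, not created here):

    ```python
    db_name = 'your_db_name'
    table_name = 'your_table_name'
    filter_value = 'some_value'

    # Quote the literal so Spark treats it as a string value,
    # not as a reference to a column named some_value.
    query = f'''SELECT column1, column2
               FROM {db_name}.{table_name}
               WHERE column1 = '{filter_value}' '''

    # df = spark.sql(query)  # assumes an active SparkSession named `spark`
    ```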
    Option#2 - Using String Formatting (str.format)

    query = '''SELECT column1, column2
               FROM {}.{}
               WHERE column1 = {}'''
    
    db_name = 'your_db_name'
    table_name = 'your_table_name'
    filter_value = 'some_value'
    
    query = query.format(db_name, table_name, filter_value)
    
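    `str.format` also accepts named placeholders, which keeps the template readable when there are many parameters and makes argument order irrelevant. A sketch using the same placeholder names as above:

    ```python
    query = '''SELECT column1, column2
               FROM {db_name}.{table_name}
               WHERE column1 = {filter_value}'''

    params = {
        'db_name': 'your_db_name',
        'table_name': 'your_table_name',
        'filter_value': 'some_value',
    }

    # Named placeholders are filled by keyword via **params.
    query = query.format(**params)
    ```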

    Option#3 - Using Template String

    query = '''SELECT column1, column2
               FROM ${db_name}.${table_name}
               WHERE column1 = ${filter_value}'''
    
    db_name = 'your_db_name'
    table_name = 'your_table_name'
    filter_value = 'some_value'
    
    from string import Template

    t = Template(query)
    query = t.substitute(db_name=db_name, table_name=table_name, filter_value=filter_value)