
How can I loop over a list of columns and apply a PySpark SQL query to each of them?


This is the code I used to loop over the list of columns:

[Screenshot: the loop over the column list]

But I get the following error:

[Screenshot: the error message]

I tried the same approach with another query, but I get a syntax error there too:

[Screenshot: the second query and its syntax error]

The code looks fine to me, so I can't tell where the problem is.


Solution

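  • You cannot pass a SQL query to select; it only accepts column names or column expressions. Instead, register the DataFrame as a temporary view and run each query with spark.sql: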

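    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Register the DataFrame so it can be queried by name in SQL
    df.createOrReplaceTempView('DATA')

    col_list = ['col1', 'col2']  # replace with your column names

    for col_name in col_list:
        # Backticks let column names with spaces or special characters work
        result = spark.sql(f"SELECT * FROM DATA WHERE `{col_name}` IS NULL")
        result.show()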
    

    Alternatively, skip SQL and filter with the DataFrame API:

    from pyspark.sql import functions as F

    # Show the rows where each column is null
    for col_name in col_list:
        df.filter(F.col(col_name).isNull()).show()
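Both loops above launch one query, and therefore one scan of the data, per column. Below is a minimal sketch, assuming the view is registered as DATA as in the answer, of building those query strings with properly quoted column names, plus a single aggregate query that counts the nulls in every column in one pass. The helper names (quote_ident, null_filter_queries, null_count_query) and the example column names are hypothetical, and the code only builds SQL strings, so it runs without Spark.

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Quote a column name as a Spark SQL identifier.

    Backticks allow names containing spaces or other special characters;
    a literal backtick inside the name is escaped by doubling it.
    """
    return "`" + name.replace("`", "``") + "`"


def null_filter_queries(view: str, columns: Iterable[str]) -> List[str]:
    """One 'rows where this column is null' query per column, as in the loop."""
    return [f"SELECT * FROM {view} WHERE {quote_ident(c)} IS NULL" for c in columns]


def null_count_query(view: str, columns: Iterable[str]) -> str:
    """A single query that counts the nulls of every column in one pass."""
    parts = [
        f"SUM(CASE WHEN {quote_ident(c)} IS NULL THEN 1 ELSE 0 END) AS {quote_ident(c)}"
        for c in columns
    ]
    return f"SELECT {', '.join(parts)} FROM {view}"


if __name__ == "__main__":
    cols = ["name", "order date"]  # hypothetical column names
    for q in null_filter_queries("DATA", cols):
        print(q)
    print(null_count_query("DATA", cols))
```

With a SparkSession available, each string can be passed to spark.sql, for example spark.sql(null_count_query("DATA", df.columns)).show(), which returns one row with a null count per column instead of running a separate query for each.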