sqldatabricksdatabricks-sql

AnalysisException: Multi-column In predicates are not supported in the DELETE condition


I am doing the following in my databricks code. The code is trying to delete some rows from my table under conditions which seam simple to me.

from datetime import datetime, timedelta

today = datetime.today()
yesterday = today - timedelta(days = int(look_back_days))
yesterday_str = yesterday.strftime('%Y/%m/%d')

query = f"""DELETE FROM table_name a WHERE  AWB_ID  IN (SELECT * FROM a WHERE a.DW_CRT_RUN_DT >= '{yesterday_str}' or a.dw_upd_run_dt >= '{yesterday_str}' and a.awb_Stus_cd = 'VOID' and a.dw_upd_pgm_id = 'DSPD0580')"""

spark.sql(query)

However, I got the following error.

AnalysisException: Multi-column In predicates are not supported in the DELETE condition

I also put what was suggested by @kombajn-zbożowy below, but it is still throwing the same error

enter image description here


Solution

  • My own answer after couple years of activities in databricks

    Spark SQL does not support multi-column subqueries (e.g., SELECT *) in IN clauses for DELETE operations. You need to rewrite your query to avoid using SELECT * and restructure it using a join or explicitly specify columns

    possible workaround is

    query = f"""
    DELETE FROM table_name a 
    WHERE EXISTS (
        SELECT 1 
        FROM table_name b
        WHERE b.AWB_ID = a.AWB_ID
          AND (b.DW_CRT_RUN_DT >= '{yesterday_str}' 
               OR b.dw_upd_run_dt >= '{yesterday_str}')
          AND b.awb_Stus_cd = 'VOID' 
          AND b.dw_upd_pgm_id = 'DSPD0580'
    )
    """
    spark.sql(query)