Tags: python, apache-spark, pyspark

Is there a null-safe comparison operator for pyspark?


When trying to create a boolean column that is True if two other columns are equal and False otherwise, I noticed that NULL == NULL evaluates to NULL (which behaves as False) in Spark.

df.withColumn('newCol', F.when(F.col('x')==F.col('y'), True).otherwise(False))

This page: https://spark.apache.org/docs/3.0.0-preview/sql-ref-null-semantics.html suggests that I could use <=> if I were using SQL syntax, but I'm looking to stick to the PySpark SQL API if I can.

df.withColumn('newCol', F.when(F.col('x')<=>F.col('y'), True).otherwise(False))

doesn't seem to work, since <=> isn't a Python operator.

Does anyone have any suggestions?


Solution

  • Check out pyspark.sql.Column.eqNullSafe (available since Spark 2.3): https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Column.eqNullSafe.html
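A minimal sketch of how it could apply to the case in the question (the example data here is made up; the column names 'x', 'y', and 'newCol' are taken from the question):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data with NULLs in both columns
df = spark.createDataFrame([(1, 1), (None, None), (1, None)], ['x', 'y'])

# eqNullSafe treats NULL <=> NULL as True and never returns NULL,
# unlike == which yields NULL whenever either side is NULL
df = df.withColumn('newCol', F.col('x').eqNullSafe(F.col('y')))
df.show()

Because eqNullSafe always returns a boolean, the when(...).otherwise(...) wrapper from the question becomes unnecessary. If you do want the SQL operator from the linked docs inside the DataFrame API, F.expr('x <=> y') should be equivalent.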