When trying to create a boolean column that is True if two other columns are equal and False otherwise, I noticed that in Spark `NULL == NULL` evaluates to NULL rather than True, so my `otherwise(False)` branch turns it into False.
df.withColumn('newCol', F.when(F.col('x')==F.col('y'), True).otherwise(False))
This page: https://spark.apache.org/docs/3.0.0-preview/sql-ref-null-semantics.html suggests that I could use `<=>` if I were using SQL syntax, but I'm looking to stick to the PySpark SQL API if I can.
df.withColumn('newCol', F.when(F.col('x')<=>F.col('y'), True).otherwise(False))
doesn't work: `<=>` is not a Python operator, so this line raises a SyntaxError.
Does anyone have any suggestions?
Check out pyspark.sql.Column.eqNullSafe: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Column.eqNullSafe.html