Tags: apache-spark, apache-spark-dataset

How to check if one column in a Spark Dataset is empty?


I have a Dataset&lt;Row&gt; df, and its contents are shown below:

Column A | Column B | Column C
121.2    |          | A
23.1     |          | B

And I want to check if Column B is empty.

I think I can use this to check whether it is empty:

Dataset<Row> checkDf = df.select("Column B");
boolean isEmpty = checkDf.isEmpty();

But I want to know if there is a more efficient way.

Thank you.


Solution

  • In Scala, you can filter the Dataset down to the non-null values of the column and then check whether anything survives:

    import org.apache.spark.sql.functions.col

    val isColumnBEmpty = checkDf.where(col("Column B").isNotNull).isEmpty
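
Since the question uses the Java API, here is a minimal, self-contained Java sketch of the same idea (the class name, local master, and sample rows are illustrative, constructed to match the table in the question):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;

public class ColumnEmptyCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("column-empty-check")
                .getOrCreate();

        // Sample data matching the question: Column B is null in every row
        StructType schema = new StructType()
                .add("Column A", DataTypes.DoubleType)
                .add("Column B", DataTypes.StringType)
                .add("Column C", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
                RowFactory.create(121.2, null, "A"),
                RowFactory.create(23.1, null, "B"));
        Dataset<Row> df = spark.createDataFrame(rows, schema);

        // Keep only rows where Column B is non-null; isEmpty() only needs
        // to find a single surviving row, so Spark can stop early
        boolean isColumnBEmpty = df.where(col("Column B").isNotNull).isEmpty();
        System.out.println(isColumnBEmpty);

        spark.stop();
    }
}
```

Note that `Dataset.isEmpty()` (available since Spark 2.4) is already reasonably cheap: it only materializes at most one row rather than counting the whole Dataset. If "empty" should also cover empty strings, not just nulls, extend the filter with something like `.and(col("Column B").notEqual(""))`.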