I am using Spark shell 1.6. I want to separate all the rows containing null values from the ones that don't; more precisely, I have to segregate them into two different tables (data and error). The problem is that I have too many columns (42), so checking them one at a time is out of the question. Some points that may help:
scala> val data = Seq(("1","one",null),(null,"2",null),("3",null,null),(null,null,null),("4","four","IV")).toDF
scala> data.show
+----+----+----+
| _1| _2| _3|
+----+----+----+
| 1| one|null|
|null| 2|null|
| 3|null|null|
|null|null|null|
| 4|four| IV|
+----+----+----+
scala> val error = data.filter(x => x.anyNull)
scala> error.show
+----+----+----+
| _1| _2| _3|
+----+----+----+
| 1| one|null|
|null| 2|null|
| 3|null|null|
|null|null|null|
+----+----+----+
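The clean rows can also be obtained directly by negating the same check (a sketch in the same style as above):

scala> val correct_data = data.filter(x => !x.anyNull)

Alternatively, except subtracts the error rows from the original DataFrame: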
scala> val correct_data = data.except(error)
scala> correct_data.show
+---+----+---+
| _1| _2| _3|
+---+----+---+
| 4|four| IV|
+---+----+---+
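With 42 columns, the null check can also be built programmatically from data.columns instead of being typed by hand. The sketch below (hasNull is just an illustrative name) uses only the Column-based filter(condition: Column) overload, which the 1.6 DataFrame API provides, so it works even where the row-lambda overload above is unavailable. It also avoids except, which computes a set difference with distinct semantics, meaning an extra shuffle and silently dropped duplicate rows:

scala> import org.apache.spark.sql.functions.col
scala> val hasNull = data.columns.map(c => col(c).isNull).reduce(_ || _) // true if any column in the row is null
scala> val error = data.filter(hasNull)          // rows with at least one null
scala> val correct_data = data.filter(!hasNull)  // the complement, no shuffle, duplicates preserved

Because the predicate is derived from data.columns, it adapts automatically if columns are added or removed.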