I have a Spark DataFrame which is grouped by a column and aggregated with a count:
df.groupBy("a").agg(count("a")).show()
+----+--------+
|   a|count(a)|
+----+--------+
|null|       0|
| -90|   45684|
+----+--------+
df.select("a").filter(col("a").isNull).count()
returns
warning: there was one feature warning; re-run with -feature for details
res9: Long = 26834
which clearly shows that the null values were not counted initially. What is the reason for this behaviour? I would have expected that, if null appears in the grouping result at all, its count would be reported correctly.
Yes, count applied to a specific column does not count null values. This is standard SQL semantics: count(columnName) counts only the rows where that column is non-null, while count("*") counts every row in the group. If you want to include the null values, use:

df.groupBy("a").agg(count("*")).show()