apache-spark pyspark apache-spark-sql

How to drop rows with nulls in one column in PySpark


I have a DataFrame, and I would like to drop all rows with a NULL value in one of the columns (a string column). I can easily get the count of those rows:

df.filter(df.col_X.isNull()).count()

I tried dropping them with the following command. It executes, but the count still comes back positive:

df.filter(df.col_X.isNull()).drop()

I have tried other variations, but they return an 'object is not callable' error.


Solution

  • DataFrames are immutable, so you have to reassign df. Applying a filter that removes the null values creates a new DataFrame that no longer contains those rows:

    df = df.filter(df.col_X.isNotNull())
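  • For completeness, here is a minimal, self-contained sketch; the sample data and session setup are invented for illustration, and dropna(subset=...) is a built-in equivalent to the filter:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("drop-nulls").getOrCreate()

    # Toy data standing in for the question's DataFrame; col_X is the string column.
    df = spark.createDataFrame(
        [("a", "x"), ("b", None), ("c", "y")],
        ["id", "col_X"],
    )

    # Keep only rows where col_X is not null, reassigning df.
    df = df.filter(df.col_X.isNotNull())

    # Equivalent built-in alternative:
    # df = df.dropna(subset=["col_X"])

    # Verify: the null count is now zero.
    print(df.filter(df.col_X.isNull()).count())  # 0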