pyspark, hivecontext

Filter out null strings and empty strings in hivecontext.sql


I'm using PySpark with hiveContext.sql, and I want to filter out all null and empty values from my data.

So I used a simple SQL command to first filter out the null values, but it doesn't work.

My code:

hiveContext.sql("select column1 from table where column2 is not null")

but it works without the expression "where column2 is not null"

Error:

Py4JJavaError: An error occurred while calling o577.showString

I think my select statement is wrong.

Data example:

column 1 | column 2
null     |   1
null     |   2
1        |   3
2        |   4
null     |   2
3        |   8

Objective:

column 1 | column 2
1        |   3
2        |   4
3        |   8

Thanks


Solution

  • This works for me:

    df.na.drop(subset=["column1"])
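
    Note that na.drop only removes nulls, while the question title also asks about empty strings. A minimal sketch of both steps (the SparkSession setup, sample data, and the table name table1 are illustrative assumptions, not from the question):

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Local session for demonstration; in the question's setup this would
    # be the existing hiveContext instead.
    spark = SparkSession.builder.master("local[1]").appName("filter-nulls").getOrCreate()

    # Hypothetical sample data mixing nulls and an empty string.
    df = spark.createDataFrame(
        [(None, 1), (None, 2), ("1", 3), ("", 4), ("3", 8)],
        ["column1", "column2"],
    )

    # DataFrame API: drop nulls, then drop empty strings.
    cleaned = df.na.drop(subset=["column1"]).filter(F.col("column1") != "")

    # Equivalent SQL, as in the original attempt:
    df.createOrReplaceTempView("table1")
    cleaned_sql = spark.sql(
        "select column1, column2 from table1 "
        "where column1 is not null and column1 != ''"
    )
    ```

    Both approaches keep only the rows where column1 is neither null nor an empty string; the comparison column1 != '' silently excludes nulls as well, since comparisons with null evaluate to null in Spark SQL.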