I want to save the result of joining two DataFrames in PySpark as a third, new DataFrame.
When I assign the joined DataFrames to a variable, its type is NoneType, but what I need is a DataFrame.
mappedhh = spark.sql("SELECT * FROM Notebooks_Lakehouse.properties_coordinates")
temp_test = spark.sql("SELECT * FROM Notebooks_Lakehouse.vw_temp")
new_df = temp_test.alias("a").join(mappedhh.alias("b"), ["Reference"], "left").select("a.Reference", "a.FullAddress", "b.AddressCoordinates").show()
Once I have new_df, I want to replace the nulls with 0:
new_df = new_df.fillna({"AddressCoordinates" : 0})
This returns an error message: 'NoneType' object has no attribute 'fillna'
You need to remove .show() when creating new_df, because show() prints the DataFrame but returns None (as seen here).
The following should work:
new_df = temp_test.alias("a").join(mappedhh.alias("b"), ["Reference"], "left").select("a.Reference", "a.FullAddress", "b.AddressCoordinates")
new_df.show()
new_df = new_df.fillna({"AddressCoordinates" : 0})