dataframe join pyspark

Create new pyspark data frame from two joined data frames


I want to save the result of joining two DataFrames in PySpark as a third, new DataFrame.

When I assign the joined DataFrames to a variable, its type is NoneType, but what I need is a DataFrame.

mappedhh = spark.sql("SELECT * FROM Notebooks_Lakehouse.properties_coordinates")

temp_test = spark.sql("SELECT * FROM Notebooks_Lakehouse.vw_temp")

new_df = temp_test.alias("a").join(mappedhh.alias("b"), ["Reference"], "left").select("a.Reference", "a.FullAddress", "b.AddressCoordinates").show()

Once I have new_df, I want to replace the nulls with 0:

new_df = new_df.fillna({"AddressCoordinates" : 0})

This raises an error: 'NoneType' object has no attribute 'fillna'


Solution

  • You need to remove .show() from the line that creates new_df, because show() prints the DataFrame but returns None. As a result, new_df is assigned None instead of the joined DataFrame.

    The following should work:

    new_df = temp_test.alias("a").join(mappedhh.alias("b"), ["Reference"], "left").select("a.Reference", "a.FullAddress", "b.AddressCoordinates")
    new_df.show()
    
    new_df = new_df.fillna({"AddressCoordinates" : 0})