pyspark

Union list of pyspark dataframes


Let's say I have a list of PySpark DataFrames, [df1, df2, ...], and I want to union them all, i.e. do df1.union(df2).union(df3)... What's the best practice for that?


Solution

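  • You can use functools.reduce, passing it a union function along with the list of DataFrames. The code here uses unionByName, which matches columns by name; plain union matches them by position.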

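    from functools import reduce

    from pyspark.sql import DataFrame

    # df1, df2, ... are existing DataFrames; "..." stands for the rest of the list
    list_of_sdf = [df1, df2, ...]
    # reduce folds the list pairwise: df1.unionByName(df2).unionByName(df3)...
    final_sdf = reduce(DataFrame.unionByName, list_of_sdf)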
    

    final_sdf then contains the rows of every DataFrame in the list.
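
If you need this in more than one place, the reduce call could be wrapped in a small helper. The following is only a sketch: union_all and demo are hypothetical names (not part of PySpark), the sample data is made up, and the allowMissingColumns argument assumes Spark 3.1 or later.

```python
from functools import reduce


def union_all(dfs, allow_missing_columns=False):
    """Union an iterable of DataFrames by column name (hypothetical helper)."""
    dfs = list(dfs)
    if not dfs:
        # reduce() without an initial value fails on an empty sequence,
        # so raise a clearer error up front.
        raise ValueError("union_all() needs at least one DataFrame")
    return reduce(
        # allowMissingColumns is assumed to be available (Spark 3.1+).
        lambda left, right: left.unionByName(
            right, allowMissingColumns=allow_missing_columns
        ),
        dfs,
    )


def demo():
    """Illustrative usage; needs a local Spark installation."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("union-demo").getOrCreate()
    df1 = spark.createDataFrame([(1, "a")], ["id", "value"])
    # Same columns in a different order: unionByName lines them up by name.
    df2 = spark.createDataFrame([("b", 2)], ["value", "id"])
    df3 = spark.createDataFrame([(3, "c")], ["id", "value"])
    union_all([df1, df2, df3]).show()
    spark.stop()
```

Calling demo() should show the three rows under the id and value columns. Like union, unionByName keeps duplicate rows, so add .distinct() afterwards if you want them removed.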