python apache-spark pyspark palantir-foundry spark-window-function

Sum() Window Function in PySpark without defining window spec


I am trying to add a new column, "grand_total", that holds the sum of price across all rows, repeated on every row of my table.

For example:

first_name  Order_id  price
John        1         2.5
Ali         2         2
Abdul       3         3.5

What I want is:

first_name  Order_id  price  grand_total
John        1         2.5    8
Ali         2         2      8
Abdul       3         3.5    8

My code:

new_df = new_df.withColumn("grand_total",F.sum(F.col("price")).over())

The error I receive is :

TypeError: over() missing 1 required positional argument: 'window'

I am confused because I come from a SQL background, where SUM(column_name) OVER () works without defining a window inside OVER ().


Solution

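  • Unlike SQL's OVER (), PySpark's over() requires a window specification. Pass an empty one, which makes the window span every row: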

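    from pyspark.sql import Window
    from pyspark.sql import functions as F
    # An empty partitionBy() defines a single window covering the whole DataFrame
    new_df = new_df.withColumn("grand_total", F.sum(F.col("price")).over(Window.partitionBy()))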