I am trying to add a new column, "grand_total", that holds the sum of the price column on every row of my table.
For example:
first_name | Order_id | price |
---|---|---|
John | 1 | 2.5 |
Ali | 2 | 2 |
Abdul | 3 | 3.5 |
What I want is:
first_name | Order_id | price | grand_total |
---|---|---|---|
John | 1 | 2.5 | 8 |
Ali | 2 | 2 | 8 |
Abdul | 3 | 3.5 | 8 |
My code:
new_df = new_df.withColumn("grand_total",F.sum(F.col("price")).over())
The error I receive is:
**TypeError: over() missing 1 required positional argument: 'window'**
I am confused because I come from a SQL background, where SUM(column_name) OVER () works without defining a window inside OVER ().
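Try this. In PySpark, over() always requires a window specification, so pass an empty one: Window.partitionBy() with no columns treats the whole DataFrame as a single partition, which matches SQL's OVER ().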
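from pyspark.sql import Window
from pyspark.sql import functions as F
# An empty partitionBy() puts every row in one window, so the sum is the grand total.
new_df = new_df.withColumn("grand_total", F.sum(F.col("price")).over(Window.partitionBy()))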
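To see why an empty partition list gives the grand total, here is a minimal plain-Python sketch of the idea. It does not use Spark. The helper name `sum_over` and its parameters are hypothetical, made up for illustration only. With no partition keys, every row falls into the same group, so each row receives the sum over the whole table.

```python
from collections import defaultdict


def sum_over(rows, value_key, partition_keys=()):
    """Hypothetical helper: attach the per-partition sum of value_key to each row."""
    totals = defaultdict(float)
    for row in rows:
        # With no partition keys, every row maps to the same empty tuple,
        # so the whole table behaves as a single partition.
        key = tuple(row[k] for k in partition_keys)
        totals[key] += row[value_key]
    return [
        {**row, "grand_total": totals[tuple(row[k] for k in partition_keys)]}
        for row in rows
    ]


# Same data as the example table in the question.
rows = [
    {"first_name": "John", "Order_id": 1, "price": 2.5},
    {"first_name": "Ali", "Order_id": 2, "price": 2.0},
    {"first_name": "Abdul", "Order_id": 3, "price": 3.5},
]

result = sum_over(rows, "price")  # empty partition list, like OVER ()
for row in result:
    print(row["first_name"], row["grand_total"])
```

Passing a key, for example `sum_over(rows, "price", ("first_name",))`, would instead give each row the sum for its own group, which is the analogue of naming columns in partitionBy.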