I have a table like the one shown below. Order numbers recur on different dates, and I would like to read just one row per order: the one with the latest date. For example, for order A1 I want only the row dated 24/03/2022. How can I do this in PySpark? Thanks.
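from pyspark.sql import Window
from pyspark.sql import functions as F

# Number the rows of each order from newest to oldest date
w = Window.partitionBy('order').orderBy(F.col('date').desc())
df = (df
      .withColumn('rank', F.row_number().over(w)))
# Keep only the newest row per order, then drop the helper column
df = (df
      .filter(F.col('rank') == 1).drop('rank'))
I solved this by ranking the rows of each order by date, newest first, and keeping the row with rank 1. The ordering has to be descending: with the default ascending order, rank 1 would be the oldest row instead of the latest. This also assumes date is a real date column. If it holds dd/MM/yyyy strings, convert it with F.to_date first, because such strings do not sort chronologically.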