I have a date column in my dataframe.
I want to filter the dataframe down to the last 14 days using the date column. I tried the code below, but it's not working:
last_14 = df.filter((df('Date')> date_add(current_timestamp(), -14)).select("Event_Time","User_ID","Impressions","Clicks","URL", "Date")
Event_Time, User_ID, Impressions, Clicks, and URL are my other columns.
Can anyone advise how to do this?
from pyspark.sql import functions as F

# Sample data; to_date parses the strings into DateType
df = spark.createDataFrame(
    [
        ('2022-03-10',),
        ('2022-03-09',),
        ('2022-03-08',),
        ('2022-02-02',),
        ('2022-02-01',),
    ],
    ['Date'],
).withColumn('Date', F.to_date('Date', 'y-M-d'))

# Keep rows newer than 14 days before today
df \
    .filter(F.col('Date') > F.date_sub(F.current_date(), 14)) \
    .show()
+----------+
| Date|
+----------+
|2022-03-10|
|2022-03-09|
|2022-03-08|
+----------+
In your code it would be:
last_14 = df.filter((F.col('Date') > F.date_sub(F.current_date(), 14))).select("Event_Time","User_ID","Impressions","Clicks","URL", "Date")
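One thing to be aware of: with `>` the row dated exactly 14 days before today is excluded, so the window is the 14 calendar days ending today (use `>=` if you want 15 days including that boundary day). A quick plain-Python sketch with a made-up "current date" of 2022-03-10 shows the boundary behaviour that `F.date_sub(F.current_date(), 14)` gives you:

```python
from datetime import date, timedelta

today = date(2022, 3, 10)            # stand-in for current_date()
cutoff = today - timedelta(days=14)  # same value date_sub(current_date(), 14) yields: 2022-02-24

dates = [date(2022, 3, 10), date(2022, 3, 9), date(2022, 2, 24), date(2022, 2, 23)]

# 'Date' > cutoff keeps 2022-02-25 through 2022-03-10: 14 days, today inclusive.
# 2022-02-24 (exactly 14 days back) and anything older is dropped.
kept = [d for d in dates if d > cutoff]
print(kept)  # [datetime.date(2022, 3, 10), datetime.date(2022, 3, 9)]
```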