I have a PySpark dataframe with a date column in yyyyddd format, where yyyy is the year (e.g. 2020, 2021) and ddd is the zero-padded day of year (e.g. 001, 365, 366). I am trying to convert it to a date as:
df = df.withColumn("new_date", to_date("old_date", "yyyyddd"))
but this gives me the correct answer for January dates only, and null for all other months. old_date is StringType and new_date is DateType:
| old_date | new_date |
|---|---|
| 2006272 (272nd day of 2006) | null |
| 2008016 | 2008-01-16 |
| 2011179 | null |
| 2011026 | 2011-01-26 |
How can I convert this date format?
You can use the D format, which represents the day of year, with unix_timestamp as shown below; no UDF is needed. Your original pattern fails because lowercase d means day of month: with no month field in the pattern, the month defaults to January, so only values whose last three digits are a valid day of month (at most 031) parse, and everything else becomes null.
# Import functions
import pyspark.sql.functions as f

# DDD matches the zero-padded three-digit day of year;
# from_unixtime returns a string, so wrap it in to_date to get a DateType column
df = df.withColumn("new_date", f.to_date(f.from_unixtime(f.unix_timestamp("old_date", "yyyyDDD"))))

On Spark 3, if the new datetime parser rejects this pattern with a SparkUpgradeException, you can fall back to the old behavior by setting spark.sql.legacy.timeParserPolicy to LEGACY.
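You can sanity-check the year + day-of-year logic outside Spark with plain Python, where `%j` is the standard-library analogue of Spark's `D` directive (this is just an illustration of the parsing rule, not part of the Spark job):

```python
from datetime import datetime

def yyyyddd_to_iso(s: str) -> str:
    # %Y = 4-digit year, %j = zero-padded day of year
    return datetime.strptime(s, "%Y%j").strftime("%Y-%m-%d")

print(yyyyddd_to_iso("2006272"))  # 2006-09-29 (272nd day of 2006)
print(yyyyddd_to_iso("2008016"))  # 2008-01-16
```

Running it on the sample values from the question confirms that the non-January rows now resolve to real dates instead of null.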