pyspark, apache-spark-sql, expr

pyspark add int column to a fixed date


I have a fixed date "2000/01/01" and a dataframe:

data1 = [{'index':1,'offset':50}]
data_p = sc.parallelize(data1)
df = spark.createDataFrame(data_p)

I want to create a new column by adding the offset column to this fixed date

I tried different methods, but I cannot pass the column into the function directly, and using expr gives an error like: function is neither a registered temporary function nor a permanent function registered in the database 'default'.

The only solution I can think of is

from datetime import datetime
from pyspark.sql.functions import expr, lit
df = df.withColumn("zero", lit(datetime.strptime('2000/01/01', '%Y/%m/%d')))
df.withColumn("date_offset", expr("date_add(zero, offset)")).drop("zero")

Since I cannot use lit and datetime.strptime inside expr, I have to use this approach, which creates a redundant column and extra operations.

Any better way to do it?


Solution

  • As you have tagged this as a pyspark question, in Python you can do the following:

    import pyspark.sql.functions as F
    df.withColumn("date_offset", F.lit("2000-01-01").cast("date") + F.col("offset").cast("int")).show()
    
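    For reference, here is a self-contained sketch of the same approach run against the sample data from the question (assuming Spark 3.x, where date + integer arithmetic resolves to date_add, and a local SparkSession); with offset = 50 the resulting date is 2000-02-20:

    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([{'index': 1, 'offset': 50}])

    # "2000-01-01" + 50 days -> 2000-02-20
    df.withColumn(
        "date_offset",
        F.lit("2000-01-01").cast("date") + F.col("offset").cast("int"),
    ).show()
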

    Edit: as per the comment below, let's assume there is an extra column called type; based on it, the code below can be used:

    df.withColumn("date_offset", F.expr("case when type = 'month' then add_months(cast('2000-01-01' as date), offset) else date_add(cast('2000-01-01' as date), cast(offset as int)) end")).show()
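
    If you prefer column functions over a SQL expr string, here is an equivalent sketch; it assumes Spark 3.3+ (where add_months and date_add accept a column for their second argument) and the same hypothetical type column from the comment, which the question's sample data does not actually contain:

    import pyspark.sql.functions as F

    # fixed base date as a date-typed literal
    base = F.lit("2000-01-01").cast("date")

    df.withColumn(
        "date_offset",
        F.when(F.col("type") == "month", F.add_months(base, F.col("offset")))
         .otherwise(F.date_add(base, F.col("offset").cast("int"))),
    ).show()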