I am trying to get the result of the date_add function in PySpark, but it always returns a Column object. To see the actual value I have to add it as a column to a DataFrame, but I want the result stored in a variable. How can I store the resulting date?
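from pyspark.sql import SparkSession
from pyspark.sql.functions import date_add
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('2015-04-08',)], ['dt'])
r = date_add(df.dt, 1)
print(r)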
Output: Column<'date_add(dt, 1)'>
But I want output like one of the following:
datetime.date(2015, 4, 9)
or
'2015-04-09'
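date_add returns a Column expression, not a value, so it has to be evaluated as part of a DataFrame transformation such as withColumn (or select). If you only need the computed date in a Python variable, consider a non-Spark approach using datetime and timedelta.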
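A minimal sketch of the non-Spark approach, assuming the input is a 'YYYY-MM-DD' string like the one in the question. The helper name add_days is hypothetical; it parses the string with datetime.strptime, shifts it with timedelta, and returns a datetime.date, which can be turned into a string with isoformat().

```python
from datetime import date, datetime, timedelta


def add_days(date_str: str, days: int) -> date:
    """Hypothetical helper: parse a 'YYYY-MM-DD' string and shift it by `days` days."""
    start = datetime.strptime(date_str, "%Y-%m-%d").date()
    return start + timedelta(days=days)


r = add_days("2015-04-08", 1)
print(repr(r))        # datetime.date(2015, 4, 9)
print(r.isoformat())  # 2015-04-09
```

This runs entirely in plain Python on the driver, so it only fits when the date is a single known value rather than data that lives inside a DataFrame.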
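Alternatively, if your use case requires Spark, evaluate the expression on the DataFrame and use collect() to bring the value back to the driver. collect() returns a list of Row objects, so index into it to get the date itself (a datetime.date):
from pyspark.sql.functions import col, date_add
r = df.withColumn('new_col', date_add(col('dt'), 1)).select('new_col').collect()[0][0]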