apache-sparkdate-format

Spark: cast to DateType value with only month and year


Given date-value with month and year:

03.2020

I tried to cast it to DateType as following:

to_timestamp(col("Date"), "MM.yyyy").cast(DateType)

But this return something, that I did't expect:

 2017-03-01

Returned value contains additional "01" (day value), and reorder month and year. So the expected value is "03.2020"

Following this Advice , I also tried:

 date_format(col("Date"), "MM.yyyy")

But in this case the function returns null.

What am I doing wrong ?


Solution

  • This should work:

    dataframe
    .withColumn("as_date",date_format((to_date('string_date', 'MM.yyyy')).cast('timestamp'),"MM.yyyy"))
    .show()
    

    Essentially you need to pass timestamp to date_format instead of string column

    Input:

    +-----------+
    |string_date|
    +-----------+
    |    03.2020|
    +-----------+
    

    Output:

    +-----------+-------+
    |string_date|as_date|
    +-----------+-------+
    |    03.2020|03.2020|
    +-----------+-------+