apache-sparkdate

How to cast to DateType value with only month and year


Given a date value with a month and a year:

03.2020

I tried to cast it to DateType as following:

to_timestamp(col("Date"), "MM.yyyy").cast(DateType)

But this returned something that I didn't expect:

 2017-03-01

The returned value contains an additional "01" (day value), and reordered the month and year. So the expected value is "03.2020"

Following this advice, I also tried:

 date_format(col("Date"), "MM.yyyy")

But in this case the function returns null.

What am I doing wrong?


Solution

  • This should work:

    dataframe
    .withColumn("as_date",date_format((to_date('string_date', 'MM.yyyy')).cast('timestamp'),"MM.yyyy"))
    .show()
    

    Essentially you need to pass timestamp to date_format instead of string column

    Input:

    +-----------+
    |string_date|
    +-----------+
    |    03.2020|
    +-----------+
    

    Output:

    +-----------+-------+
    |string_date|as_date|
    +-----------+-------+
    |    03.2020|03.2020|
    +-----------+-------+