scala, datetime, apache-spark-sql, databricks, datetime2

to_timestamp() in Scala returns default timestamp format


I have a DataFrame with a timestamp column stored as strings in the format "yyyy-MM-dd HH:mm:ss.SSSSSSS". I want to trim the milliseconds and nanoseconds from the string and convert it to a datetime type.

I tried using the to_timestamp() method to convert from string to timestamp, and that part works, but the milliseconds and nanoseconds are still there at the end.
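For context, here is a minimal sketch of that starting point, assuming a Databricks notebook or spark-shell session where `spark` and its implicits are available; the column name RunStartTime and the sample value come from this post, the rest is illustrative:

    import org.apache.spark.sql.functions.{col, to_timestamp}
    import spark.implicits._

    // One-row DataFrame with a string timestamp shaped like "yyyy-MM-dd HH:mm:ss.SSSSSSS"
    val df = Seq("2022-02-12 12:12:12.4398715").toDF("RunStartTime")

    // Plain to_timestamp parses the string into TimestampType,
    // but the fractional seconds are still visible when the value is displayed
    df.withColumn("parsed_ts", to_timestamp(col("RunStartTime"))).show(false)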

I tried the following to remove the milliseconds, but neither approach worked.

  1. I tried a date-truncate method to remove the milliseconds; it worked, but it converts the column to string format (a date_trunc sketch follows the code sample below).
  2. I tried with:
  to_timestamp($"column_name", "YYYY-mm-dd HH:MM:ss")

but I am getting the default format as output. This method did not recognize my custom datetime format. The default format I got is "YYYY-mm-ddTHH:MM:ss.sssss+sss".

.withColumn("datetype_timestamp",
  to_timestamp(col("RunStartTime"), "YYYY-mm-dd HH:MM:ss")
)

Above is my code sample. Can someone suggest what I should do here, please? Thank you for your time :)
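Regarding attempt 1: that truncation may have gone through date_format, which does return strings. For reference, here is a minimal sketch of Spark's built-in date_trunc, which drops the fractional part while the column stays TimestampType; whether this is the function actually tried is an assumption. It reuses the one-row df from the sketch above:

    import org.apache.spark.sql.functions.{col, date_trunc, to_timestamp}

    df.withColumn(
      "datetype_timestamp",
      // Parse the string, then truncate to whole seconds; the result stays TimestampType
      date_trunc("second", to_timestamp(col("RunStartTime")))
    ).printSchema()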

Cluster details: 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)


Solution

  • I don't know if this is the best or most elegant approach, but I could use a combination of to_timestamp and date_format to achieve it:

    .withColumn(
      "datetype_timestamp",
      // input_timestamp would be RunStartTime in your case
      to_timestamp(date_format(col("input_timestamp"), "yyyy-MM-dd HH:mm:ss"))
    )
    

    And this was the output:

    +---------------------------+-------------------+
    |input_timestamp            |datetype_timestamp |
    +---------------------------+-------------------+
    |2022-02-12 12:12:12.4398715|2022-02-12 12:12:12|
    +---------------------------+-------------------+
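
    One detail worth calling out: the datetime pattern letters are case-sensitive. MM is month, mm is minute, HH is hour-of-day, and lowercase yyyy is the year, so "yyyy-MM-dd HH:mm:ss" is the pattern that matches this data; with the cases swapped (as in the question's YYYY-mm-dd HH:MM:ss), date_format writes minutes into the month position and vice versa.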