I have a DataFrame with a timestamp column stored as a string in the format "yyyy-MM-dd HH:mm:ss.SSSSSSS". I want to trim the fractional seconds (milliseconds and nanoseconds) from the string and convert it to a datetime type.
I tried using to_timestamp() to convert the string to a timestamp. The conversion succeeds, but the result still has the milliseconds and nanoseconds at the end.
I tried the following to remove the milliseconds, but it did not work:
to_timestamp($"column_name", "YYYY-mm-dd HH:MM:ss")
But I am getting the default format as output. The method did not recognize my custom date-time format. The default format I got is "YYYY-mm-ddTHH:MM:ss.sssss+sss".
.withColumn("datetype_timestamp",
to_timestamp(col("RunStartTime"),"YYYY-mm-dd HH:MM:ss")
)
Above is my code sample. Can someone suggest what I should do here, please? Thank you for your time :)
Cluster details: 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)
I don't know if this is the best or most elegant approach, but I could use a combination of to_timestamp and date_format to achieve this:
.withColumn(
  "datetype_timestamp",
  // Pattern letters are case sensitive: MM is month, mm is minute
  to_timestamp(date_format(col("input_timestamp"), "yyyy-MM-dd HH:mm:ss"))
  // input_timestamp would be RunStartTime in your case
)
And this was the output:
+---------------------------+-------------------+
|input_timestamp            |datetype_timestamp |
+---------------------------+-------------------+
|2022-02-12 12:12:12.4398715|2022-02-12 12:12:12|
+---------------------------+-------------------+
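Another option that avoids the round trip through a formatted string is date_trunc from org.apache.spark.sql.functions, which truncates a timestamp to a given unit. Below is a minimal sketch, not a definitive implementation. It assumes the string can be cast to a timestamp without an explicit pattern (as in the example above), and it creates a local SparkSession purely for illustration. The object name TruncateToSeconds and the helper withSecondPrecision are hypothetical names made up for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, date_trunc, to_timestamp}

object TruncateToSeconds {

  // Hypothetical helper: casts the string column `source` to a timestamp
  // and truncates it to whole seconds, writing the result to `target`.
  def withSecondPrecision(df: DataFrame, source: String, target: String): DataFrame =
    df.withColumn(target, date_trunc("second", to_timestamp(col(source))))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for a standalone run; on a cluster or in a
    // notebook, getOrCreate() returns the session that already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("truncate-to-seconds")
      .getOrCreate()
    import spark.implicits._

    // Sample value taken from the output table above.
    val df = Seq("2022-02-12 12:12:12.4398715").toDF("RunStartTime")

    val result = withSecondPrecision(df, "RunStartTime", "datetype_timestamp")
    result.printSchema()   // datetype_timestamp should be of timestamp type
    result.show(false)

    spark.stop()
  }
}
```

The new column stays a real timestamp type, so it can still be used in date arithmetic and comparisons. If you only need a display string, date_format on its own is enough.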