According to the documentation, the to_timestamp function should return null instead of throwing a parse error:

The following code throws Caused by: java.time.format.DateTimeParseException: Text '17-08-01' could not be parsed at index 0
import org.apache.spark.sql.functions.{col, to_timestamp}
import spark.implicits._ // needed for toDF

val df1 = Seq(("abc", "17-08-01")).toDF("id", "eventTime")
val df2 = df1.withColumn("eventTime1", to_timestamp(col("eventTime"), "yyyy-MM-dd"))
df2.show()
Based on the documentation, to_timestamp returns: "A timestamp, or null if s was a string that could not be casted to a timestamp or fmt was an invalid format"
Are you using Spark 3? This behaviour is no longer the default since Spark 3.0 (they should've updated the docs); see the error at the top of your stack trace:
Exception in thread "main" org.apache.spark.SparkUpgradeException: You may get a different
result due to the upgrading of Spark 3.0: Fail to parse '17-08-01' in the new parser.
You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior
before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
To get rid of the exception, you need to set one of these configs. The first restores the pre-Spark 3.0 behaviour; the second keeps the new parser but treats unparseable strings as invalid (returning null, which matches the documented behaviour), so it seems to fit your needs better:
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
spark.conf.set("spark.sql.legacy.timeParserPolicy", "CORRECTED")
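As a sketch of the second option, assuming an active SparkSession named `spark` is in scope (as in spark-shell), setting the policy to CORRECTED makes to_timestamp produce null for strings that don't match the pattern instead of throwing:

```scala
// Sketch, assuming an active SparkSession `spark` (e.g. inside spark-shell).
import org.apache.spark.sql.functions.{col, to_timestamp}
import spark.implicits._

// Treat unparseable strings as invalid -> null, instead of
// raising SparkUpgradeException under the default EXCEPTION policy.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "CORRECTED")

val df = Seq(("abc", "17-08-01")).toDF("id", "eventTime")

// "17-08-01" does not match "yyyy-MM-dd" (four-digit year),
// so eventTime1 comes back as null rather than an exception.
df.withColumn("eventTime1", to_timestamp(col("eventTime"), "yyyy-MM-dd")).show()
```

Note that if your data really uses two-digit years, a pattern of "yy-MM-dd" should parse "17-08-01" successfully rather than producing null.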