apache-spark · datetime · pyspark · apache-spark-sql

How can I account for AM/PM in string to DateTime conversion in pyspark?


My datetime is in the following format:

+--------------------+------------+
|           visit_dts|web_datetime|
+--------------------+------------+
| 5/1/2018 3:48:14 PM|        null|
+--------------------+------------+

Based on the answer provided here, I am using the following query to convert the string into datetime format:

web1 = web1.withColumn("web_datetime", from_unixtime(unix_timestamp(col("visit_dts"), "%mm/%dd/%YY %I:%M:%S %p")))

But it is not working. Any leads would be appreciated.


Solution

  • You can do it like below to achieve your result. Note that unix_timestamp expects a Java SimpleDateFormat pattern (e.g. MM/dd/yyyy hh:mm:ss aa), not Python strftime directives such as %m or %p:

    from pyspark.sql import Row
    import pyspark.sql.functions as f

    # Sample data with an AM/PM timestamp string
    df = sc.parallelize([Row(visit_dts='5/1/2018 3:48:14 PM')]).toDF()

    # Parse with hh (1-12 clock hour) and aa (AM/PM marker),
    # then reformat as a 24-hour datetime string
    web = df.withColumn("web_datetime",
                        f.from_unixtime(f.unix_timestamp("visit_dts", 'MM/dd/yyyy hh:mm:ss aa'),
                                        'MM/dd/yyyy HH:mm:ss'))
    

    This should give you:

    web.show()
    
    +-------------------+-------------------+
    |          visit_dts|       web_datetime|
    +-------------------+-------------------+
    |5/1/2018 3:48:14 PM|05/01/2018 15:48:14|
    +-------------------+-------------------+
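
  • If you need an actual TimestampType column rather than a reformatted string, a minimal sketch using to_timestamp (available since Spark 2.2) could look like the following; the single-letter M/d/h pattern letters are an assumption chosen so that single-digit months, days, and hours parse cleanly:

    from pyspark.sql import SparkSession, Row
    import pyspark.sql.functions as f

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([Row(visit_dts='5/1/2018 3:48:14 PM')])

    # to_timestamp returns a TimestampType column directly,
    # so filtering and date arithmetic work without further conversion
    web = df.withColumn("web_datetime",
                        f.to_timestamp("visit_dts", 'M/d/yyyy h:mm:ss a'))
    web.printSchema()
    web.show(truncate=False)

    Keeping the column as a timestamp and formatting it back to a string only for display is usually the more robust choice.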