date, pyspark

PySpark from_unixtime(unix_timestamp) does not convert to timestamp


I am using PySpark with Python 2.7. I have a date column stored as a string (with milliseconds) and would like to convert it to a timestamp.

This is what I have tried so far

df = df.withColumn('end_time', from_unixtime(unix_timestamp(df.end_time, '%Y-%M-%d %H:%m:%S.%f')) )

printSchema() still shows end_time: string (nullable = true)

when I expected timestamp as the type of the column


Solution

  • Try using from_utc_timestamp:

    from pyspark.sql.functions import from_utc_timestamp
    
    df = df.withColumn('end_time', from_utc_timestamp(df.end_time, 'PST')) 
    

    You'd need to specify a timezone for the function; in this case I chose PST.

    If this does not work, please give us an example of a few rows showing df.end_time.