apache-sparkapache-spark-sqlto-date

Spark to_date generates strange 5 digit year


I have issued the following select statement against a HIVE table in HUE editor and got below result.

SELECT statestartdate,
to_date(from_unixtime(statestartdate, 'yyyy-MM-dd HH:mm:ss')) AS statestartdatestr
FROM processmining.InstanceUserPerformance limit 100

RESULT

statestartdate  statestartdatestr
1363782909107   45186-07-24

QUESTION In the year area of the 'statestartdatestr' field, I am getting 45186 for some strange reason!? Could anyone point out why please? Thnx.


Solution

  • It seems like your statestartdate is the number of milliseconds since the Unix epoch. The function from_unixtime takes the number of seconds from the Unix epoch, according to Cloudera docs.

    Therefore, if you wanted to convert your statestartdate to seconds and a date data type you could try something like this...

    SELECT statestartdate,
      to_date(
        from_unixtime(
          floor(statestartdate / 1000),
          'yyyy-MM-dd HH:mm:ss'
        )
      ) AS statestartdatestr
    FROM processmining.InstanceUserPerformance LIMIT 100