I'm getting invalid timestamps when reading Elasticsearch records using Spark with the elasticsearch-hadoop library. I'm using the following Spark code to read the records:
val sc = spark.sqlContext

val elasticFields = Seq(
  "start_time",
  "action",
  "category",
  "attack_category"
)

// register the Elasticsearch index as a temporary table via elasticsearch-hadoop
sc.sql(
  "CREATE TEMPORARY TABLE myIndex " +
  "USING org.elasticsearch.spark.sql " +
  "OPTIONS (resource 'aggattack-2021.01')")

val all = sc.sql(
  s"""
     |SELECT ${elasticFields.mkString(",")}
     |FROM myIndex
     |""".stripMargin)

all.show(2)
This produces the following result:
+-----------------------+------+---------+---------------+
|start_time |action|category |attack_category|
+-----------------------+------+---------+---------------+
|1970-01-19 16:04:27.228|drop |udp-flood|DoS |
|1970-01-19 16:04:24.027|drop |others |DoS |
+-----------------------+------+---------+---------------+
But I'm expecting timestamps with the current year, e.g. 2021-01-19 16:04:27.228. In Elasticsearch, the start_time field holds a Unix time in milliseconds, e.g. "start_time": 1611314773.641.
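For reference, here is what that raw value corresponds to when treated as epoch milliseconds vs. epoch seconds (a quick check in the Scala REPL, not part of the Spark job above):

import java.time.Instant

val raw = 1611314773.641                    // raw value as stored in Elasticsearch
println(Instant.ofEpochMilli(raw.toLong))   // 1970-01-19T15:35:14.773Z  <- matches the odd 1970 dates
println(Instant.ofEpochSecond(raw.toLong))  // 2021-01-22T11:26:13Z      <- the year I expect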
The problem was with the data in Elasticsearch. The start_time field was mapped as epoch_second but contained epoch-second values with three decimal places (e.g. 1611583978.684). Everything worked fine after we converted the epoch time to millis without any decimal places.
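A minimal sketch of the conversion we applied to the stored values (using the example value above; the re-indexing itself is omitted, and math.round avoids floating-point truncation):

// convert epoch seconds with a fractional millisecond part into whole epoch millis
val epochSeconds = 1611583978.684
val epochMillis  = math.round(epochSeconds * 1000)   // 1611583978684

// with the values stored as whole milliseconds, Spark reads back the expected 2021 timestamps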