Tags: scala, apache-spark, elasticsearch, elasticsearch-hadoop

Spark 3.0 scala.None$ is not a valid external type for schema of string


While using the elasticsearch-hadoop library to read an Elasticsearch index that contains an empty attribute, I get the following exception:

Caused by: java.lang.RuntimeException: scala.None$ is not a valid external type for schema of string

There is an open issue on GitHub for the same problem, with steps to reproduce it: https://github.com/elastic/elasticsearch-hadoop/issues/1635

Spark: 3.1.1
Elasticsearch-Hadoop : elasticsearch-spark-30_2.12-7.12.0
Elasticsearch : 2.3.4


Solution

  • It worked after setting the elasticsearch-hadoop property es.field.read.empty.as.null to no, so that empty fields are no longer treated as null when read:

    .option("es.field.read.empty.as.null", "no")
    

    From the elasticsearch-hadoop configuration documentation:
    es.field.read.empty.as.null (default yes)
    Whether elasticsearch-hadoop will treat empty fields as null.
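
    For context, here is a minimal sketch of where that option might sit in a full DataFrame read. The host, port, and the resource name "my-index/my-type" are placeholders I made up for illustration, not values from the question, and it assumes the elasticsearch-spark-30 connector is on the classpath:

    ```scala
    import org.apache.spark.sql.SparkSession

    object ReadEmptyFields {
      // Connector settings. Host and port are assumed placeholders.
      val esOptions: Map[String, String] = Map(
        "es.nodes" -> "localhost",
        "es.port" -> "9200",
        // The workaround: do not treat empty fields as null.
        "es.field.read.empty.as.null" -> "no"
      )

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("es-read-empty-fields")
          .master("local[*]")
          .getOrCreate()

        // "my-index/my-type" is a hypothetical resource; replace with your own.
        val df = spark.read
          .format("org.elasticsearch.spark.sql")
          .options(esOptions)
          .load("my-index/my-type")

        df.printSchema()
        df.show(5, truncate = false)

        spark.stop()
      }
    }
    ```

    Keeping the settings in a single Map makes it easy to toggle es.field.read.empty.as.null back to its default and confirm that this option is what makes the exception go away.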