Our Elasticsearch cluster holds a large amount of data, and we compute over it with Spark through elasticsearch-hadoop, following https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html
Right now every read pulls back all of the columns (fields) of an index, even though we only need a few of them. Is there any way to limit which fields are read?
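For reference, this is roughly what our read looks like today, assuming the elasticsearch-spark SQL data source (registered under the short name "es"); the index name is just a placeholder:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("esRead").getOrCreate()

// Current read path: every field of every document comes back.
// "my-index" is a placeholder index name.
val fullDf = spark.read
  .format("es")
  .load("my-index")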
Yes, you can set the configuration parameter "es.read.field.include" or "es.read.field.exclude" to include or exclude fields, respectively. Full details are in the elasticsearch-hadoop configuration reference. Example, assuming Spark 2 or higher:
import org.apache.spark.sql.SparkSession

val sparkSession: SparkSession = SparkSession
  .builder()
  .appName("jobName")
  // Elasticsearch node(s) the connector should contact
  .config("es.nodes", "elastichostc1n1.example.com")
  // Read only the "foo" and "bar" fields of each document
  .config("es.read.field.include", "foo,bar")
  .getOrCreate()
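With that session, reads should only fetch the included fields. A sketch of the usage (the index name "indexName" is a placeholder here):

// Read via the session configured above; only "foo" and "bar" are fetched.
val df = sparkSession.read
  .format("es")        // short name for org.elasticsearch.spark.sql
  .load("indexName")   // replace with your actual index name

df.printSchema()       // should list only the "foo" and "bar" fields

Restricting the fields at read time this way cuts both the data Elasticsearch has to serialize and the data Spark has to pull over the network, which matters at your scale.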