apache-spark · elasticsearch · apache-spark-sql · elasticsearch-spark

Elasticsearch with Spark: dynamic index creation based on a DataFrame column


I have a Spark DataFrame with a column, say "name". This column can take different values within a single DataFrame. When I write the data to Elasticsearch using Spark (Scala), I want to write it to different indexes based on the value of the "name" column.

dataframe.saveToEs("index-name")

saveToEs expects a string literal; I am looking for something along the lines of:

dataframe.saveToEs(col(""))

or something similar, where I can assign the index name at write time.


Solution

  • Mythic,

    I just saw in the documentation that you can use something like this:

    rdd.saveToEs("my-collection-{media_type}/doc")
    

    which, as the documentation puts it, allows you to:

    Save each object based on its resource pattern, in this example based on media_type. For each document/object about to be written, elasticsearch-hadoop will extract the media_type field and use its value to determine the target resource.

    Source: https://www.elastic.co/guide/en/elasticsearch/hadoop/master/spark.html#spark-write-dyn-scala
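
    Applied to the question, the same pattern works on a DataFrame via the elasticsearch-spark connector. Below is a minimal sketch; the es.nodes/es.port settings, the sample data, and the "index-{name}" index pattern are assumptions for illustration:

        import org.apache.spark.sql.SparkSession
        import org.elasticsearch.spark.sql._ // adds saveToEs to DataFrames

        val spark = SparkSession.builder()
          .appName("dynamic-es-index")
          .config("es.nodes", "localhost") // assumed Elasticsearch host
          .config("es.port", "9200")       // assumed Elasticsearch port
          .getOrCreate()
        import spark.implicits._

        // Hypothetical data: the "name" column drives the target index.
        val df = Seq(("alice", 1), ("bob", 2)).toDF("name", "value")

        // {name} is resolved per document at write time, so each row is
        // routed to the index matching its "name" value,
        // e.g. index-alice, index-bob.
        df.saveToEs("index-{name}")

    At write time the connector extracts the name field from each row and routes the document to the matching index, creating it on the fly provided index auto-creation is allowed on the cluster.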