apache-spark · elasticsearch · apache-spark-sql · elasticsearch-spark

Elasticsearch with Spark: dynamic index creation based on a DataFrame column


I have a Spark DataFrame with a column, say "name". This column can take different values within a single DataFrame. When I write the data to Elasticsearch using Spark (Scala), I want to write it to different indexes based on the value of the "name" column.

dataframe.saveToEs("index-name")

saveToEs expects a string literal; I am looking for something along the lines of:

dataframe.saveToEs(col(""))

or something similar, where I can assign the index name at write time.


Solution

  • Mythic,

    I just saw in the documentation that you can use something like this:

    rdd.saveToEs("my-collection-{media_type}/doc")
    

    which, as the documentation puts it, allows you to:

    Save each object based on its resource pattern, in this example based on media_type. For each document/object about to be written, elasticsearch-hadoop will extract the media_type field and use its value to determine the target resource.

    Source: https://www.elastic.co/guide/en/elasticsearch/hadoop/master/spark.html#spark-write-dyn-scala
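
    Applied to the question, the same pattern works on a DataFrame via the elasticsearch-spark connector. Below is a minimal sketch; the es.nodes/es.port settings, the sample data, and the "index-{name}" index pattern are assumptions for illustration:

        import org.apache.spark.sql.SparkSession
        import org.elasticsearch.spark.sql._ // adds saveToEs to DataFrames

        val spark = SparkSession.builder()
          .appName("dynamic-es-index")
          .config("es.nodes", "localhost") // assumed Elasticsearch host
          .config("es.port", "9200")       // assumed Elasticsearch port
          .getOrCreate()
        import spark.implicits._

        // Hypothetical data: the "name" column drives the target index.
        val df = Seq(("alice", 1), ("bob", 2)).toDF("name", "value")

        // {name} is resolved per document at write time, so each row is
        // routed to the index matching its "name" value,
        // e.g. index-alice, index-bob.
        df.saveToEs("index-{name}")

    At write time the connector extracts the name field from each row and routes the document to the matching index, creating it on the fly provided index auto-creation is allowed on the cluster.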