apache-spark, elasticsearch, pyspark, elasticsearch-hadoop

Elasticsearch Spark, how to query multiple times?


I'm working in a Jupyter notebook.

I'd like to use Query DSL to prepare an initial DataFrame.

I use conf.set("es.query", dsl_query) for that (https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html#_querying).

But then I want to apply a different query to prepare another DataFrame, and I can't find a way to apply a new dsl_query without creating a new SparkContext.

I also couldn't find a way to recreate a SparkContext inside the Jupyter environment.

I want to run an analysis using QueryDSL-1 as a baseline, then run another analysis using QueryDSL-2 as a different baseline.

Is there a way to do this without creating two notebooks?


Solution

  • You just need to specify es.query as an option on your DataFrameReader instead of in the SparkContext configuration, e.g.:

    spark.read.option("es.query", dsl_query).option("...", "...")
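Since each read carries its own options, you can load as many DataFrames with different queries as you like from the same SparkSession. A minimal sketch (the index name, field names, and es.nodes address are assumptions for illustration; it requires the elasticsearch-hadoop jar on the classpath):

```python
import json

# Two hypothetical Query DSL payloads (assumed field names).
query_a = json.dumps({"query": {"match": {"status": "active"}}})
query_b = json.dumps({"query": {"range": {"age": {"gte": 30}}}})

def read_es(spark, index, dsl_query):
    """Read an Elasticsearch index with a per-read Query DSL filter."""
    return (spark.read
            .format("org.elasticsearch.spark.sql")
            .option("es.nodes", "localhost:9200")  # assumed ES address
            .option("es.query", dsl_query)          # set per read, not on the SparkContext
            .load(index))

# In the notebook, with an existing SparkSession `spark`:
# df_a = read_es(spark, "my-index", query_a)
# df_b = read_es(spark, "my-index", query_b)
```

Because es.query is resolved when the DataFrame is defined, neither read interferes with the other, so both baselines can live in the same notebook.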