
Elasticsearch pyspark connection in insecure mode


My end goal is to insert data from HDFS into Elasticsearch, but I am having trouble with connectivity.

I can connect to my Elasticsearch node using the curl command below:

    curl -u username -X GET 'https://xx.xxx.xx.xxx:9200/_cat/indices?v' --insecure

But when it comes to connecting from Spark, I am unable to do so. The command I use to insert data is:

    df.write.mode("append").format('org.elasticsearch.spark.sql') \
        .option("es.net.http.auth.user", "username") \
        .option("es.net.http.auth.pass", "password") \
        .option("es.index.auto.create", "true") \
        .option('es.nodes', 'https://xx.xxx.xx.xxx') \
        .option('es.port', '9200') \
        .save('my-index/my-doctype')

The error I am getting is:

    org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
    ....
    ....
    Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[xx.xxx.xx.xxx:9200]]
    ....
    ...
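Since the root cause is a connection error, it can help to repeat the curl request from the host that runs the Spark driver, because that machine may not have the same network access as your workstation. Below is a minimal sketch using only the Python standard library; insecure_ssl_context and cat_indices are hypothetical helper names, and the host, user and password are placeholders. Like curl --insecure, it disables certificate and hostname verification, so only use it against a node you trust.

```python
import base64
import ssl
import urllib.request


def insecure_ssl_context() -> ssl.SSLContext:
    """TLS context that skips certificate and hostname checks,
    roughly what curl's --insecure does."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be turned off before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # accept self-signed / untrusted certs
    return ctx


def cat_indices(host: str, user: str, password: str, port: int = 9200) -> str:
    """GET https://<host>:<port>/_cat/indices?v with HTTP basic auth
    (the same request as the curl command)."""
    url = "https://{}:{}/_cat/indices?v".format(host, port)
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(req, context=insecure_ssl_context(), timeout=10) as resp:
        return resp.read().decode()
```

For example, print(cat_indices("xx.xxx.xx.xxx", "username", "password")) should print the same index table as the curl command when the node is reachable; a URLError at this point suggests a network or firewall problem rather than a Spark setting.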

What would be the PySpark equivalent of curl --insecure?

Thanks


Solution

  • After many attempts with different config options, I found a way to connect to Elasticsearch running on HTTPS without a trusted certificate:

            # username, password, es_ip and index are assumed to be defined earlier
            (dfToEs.write.mode("append")
                .format("org.elasticsearch.spark.sql")
                .option("es.net.http.auth.user", username)
                .option("es.net.http.auth.pass", password)
                # Talk to the node over HTTPS
                .option("es.net.ssl", "true")
                # Accept the node's self-signed certificate
                .option("es.net.ssl.cert.allow.self.signed", "true")
                .option("mergeSchema", "true")
                .option("es.index.auto.create", "true")
                .option("es.nodes", "https://{}".format(es_ip))
                .option("es.port", "9200")
                .option("es.batch.write.retry.wait", "100s")
                .save("{index}/_doc".format(index=index)))
    

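    The two settings that make this work are

    (es.net.ssl, true)

    which enables HTTPS, and

    (es.net.ssl.cert.allow.self.signed, true)

    which lets the connector accept the node's self-signed certificate, playing the role of curl --insecure here.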
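If several jobs write to the same cluster, the connection-related settings from the solution can be kept in one dictionary and applied with DataFrameWriter.options(**dict), which sets each entry as if it were a separate .option(...) call. This is a minimal sketch: es_write_options and write_to_es are hypothetical helper names, and the values simply mirror the solution (HTTPS on port 9200, self-signed certificate allowed).

```python
def es_write_options(user, password, host, port=9200, allow_self_signed=True):
    """ES-Hadoop settings for an HTTPS node, as plain strings.

    host is a bare IP or hostname; the https:// prefix is added here,
    matching the es.nodes value used in the solution.
    """
    return {
        "es.net.http.auth.user": user,
        "es.net.http.auth.pass": password,
        "es.net.ssl": "true",  # use HTTPS
        # accept a self-signed certificate on the node
        "es.net.ssl.cert.allow.self.signed": "true" if allow_self_signed else "false",
        "es.index.auto.create": "true",
        "es.nodes": "https://{}".format(host),
        "es.port": str(port),
    }


def write_to_es(df, index, user, password, host, port=9200):
    """Append a Spark DataFrame to <index>/_doc using the settings above."""
    (df.write.mode("append")
        .format("org.elasticsearch.spark.sql")
        .options(**es_write_options(user, password, host, port))
        .save("{}/_doc".format(index)))
```

Usage would look like write_to_es(dfToEs, index, username, password, es_ip). Keeping every value a string avoids surprises, since connector settings are read as strings.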