python · apache-spark · elasticsearch · pyspark · elasticsearch-hadoop

How to ignore exceptions when bulk updating with PySpark if a doc doesn't exist


I am trying to do an update operation with the elasticsearch-hadoop package in PySpark. The documentation says that if no data is found, an exception is thrown. What is the best way to ignore this exception in PySpark? Or is it possible to pass something like the raise_on_exception=False, raise_on_error=False options provided by the Python Elasticsearch API? Thanks!
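For context, a minimal sketch of the kind of write that triggers this failure. The index name, id column, and cluster address are placeholders; `es.write.operation` and `es.mapping.id` are documented elasticsearch-hadoop settings:

```python
from pyspark.sql import SparkSession

# Assumes the elasticsearch-hadoop jar is on the classpath, e.g.
#   pyspark --jars elasticsearch-spark-20_2.11-6.8.0.jar
spark = SparkSession.builder.appName("es-update-demo").getOrCreate()

df = spark.createDataFrame([(1, "updated value")], ["id", "field"])

# With "update", the task fails if no document with the given _id exists.
(df.write
   .format("org.elasticsearch.spark.sql")
   .option("es.nodes", "localhost:9200")     # placeholder cluster address
   .option("es.write.operation", "update")   # the operation the question asks about
   .option("es.mapping.id", "id")            # column used as the document _id
   .mode("append")
   .save("my-index/my-type"))                # placeholder index/type
```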


Solution

  • Finally got the answer here: https://discuss.elastic.co/t/how-to-ignore-exceptions-when-bulk-update-with-pyspark-if-doc-doesnt-exist/90739/2

    "there is no way to suppress the error when it occurs. If a value is missing when an update is executed, there's nothing for the connector to do but fail the task."