Tags: azure, apache-spark, azure-blob-storage, azure-databricks, azure-log-analytics

The condition specified using HTTP conditional header(s) is not met: Azure Databricks is unable to read a JSON file from Blob Storage


[Screenshot: resources used]

A Log Analytics data export rule writes data to a container in a blob storage account. Databricks has the same container mounted and runs a pipeline that reads the data every hour and performs transformations. The pipeline sometimes runs properly and sometimes fails with the error below. I understand there is a race condition between reading and writing data to blob storage, and the Log Analytics data export rule has no fixed time threshold for sending data to storage. Any idea how to handle this race condition?

Caused by: java.io.IOException: Operation failed: "The condition specified using HTTP conditional header(s) is not met.", 412, GET, https://xxx.dfs.core.windows.net/xx-xxx/WorkspaceResourceId%3D/subscriptions/xxx.json?timeout=90, ConditionNotMet, "The condition specified using HTTP conditional header(s) is not met. RequestId:xxx-xxxx-xxxx-xxxx-xxx Time:xxx-04-03T20:11:21.xxxx"
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.readRemote(AbfsInputStream.java:673)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.readInternal(AbfsInputStream.java:619)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.readOneBlock(AbfsInputStream.java:409)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.read(AbfsInputStream.java:346)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at com.databricks.common.filesystem.LokiAbfsInputStream.$anonfun$read$3(LokiABFS.scala:204)
    at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
    at com.databricks.common.filesystem.LokiAbfsInputStream.withExceptionRewrites(LokiABFS.scala:194)
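
For context, the failing hourly read presumably resembles this minimal sketch, assuming a Databricks notebook where spark is predefined (the mount point and path are placeholders, not from the original post):

    # Hypothetical sketch of the hourly read; the mount point and path
    # are placeholders. spark.read.json opens each blob and reads it in
    # blocks, which is the window in which a concurrent rewrite by the
    # export rule triggers the 412.
    df = spark.read.json("/mnt/loganalytics/WorkspaceResourceId=.../*.json")
    df.count()  # forces the full scan; ConditionNotMet surfaces here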

Solution

  • "The condition specified using HTTP conditional header(s) is not met.", 412, GET, https://xxx.dfs.core.windows.net/xx-xxx/WorkspaceResourceId%3D/subscriptions/xxx.json?timeout=90, ConditionNotMet, "The condition specified using HTTP conditional header(s) is not met."
    

    As per this:

    Every blob carries an ETag that changes on each write. Suppose a read is issued with a conditional header (If-Match) carrying the current ETag, say 0x8CDA1BF0593B660. If, before that request completes, the blob is updated by another service, its ETag changes to, say, 0x8CDA1BF0593B661; the If-Match condition then fails, and the service returns 412 ConditionNotMet.
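
    To make the failure mode concrete, here is a minimal sketch using the azure-storage-blob Python SDK (the connection string, container, and blob name are placeholders); it forces a 412 by reading with a stale ETag:

        from azure.core import MatchConditions
        from azure.core.exceptions import ResourceModifiedError
        from azure.storage.blob import BlobClient

        blob = BlobClient.from_connection_string(
            "<connection-string>",  # placeholder
            container_name="am-logs",  # placeholder
            blob_name="WorkspaceResourceId=.../export.json",  # placeholder
        )

        etag = blob.get_blob_properties().etag  # e.g. 0x8CDA1BF0593B660

        # ...another service rewrites the blob here; its ETag changes...

        try:
            # Conditional read: succeed only if the ETag is unchanged.
            data = blob.download_blob(
                etag=etag, match_condition=MatchConditions.IfNotModified
            ).readall()
        except ResourceModifiedError as err:
            print("HTTP 412 ConditionNotMet:", err)  # the race in miniature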

    That is likely the reason for the error while reading a JSON file from the storage account in Databricks: the ABFS driver pins the ETag it saw when it opened the file, and the Log Analytics export rewrites the blob between block reads. The concurrency behavior can be configured according to the hadoop-azure library documentation; hadoop-azure is the library used to access ADLS Gen2 (abfss).
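
    One common mitigation (not from the original answer) is to retry the read once the export finishes rewriting the blob. A minimal sketch, assuming a PySpark notebook where spark is predefined and the path is a placeholder:

        import time

        SOURCE = "/mnt/loganalytics/WorkspaceResourceId=.../*.json"  # placeholder

        def read_with_retry(path, attempts=5, backoff_s=60):
            """Retry when a blob is rewritten mid-read (HTTP 412 ConditionNotMet)."""
            for attempt in range(1, attempts + 1):
                try:
                    df = spark.read.json(path)
                    df.cache().count()  # materialize now so the 412 surfaces here
                    return df
                except Exception as err:
                    if "ConditionNotMet" in str(err) and attempt < attempts:
                        time.sleep(backoff_s)  # let the export settle, then retry
                    else:
                        raise

        df = read_with_retry(SOURCE)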

    For more information, you can refer to the links below: