We got the following error after running a notebook in a pipeline; the notebook transforms data and then saves it. If the data write to CSV is commented out, the pipeline works. In a normal notebook run the write to CSV also works fine; it breaks only in the pipeline. Platform: Azure Synapse Analytics / Workspace / pipeline. Language: Python (PySpark).
{
"errorCode": "6002",
"message": "Py4JJavaError: An error occurred while calling o666.csv.\n: java.nio.file.AccessDeniedException: Operation failed: \"This request is not authorized to perform this operation using this permission.\", 403, HEAD, https://bcnpricing.dfs.core.windows.net/test/test/data/output/test_df3.csv?upn=false&action=getStatus&timeout=90\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1185)\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:504)\n\tat org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1696)\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.exists(AzureBlobFileSystem.java:1013)\n\tat org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:119)\n\tat org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)\n\tat org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)\n\tat org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)\n\tat org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:218)\n\tat org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:256)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:253)\n\tat org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:214)\n\tat org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:148)\n\tat org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:147)\n\tat org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:995)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:107)\n\tat org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:181)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:94)\n\tat org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)\n\tat org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)\n\tat org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:995)\n\tat org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:444)\n\tat org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:416)\n\tat org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:294)\n\tat org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:985)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: Operation failed: \"This request is not authorized to perform this operation using this permission.\", 
403, HEAD, https://bcnpricing.dfs.core.windows.net/test/test/data/output/test_df3.csv?upn=false&action=getStatus&timeout=90\n\tat org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:207)\n\tat org.apache.hadoop.fs.azurebfs.services.AbfsClient.getPathStatus(AbfsClient.java:570)\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:802)\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:502)\n\t... 35 more\n\nTraceback (most recent call last):\n\n File \"/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py\", line 1372, in csv\n self._jwrite.csv(path)\n\n File \"/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py\", line 1304, in __call__\n return_value = get_return_value(\n\n File \"/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py\", line 111, in deco\n return f(*a, **kw)\n\n File \"/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/protocol.py\", line 326, in get_return_value\n raise Py4JJavaError(\n\npy4j.protocol.Py4JJavaError: An error occurred while calling o666.csv.\n: java.nio.file.AccessDeniedException: Operation failed: \"This request is not authorized to perform this operation using this permission.\", 403, HEAD, https://bcnpricing.dfs.core.windows.net/test/test/data/output/test_df3.csv?upn=false&action=getStatus&timeout=90\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1185)\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:504)\n\tat org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1696)\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.exists(AzureBlobFileSystem.java:1013)\n\tat org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:119)\n\tat org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)\n\tat org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)\n\tat org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)\n\tat org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:218)\n\tat org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:256)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:253)\n\tat org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:214)\n\tat org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:148)\n\tat org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:147)\n\tat org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:995)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:107)\n\tat org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:181)\n\tat org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:94)\n\tat org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)\n\tat org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)\n\tat org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:995)\n\tat 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:444)\n\tat org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:416)\n\tat org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:294)\n\tat org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:985)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: Operation failed: \"This request is not authorized to perform this operation using this permission.\", 403, HEAD, https://bcnpricing.dfs.core.windows.net/test/test/data/output/test_df3.csv?upn=false&action=getStatus&timeout=90\n\tat org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:207)\n\tat org.apache.hadoop.fs.azurebfs.services.AbfsClient.getPathStatus(AbfsClient.java:570)\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:802)\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:502)\n\t... 35 more\n\n",
"failureType": "UserError",
"target": "Product_Data_pipeline3",
"details": []
The error indicates an attempt to perform an operation without authorization: the pipeline run is trying to access the storage account at https://bcnpricing.dfs.core.windows.net/test/test/data/output/test_df3.csv and is being refused with a 403.
An AccessDeniedException while accessing the storage account means the identity making the call lacks permission on that account. This also explains the notebook-vs-pipeline difference: an interactive notebook run uses your own Azure AD identity, while a pipeline run executes under the Synapse workspace's managed identity (MSI), which here has no access to the storage account.
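A quick way to confirm this is to list the target container from a notebook cell, once interactively and once from the pipeline run; the path below is an assumption based on the error URL:

# Diagnostic: run this cell interactively (your Azure AD identity) and then
# from the pipeline (workspace MSI). A 403 only in the pipeline run confirms
# the MSI lacks data access to the account.
from notebookutils import mssparkutils  # available on Synapse Spark pools

mssparkutils.fs.ls("abfss://test@bcnpricing.dfs.core.windows.net/test/data/output/")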
So, to get access to the specific storage account, you need to grant the Storage Blob Data Contributor role on the storage account to the Synapse workspace's MSI.
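The assignment is typically done in the Azure portal (storage account > Access control (IAM) > Add role assignment), but for reference here is a sketch of the same assignment with the Azure Python SDK. The subscription ID, resource group, and MSI object ID are placeholders, and the exact model shape can vary between azure-mgmt-authorization versions:

# Sketch: assign "Storage Blob Data Contributor" on the storage account to
# the Synapse workspace MSI. Requires azure-identity and
# azure-mgmt-authorization; all <...> values are placeholders.
import uuid
from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

subscription_id = "<subscription-id>"
scope = (
    f"/subscriptions/{subscription_id}"
    "/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/bcnpricing"
)
# ba92f5b4-... is the built-in role definition ID for
# "Storage Blob Data Contributor".
role_definition_id = (
    f"/subscriptions/{subscription_id}"
    "/providers/Microsoft.Authorization/roleDefinitions/"
    "ba92f5b4-2d11-453d-a403-e96b0029c9fe"
)

client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)
client.role_assignments.create(
    scope,
    str(uuid.uuid4()),  # each role assignment needs a unique GUID name
    RoleAssignmentCreateParameters(
        role_definition_id=role_definition_id,
        principal_id="<synapse-workspace-msi-object-id>",
        principal_type="ServicePrincipal",
    ),
)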
Here is a link that addresses a similar problem. To add the Storage Blob Data Contributor role to your Synapse workspace, you can refer to this Microsoft documentation.
Note: Contact the owner of the resource to assign the role if you cannot do it yourself; assigning roles requires Owner or User Access Administrator rights on the storage account. Being Owner is a control-plane role and does not by itself grant the workspace MSI data access, so the Storage Blob Data Contributor assignment is still required.