Tags: r, azure-synapse, sparkr, azure-notebooks, azure-synapse-pipeline

Azure Synapse Apache Spark Pools: .gz package added but notebook run error says not found


I have notebooks that contain R code. On their own, they work fine when run manually. To schedule and automate these notebooks, we have to use pipelines to call them. However, pipelines don't allow inline installation of R packages, so we have to install those packages on the Apache Spark pool. I was able to do this successfully for all of the packages and their dependencies, but I've run into a problem with the HydroVuR.tar.gz package (which also happens to be our one custom package, hmm). Although I was able to add every package to the Apache Spark pool, when I run the notebook I get an error specific to our HydroVuR package.

The HydroVuR.tar.gz package is listed among the packages on the Spark pool. However, I get a HydroVuR-specific error when I try to load it.

Since these packages are now installed on the pool rather than inline, I comment out the install.packages lines and just run the library calls.

[1] "Error in library(HydroVuR): there is no package called ‘HydroVuR’"
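A minimal sketch of a check that could run at the top of the notebook, before any library calls, to report which pool-installed packages R cannot actually find. The helper name missing_packages is hypothetical and is not part of Synapse or of HydroVuR; it relies only on base R's requireNamespace.

```r
# Hypothetical helper: return the names of packages that cannot be found
# in any of the library paths visible to this R session.
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE instead of raising an error when a
  # package is absent. It loads the namespace but does not attach it.
  found <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!found]
}

# Assumed package list; replace it with the packages the notebook needs.
not_found <- missing_packages(c("HydroVuR"))
if (length(not_found) > 0) {
  message("Not installed: ", paste(not_found, collapse = ", "))
  # Show where R is looking, to compare against the pool's install location.
  print(.libPaths())
}
```

If HydroVuR shows up as missing here even though the pool lists the uploaded file, the package was never installed into a library path that the session can see, which points at the upload or install step rather than at the notebook code.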

The STDOUT from installing it on the pool suggests that the installation succeeded.

Is something wrong with my HydroVuR.tar.gz package? Why can't the pool see it?

The dependency packages and their versions were downloaded from CRAN.


Solution

  • I think I found the problem: my .tar.gz package had no version number in its file name. Once I renamed the file to HydroVuR_0.0.0.9000.tar.gz, I was able to install it on the pool and load it, and my pipeline then ran successfully. I think the problem is now solved. Thanks for your help, Dileep.
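The fix above matches the <package>_<version>.tar.gz naming that R CMD build produces for source packages. As a rough sketch, the file name could be checked before uploading to the pool. Both helper names below are hypothetical, and the pattern is an assumption about what a conventional name looks like, not a documented Synapse rule.

```r
# Hypothetical helper: TRUE when the file name looks like
# <package>_<version>.tar.gz, e.g. HydroVuR_0.0.0.9000.tar.gz.
is_versioned_tarball <- function(file_name) {
  pattern <- "^[A-Za-z][A-Za-z0-9.]*_[0-9]+([.-][0-9]+)+\\.tar\\.gz$"
  grepl(pattern, basename(file_name))
}

# Hypothetical helper: build the conventional file name from the package
# name and the Version field of its DESCRIPTION file.
versioned_tarball_name <- function(package, version) {
  sprintf("%s_%s.tar.gz", package, version)
}

is_versioned_tarball("HydroVuR.tar.gz")             # FALSE
is_versioned_tarball("HydroVuR_0.0.0.9000.tar.gz")  # TRUE
versioned_tarball_name("HydroVuR", "0.0.0.9000")    # "HydroVuR_0.0.0.9000.tar.gz"
```

Building the package with R CMD build, instead of compressing the folder by hand, should produce a correctly named file without a manual rename.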