[SOLVED] Azure Synapse Apache Spark Pools: .gz package added but notebook run error says not found

I have notebooks that contain R code. They work fine when run manually. To schedule and automate these notebooks, we have to use pipelines to call the R notebooks. However, pipelines don't allow inline installation of R library packages, so we have to install those packages on the Apache Spark pool instead. I was able to do this successfully for all of the packages and their dependencies except one: the HydroVuR.tar.gz package (which also happens to be our one custom package, hmm?). All of the packages were added to the Apache Spark pool without complaint, but when I run the notebook, I get an error for our HydroVuR package only.

The HydroVuR.tar.gz package shows up in the Spark pool's package list. However, I get a HydroVuR-specific error when I try to load it.

Since these packages are now installed on the pool rather than inline, I comment out the install.packages() lines and just call library():

[1] "Error in library(HydroVuR): there is no package called ‘HydroVuR’"

The STDOUT from installing it on the pool seems to show that the installation succeeded.

Is something wrong with my HydroVuR.tar.gz package? Why can't the pool see it?

Dependency packages and versions (downloaded from CRAN):

Solution

I think I found the problem: my .tar.gz package file name had no version number. Once I renamed the file to HydroVuR_0.0.0.9000.tar.gz, I was able to install it and the pool could see it. After that, my pipeline ran successfully, so I think the problem is solved. Thanks for your help, Dileep.
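For anyone hitting the same issue, here is a minimal sketch (plain Python, not anything Synapse provides) that checks whether a package archive follows the <Package>_<Version>.tar.gz naming that R CMD build produces, and renames it using the Package and Version fields from the DESCRIPTION file inside it. It assumes the archive has the usual <pkg>/DESCRIPTION layout and that the pool expects that naming, as it appeared to in this case; the helper names are hypothetical.

```python
import re
import tarfile
from pathlib import Path

# Source packages built with `R CMD build` are named <Package>_<Version>.tar.gz,
# e.g. HydroVuR_0.0.0.9000.tar.gz.
_NAME_RE = re.compile(
    r"^(?P<pkg>[A-Za-z][A-Za-z0-9.]*)_(?P<ver>[0-9]+(?:[.-][0-9]+)+)\.tar\.gz$"
)


def has_versioned_name(filename: str) -> bool:
    """Return True if filename looks like <Package>_<Version>.tar.gz."""
    return _NAME_RE.match(filename) is not None


def read_description(tarball: Path) -> dict:
    """Read the Package and Version fields from <pkg>/DESCRIPTION in the archive."""
    with tarfile.open(tarball, "r:gz") as tar:
        for member in tar.getmembers():
            parts = member.name.removeprefix("./").split("/")
            # DESCRIPTION sits directly under the top-level package directory.
            if len(parts) == 2 and parts[1] == "DESCRIPTION" and member.isfile():
                raw = tar.extractfile(member).read()
                text = raw.decode("utf-8", errors="replace")
                break
        else:
            raise ValueError(f"no <pkg>/DESCRIPTION found in {tarball}")
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^(Package|Version):\s*(\S+)", line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def versioned_copy_name(tarball: Path) -> str:
    """Build the <Package>_<Version>.tar.gz file name from DESCRIPTION."""
    fields = read_description(tarball)
    return f"{fields['Package']}_{fields['Version']}.tar.gz"


def rename_with_version(tarball: Path) -> Path:
    """Rename the archive to its versioned name if needed; return the new path."""
    if has_versioned_name(tarball.name):
        return tarball
    target = tarball.with_name(versioned_copy_name(tarball))
    return tarball.rename(target)
```

Calling rename_with_version(Path("HydroVuR.tar.gz")) before uploading would give HydroVuR_0.0.0.9000.tar.gz for a DESCRIPTION with Version: 0.0.0.9000. Rebuilding the package with R CMD build should produce a correctly named file directly, which may be preferable to renaming by hand.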