Tags: java, amazon-web-services, apache-spark, aws-glue, johnsnowlabs-spark-nlp

Glue job fails intermittently with `JohnSnowLabs spark-nlp dependency not found` error


I'm using AWS Glue to run some PySpark code. Sometimes the job succeeds, but sometimes it fails with a dependency error: `Resource Setup Error: Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: JohnSnowLabs#spark-nlp;2.5.4: not found]`. Here are the error logs:

:: problems summary ::
:::: WARNINGS
        module not found: JohnSnowLabs#spark-nlp;2.5.4

    ==== local-m2-cache: tried

      file:/root/.m2/repository/JohnSnowLabs/spark-nlp/2.5.4/spark-nlp-2.5.4.pom

      -- artifact JohnSnowLabs#spark-nlp;2.5.4!spark-nlp.jar:

      file:/root/.m2/repository/JohnSnowLabs/spark-nlp/2.5.4/spark-nlp-2.5.4.jar

    ==== local-ivy-cache: tried

      /root/.ivy2/local/JohnSnowLabs/spark-nlp/2.5.4/ivys/ivy.xml

      -- artifact JohnSnowLabs#spark-nlp;2.5.4!spark-nlp.jar:

      /root/.ivy2/local/JohnSnowLabs/spark-nlp/2.5.4/jars/spark-nlp.jar

    ==== central: tried

      https://repo1.maven.org/maven2/JohnSnowLabs/spark-nlp/2.5.4/spark-nlp-2.5.4.pom

      -- artifact JohnSnowLabs#spark-nlp;2.5.4!spark-nlp.jar:

      https://repo1.maven.org/maven2/JohnSnowLabs/spark-nlp/2.5.4/spark-nlp-2.5.4.jar

    ==== spark-packages: tried

      https://dl.bintray.com/spark-packages/maven/JohnSnowLabs/spark-nlp/2.5.4/spark-nlp-2.5.4.pom

      -- artifact JohnSnowLabs#spark-nlp;2.5.4!spark-nlp.jar:

      https://dl.bintray.com/spark-packages/maven/JohnSnowLabs/spark-nlp/2.5.4/spark-nlp-2.5.4.jar

        ::::::::::::::::::::::::::::::::::::::::::::::

        ::          UNRESOLVED DEPENDENCIES         ::

        ::::::::::::::::::::::::::::::::::::::::::::::

        :: JohnSnowLabs#spark-nlp;2.5.4: not found

        ::::::::::::::::::::::::::::::::::::::::::::::



:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: JohnSnowLabs#spark-nlp;2.5.4: not found]

From the logs of a successful run, I can see that Glue was able to download the dependency from https://dl.bintray.com/spark-packages/maven/JohnSnowLabs/spark-nlp/2.5.4/spark-nlp-2.5.4.pom. The failed job tried the same location but could not download from it.

This issue seemed to resolve itself last week, but in the last couple of days it has shown up again and has not gone away so far. Has anyone seen this issue before? Thanks.


Solution

  • spark-packages moved on May 1, 2021. In my Scala project I had to add a different resolver, like so. It should be similar in Java.

    resolvers in ThisBuild ++= Seq(
      "SparkPackages" at "https://repos.spark-packages.org"
      // removed: "MVNRepository" at "https://dl.bintray.com/spark-packages/maven"
    )
    

    Check for yourself: the package is no longer on the resolver you were using. Mine wasn't either.

    https://dl.bintray.com/spark-packages/
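To check where a package is actually published, here is a minimal Python sketch that rebuilds the URLs the resolver tries, following the Maven-style layout visible in the log above (`<root>/<group>/<name>/<version>/<name>-<version>.<ext>`). The function names `artifact_url` and `is_published` are hypothetical helpers, not part of Spark, Glue, or Ivy, and it is an assumption that the new repository at https://repos.spark-packages.org keeps the same layout directly under its root.

```python
from urllib.request import Request, urlopen

# Repository roots: the first two appear in the question's log,
# the last one is the new location named in the answer.
REPOSITORIES = {
    "central": "https://repo1.maven.org/maven2",
    "spark-packages (old)": "https://dl.bintray.com/spark-packages/maven",
    "spark-packages (new)": "https://repos.spark-packages.org",
}


def artifact_url(root, group, name, version, ext="jar"):
    """Build the Maven-layout URL for an artifact under a repository root."""
    # Maven layout turns dots in the group into path segments.
    group_path = group.replace(".", "/")
    return f"{root.rstrip('/')}/{group_path}/{name}/{version}/{name}-{version}.{ext}"


def is_published(url, timeout=10):
    """Return True if a HEAD request for the URL succeeds (needs network access)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors (for example 403/404), DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    # Print the .pom URL that would be tried under each repository root.
    for label, root in REPOSITORIES.items():
        print(label, artifact_url(root, "JohnSnowLabs", "spark-nlp", "2.5.4", ext="pom"))
```

Calling `is_published(...)` on each printed URL from a machine with network access shows which repository, if any, still serves the artifact. The sketch only checks for the file; it does not change how Glue or Spark resolves the dependency.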