scala apache-spark apache-toree bigdata

Editing Spark Module in Spark-kernel


We are currently editing a specific module in Spark. We use spark-kernel https://github.com/ibm-et/spark-kernel to run all our Spark jobs. What we did was recompile the code we had edited, which produces a JAR file. However, we do not know how to point the kernel to that JAR file.

It looks like it is still referencing the old build and not the newly edited, newly compiled one. Do you have any idea how to modify some Spark packages/modules and have the changes reflected in spark-kernel? If we don't use spark-kernel, is there still a way to edit a particular module in Spark, for example the ALS module: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala? Thanks!


Solution

  • You likely edited a Scala or Java file and recompiled (even though you call them scripts, they are not scripts in the strict sense, because they are not interpreted). Assuming that's what you did...

    You probably do not have a clean replacement of the resulting JAR file in the deployment you are testing. Odds are your newly compiled JAR file is somewhere, just not in the place you are observing. To get it there properly, you will have to build more than the JAR file: you will have to repackage your installable and reinstall it, roughly as sketched below.
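    A minimal sketch of that workflow, assuming a Maven-based checkout of the Spark source tree (module path, profiles, and version numbers are illustrative; in Spark 1.x the distribution script lives at ./make-distribution.sh, in 2.x at ./dev/make-distribution.sh):

        # Rebuild the edited module plus the modules it depends on
        cd /path/to/spark
        ./build/mvn -pl mllib -am -DskipTests package

        # Repackage the whole installable so the rebuilt JAR is what ships
        ./make-distribution.sh --name patched-als --tgz -Phadoop-2.6

    Unpack the resulting tarball over your deployment and restart spark-kernel against it.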

    Other techniques exist: if you can identify the unpacked item in an installation, you can sometimes copy the new JAR directly into place. However, such a technique is inherently unmaintainable, so I recommend it only for throwaway verification of a change, never on any system that will actually be used. A sketch of that shortcut follows.
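    This assumes a layout where per-module JARs sit unpacked under $SPARK_HOME/jars (true of Spark 2.x; a Spark 1.x install instead bundles everything into a single assembly JAR under lib/, which cannot be patched this way):

        # Throwaway verification only: overwrite the module JAR in place,
        # then restart the kernel so the JVM loads the replaced classes
        cp mllib/target/spark-mllib_2.11-2.0.0.jar "$SPARK_HOME/jars/"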

    Keep in mind that with Spark, worker nodes are sometimes deployed dynamically. If that is the case, you may also have to locate the installable used by the dynamic deployment system and make sure the right packaging is there too, or ship the patched JAR to the executors explicitly, as sketched below.
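    One way to cover the executors without rebuilding every worker image is to prepend the patched JAR on both sides. The conf keys below are standard Spark settings, but the path is an assumption, and spark.executor.extraClassPath requires the JAR to already exist at that location on each worker (for example on shared storage):

        # Make the patched classes take precedence on driver and executors
        $SPARK_HOME/bin/spark-submit \
          --conf spark.driver.extraClassPath=/opt/patched/spark-mllib.jar \
          --conf spark.executor.extraClassPath=/opt/patched/spark-mllib.jar \
          your-app.jar

    Since spark-kernel is itself launched through spark-submit, the same --conf flags apply via whatever option-passing mechanism your kernel launch script exposes.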