I have the following paragraph that does some outlier detection using the interquartile range (IQR) method. Strangely, it fails with an error, and Apache Zeppelin truncates the stack trace so much that it is not useful.
Here is the code:
import org.apache.spark.sql.DataFrame
import scala.annotation.tailrec

def interQuartileRangeFiltering(df: DataFrame): DataFrame = {
  @tailrec
  def inner(cols: Seq[String], acc: DataFrame): DataFrame = cols match {
    case Nil => acc
    case column :: xs =>
      val quantiles = acc.stat.approxQuantile(column, Array(0.25, 0.75), 0.0) // TODO: values should come from config
      val q1 = quantiles(0)
      val q3 = quantiles(1)
      val iqr = q3 - q1 // IQR is the third quartile minus the first
      val lowerRange = q1 - 1.5 * iqr
      val upperRange = q3 + 1.5 * iqr
      inner(xs, acc.filter(s"$column < $lowerRange or $column > $upperRange"))
  }
  inner(df.columns.toSeq, df)
}
Here is the error when run in Apache Zeppelin:
scala.MatchError: WrappedArray(NEAR BAY, ISLAND, NEAR OCEAN, housing_median_age, population, total_bedrooms, <1H OCEAN, median_house_value, longitude, INLAND, latitude, total_rooms, households, median_income) (of class scala.collection.mutable.WrappedArray$ofRef)
at inner$1(<console>:74)
at interQuartileRangeFiltering(<console>:85)
... 56 elided
I have verified that the corresponding setting in the Spark interpreter is set to true:
zeppelin.spark.printREPLOutput
Any ideas as to what is wrong here with my approach and how to get Apache Zeppelin to print the whole stacktrace so that I can find out what the actual problem is?
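On the first part of the question: the trace shows the match failing on a WrappedArray, which is what df.columns.toSeq produces from an Array on Scala 2.11/2.12. The :: pattern only matches a List, so a non-empty array-backed Seq matches neither case. The following is a minimal sketch of that diagnosis in plain Scala, without Spark. The names SeqMatchSketch, visitWithCons and visitWithSeqPatterns are hypothetical and exist only for illustration.

```scala
import scala.annotation.tailrec

object SeqMatchSketch {
  // Same shape as the recursion in the question: `::` only matches a List,
  // so a non-empty array-backed Seq matches neither case and throws MatchError.
  def visitWithCons(cols: Seq[String]): List[String] = {
    @tailrec
    def inner(rest: Seq[String], acc: List[String]): List[String] = rest match {
      case Nil     => acc.reverse
      case c :: xs => inner(xs, c :: acc)
    }
    inner(cols, Nil)
  }

  // `Seq()` and the `+:` extractor work for any Seq implementation.
  def visitWithSeqPatterns(cols: Seq[String]): List[String] = {
    @tailrec
    def inner(rest: Seq[String], acc: List[String]): List[String] = rest match {
      case Seq()   => acc.reverse
      case c +: xs => inner(xs, c :: acc)
    }
    inner(cols, Nil)
  }

  def main(args: Array[String]): Unit = {
    // Array-backed Seq, comparable to df.columns.toSeq (not a List).
    val cols: Seq[String] = Array("longitude", "latitude").toSeq
    println(visitWithSeqPatterns(cols))
    println(scala.util.Try(visitWithCons(cols)).isFailure)
    println(visitWithCons(cols.toList))
  }
}
```

If this is indeed the cause, calling inner(df.columns.toList, df) or switching the patterns to Seq() and +: should get past the MatchError. Two further things probably need attention afterwards: column names containing spaces, such as <1H OCEAN, have to be quoted with backticks inside a filter expression string, and approxQuantile only accepts numeric columns.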
As a workaround, you can print the full stack trace with the following snippet:
lastException.printStackTrace(System.out)
You can also wrap your code in a try/catch block to do the same:
try {
  // code
} catch {
  case e: Throwable => e.printStackTrace(System.out)
}
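If the trace should be kept as a value, for example to log it or inspect it in a later paragraph, the same idea can be packaged as a small helper. This is only a sketch using standard JDK classes. StackTraceSketch, fullTrace and withFullTrace are hypothetical names, not part of Zeppelin or Spark.

```scala
import java.io.{PrintWriter, StringWriter}

object StackTraceSketch {
  // Renders the complete stack trace of a Throwable, including any causes,
  // into a String instead of printing it.
  def fullTrace(e: Throwable): String = {
    val sw = new StringWriter()
    val pw = new PrintWriter(sw)
    e.printStackTrace(pw)
    pw.flush()
    sw.toString
  }

  // Runs the block; returns Right(result) on success or Left(trace) on failure.
  // Catches Throwable to mirror the try/catch snippet above.
  def withFullTrace[A](body: => A): Either[String, A] =
    try Right(body)
    catch { case e: Throwable => Left(fullTrace(e)) }

  def main(args: Array[String]): Unit = {
    val result = withFullTrace[Int](throw new IllegalStateException("boom"))
    result.fold(trace => println(trace), value => println(value))
  }
}
```

In a notebook paragraph, the call that fails would go inside withFullTrace { ... }, and the Left side can then be printed in full.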