predictionio

PredictionIO Training Error: ArrayIndexOutOfBoundsException


We have a PredictionIO (PIO) 0.11.0 instance running, and we are attempting to use the Universal Recommender (UR) engine, version 0.6.0 (https://github.com/actionml/universal-recommender). We have loaded the event server with our training data (a sketch of how the events were imported follows the error log below), and when we run pio train the following error is produced:

[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@d36c1c3
[INFO] [Engine$] Preparator: com.actionml.Preparator@437281c5
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@27da994b)
[INFO] [Engine$] Data sanity check is on.
[WARN] [TableInputFormatBase] Cannot resolve the host name for ACPIOTest/127.0.1.1 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '1.1.0.127.in-addr.arpa'
[INFO] [DataSource] Received events List()
[WARN] [TableInputFormatBase] Cannot resolve the host name for ACPIOTest/127.0.1.1 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '1.1.0.127.in-addr.arpa'
[INFO] [Engine$] com.actionml.TrainingData does not support data sanity check. Skipping check.
[INFO] [Engine$] com.actionml.PreparedData does not support data sanity check. Skipping check.
[INFO] [URAlgorithm] Actions read now creating correlators
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
        at org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrencesIDSs(SimilarityAnalysis.scala:145)
        at com.actionml.URAlgorithm.calcAll(URAlgorithm.scala:311)
        at com.actionml.URAlgorithm.train(URAlgorithm.scala:285)
        at com.actionml.URAlgorithm.train(URAlgorithm.scala:175)
        at org.apache.predictionio.controller.P2LAlgorithm.trainBase(P2LAlgorithm.scala:49)
        at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
        at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.predictionio.controller.Engine$.train(Engine.scala:692)
        at org.apache.predictionio.controller.Engine.train(Engine.scala:177)
        at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
        at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
        at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
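
For reference, the events were loaded roughly as follows (a minimal sketch using the PredictionIO Python SDK; the access key and the user/item IDs shown here are placeholders rather than our real data):

import predictionio

# Connect to the PIO event server for the ACRec app.
# The access key and URL are placeholders.
client = predictionio.EventClient(
    access_key="SOME_ACCESS_KEY",
    url="http://localhost:7070"
)

# Primary event: user u1 purchased item i1.
client.create_event(
    event="purchase",
    entity_type="user",
    entity_id="u1",
    target_entity_type="item",
    target_entity_id="i1"
)

# Secondary event: user u1 viewed item i2.
client.create_event(
    event="view",
    entity_type="user",
    entity_id="u1",
    target_entity_type="item",
    target_entity_id="i2"
)

client.close()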

Our engine.json is as follows:

{
  "comment":" This config file uses default settings for all but the required values see README.md for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.actionml.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "sample-handmade-data.txt",
      "appName": "ACRec",
      "eventNames": ["purchase", "view"]
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer.mb": "300",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true"
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
      "name": "ur",
      "params": {
        "appName": "ACRec",
        "indexName": "urindex",
        "typeName": "items",
        "comment": "must have data for the first event or the model will not build, other events are optional",
        "eventNames": ["purchase", "view"]
      }
    }
  ]
}

Any help is appreciated.


Solution

  • PIO cannot find any "purchase" events in your data (perhaps they are in the wrong format), so it cannot build correlations between users and items. As the comment in your engine.json notes, there must be data for the first name in eventNames ("purchase") or the model will not build.

    Can you show sample events? You can also verify what the event server has actually stored with a query along the lines of the sketch below.
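
    A minimal check against the event server's REST API, assuming it runs on the default port 7070; "SOME_ACCESS_KEY" is a placeholder for the app's real access key:

    import json
    import urllib.request

    # Ask the PIO event server for a handful of stored events.
    url = ("http://localhost:7070/events.json"
           "?accessKey=SOME_ACCESS_KEY&limit=10")

    with urllib.request.urlopen(url) as resp:
        events = json.load(resp)

    # A usable primary event should look roughly like:
    #   {"event": "purchase", "entityType": "user", "entityId": "u1",
    #    "targetEntityType": "item", "targetEntityId": "i1", ...}
    # The UR needs at least one event whose "event" field matches the
    # first name in eventNames ("purchase"); otherwise training has
    # nothing to correlate and fails as shown above.
    for e in events:
        print(e.get("event"), e.get("entityId"), e.get("targetEntityId"))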