apache-spark predictionio

How many events are stored in my PredictionIO event server?


I imported an unknown number of events into my PIO event server, and now I want to know how many there are (so that I can measure and compare recommendation engines). I could not find an API for that, so I looked at the MySQL database my server uses and found two tables:

mysql> select count(*) from pio_event_1;
+----------+
| count(*) |
+----------+
|  6371759 |
+----------+
1 row in set (8.39 sec)

mysql> select count(*) from pio_event_2;
+----------+
| count(*) |
+----------+
|  2018200 |
+----------+
1 row in set (9.79 sec)

Both tables look very similar, so I am still unsure.

Which table is relevant? What is the difference between pio_event_1 and pio_event_2?

Is there a command or REST API where I can look up the number of stored events?


Solution

  • You could use the Spark shell, as described in the troubleshooting docs.

    Launch the shell with:

    pio-shell --with-spark
    

    Then find all events for your app and count them:

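    // In Apache PredictionIO 0.10.0 and later the package is org.apache.predictionio
    import io.prediction.data.store.PEventStore
    // sc is the SparkContext that pio-shell provides
    PEventStore.find(appName="MyApp1")(sc).count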
    

    You can also count subsets of events by passing more parameters to find; see the API docs for details. LEventStore is another option.
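As a rough sketch of the filtering idea, the snippet below counts only events with certain names and then tallies all events by name. It assumes the `eventNames` parameter of `PEventStore.find` and the `event` field of `Event` behave as described in the 0.9.x API docs. The app name `MyApp1` and the event names `rate` and `buy` are placeholders, so substitute your own. It is meant to be pasted into a running `pio-shell --with-spark` session, not run standalone.

```scala
// Paste into `pio-shell --with-spark`; `sc` is the SparkContext the shell provides.
// "MyApp1", "rate" and "buy" are placeholder names (assumptions, not from your setup).
import io.prediction.data.store.PEventStore

// Count only the events whose name is "rate" or "buy".
val rateAndBuyCount = PEventStore.find(
  appName = "MyApp1",
  eventNames = Some(Seq("rate", "buy")))(sc).count

// Tally every event of the app by its event name.
val allEvents = PEventStore.find(appName = "MyApp1")(sc)
val countsByName = allEvents.map(_.event).countByValue()

// Print one line per event name.
countsByName.foreach { case (name, n) => println(s"$name: $n") }
```

The per-name tally is useful when comparing engines, because different engine templates typically read different event types, and the total count alone can hide that.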