
Google Prediction API - Training data syntax for multi-label classification


I'm trying to use the Google Prediction API to classify my data. Each item in my DB can have multiple categories assigned to it.

For example: "My Nexus phone is rebooting constantly" could be assigned both #Android and #troubleshooting tags.

I would like to upload my training data to Google, but I'm not sure how to apply both tags to the same content. The only syntax I've found assigns a single category to each piece of content, like so:

"Android", "My Nexus phone is rebooting constantly"

What is the right syntax for training data in which one item carries several categories?


Solution

  • From the docs:

    Each line can only have one label assigned, but you can apply multiple labels to one example by repeating an example and applying different labels to each one. For example:

    "excited", "OMG! Just had a fabulous day!"

    "annoying", "OMG! Just had a fabulous day!"

    If you send a tweet to this model, you might get a classification something like this: "excited":0.6, "annoying":0.2.
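
If your items are stored with a list of tags each, the repeated rows described above can be generated rather than written by hand. The following is only a rough sketch: it assumes the training file is a CSV with the label in the first column and the text in the second, as in the examples above. The `items` structure and the function names `expand_multilabel` and `to_training_csv` are made up for illustration and are not part of any Google API.

```python
import csv
import io


def expand_multilabel(items):
    """Yield one (label, text) pair per label, repeating the text for each label."""
    for text, labels in items:
        for label in labels:
            yield label, text


def to_training_csv(items):
    """Build CSV text with every field quoted and one label per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(expand_multilabel(items))
    return buf.getvalue()


# Hypothetical in-memory data: (text, list of tags) pairs from your DB.
items = [
    ("My Nexus phone is rebooting constantly", ["Android", "troubleshooting"]),
]

print(to_training_csv(items))
```

For the Nexus example from the question this produces two rows with identical text, one labelled "Android" and one labelled "troubleshooting". Using the `csv` module rather than string concatenation means quotes and commas inside the text are escaped properly.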