node.js, google-cloud-platform, google-cloud-nl

Any way to get past the minimum of 20 tokens for text classification - Google NLP API


Is there any way to get past the minimum token requirement for Google's NLP API text classification method? I'm trying to input a short, simple sentence such as "I can't wait for the presidential debates", but this returns an error saying:

Invalid text content: too few tokens (words) to process.

Is there any way to get around this? I've tried adding random words until the input string reached 20 words, but that messes up the labels and confidence a lot of the time. If there is a way around this, such as setting an option or adding something, that would be awesome! If there is no workaround, let me know if you know of another pre-trained text classification model that would work for me!

Also, I can't create the categories and labels I want myself. There would just be too many needed for what I'm doing, which is why the predefined categories in the NLP API are great. I just need to get rid of that 20-token requirement.


Solution

  • As clarified in the official Content Classification documentation:

    Important: You must supply a text block (document) with at least twenty tokens (words) to the classifyText method.

    Given that, and after checking for possible alternatives, there unfortunately doesn't seem to be a way to work around this. You will need to supply at least 20 words.

    Since you asked about alternatives, I searched around and found two pre-trained text classification models that might work for you: this one here and this other one (in Chinese, but it might still help :)).

    In any case, feel free to raise a Feature Request in Google's Issue Tracker so they can look into removing this limitation.

    Let me know if the information helped you!
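
Since the limit can't be bypassed, one option is to check the input length before calling classifyText and route short sentences to a different model instead of triggering the error. Below is a minimal sketch of that idea, not an official pattern. The names `MIN_TOKENS`, `countWords` and `classifyIfLongEnough` are made up for illustration, and counting whitespace-separated words is only an approximation of how the API counts tokens, so inputs close to the limit may still be rejected. The `client` argument is assumed to be a `LanguageServiceClient` from the `@google-cloud/language` package.

```javascript
// Assumption: the documented minimum of twenty tokens (words) for classifyText.
const MIN_TOKENS = 20;

// Rough word count: splits on whitespace. This only approximates the
// API's own tokenization, so treat it as a heuristic.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Calls classifyText only when the text looks long enough.
// Returns null for short input so the caller can fall back to another model.
async function classifyIfLongEnough(client, text) {
  if (countWords(text) < MIN_TOKENS) {
    return null;
  }
  const document = { content: text, type: 'PLAIN_TEXT' };
  const [response] = await client.classifyText({ document });
  return response.categories;
}
```

With a real client (created via `new LanguageServiceClient()` after requiring `@google-cloud/language`), `classifyIfLongEnough(client, "I can't wait for the presidential debates")` would return `null` without making a request, which is the signal to hand that sentence to one of the alternative models.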