google-cloud-nl

Train or Custom Word Entity Types?


I was reading the documentation and testing Google's Natural Language API, and noticed that it classifies a number of people, events, organizations, and locations incorrectly. It appears to use Wikipedia as a major data source, so if an entity is not in Wikipedia, the API seems to have trouble identifying its type. Also, when certain words appear in a name (proper noun), the API seems to always assign the entity a particular type, which is not always correct.

For instance, "Congress" always seems to be identified as an organization [government], even when it is part of an event name. The name "WordCamp" is identified as a location, but it is an event.

Is there a way to train the Natural Language engine or provide a custom set of organizations, locations, events, etc. so that it provides more accurate type information for entities that are not extremely popular?


Solution

  • I am the product manager for this product. Custom entity types are not currently supported. Regarding your comment that some entity types come back wrong: this is true of any NLP system, but our goal is to keep improving. We are working on ways for you to send us feedback on instances we get wrong so that we can improve our accuracy, and we will share the details shortly. Note that we have trained our models on multiple data sources, not just Wikipedia data. The API returns the most relevant Wikipedia article for each detected entity, so if an entity has multiple interpretations, we return only the most commonly used one.
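
Since custom entity types are not supported, one possible client-side workaround is to correct the returned types yourself after the API call. The following is a minimal sketch of that idea, not a feature of the API: the `CUSTOM_TYPES` table, the `apply_overrides` helper, and the dictionary layout of each entity are all assumptions made for illustration. Only the type names (`EVENT`, `LOCATION`, `ORGANIZATION`) are taken from the API's entity type enum. The sample data is hand-written, not real API output.

```python
from typing import Dict, List

# Hypothetical, user-maintained table: lowercase entity name -> preferred type.
CUSTOM_TYPES: Dict[str, str] = {
    "wordcamp": "EVENT",
}


def apply_overrides(
    entities: List[Dict[str, str]],
    overrides: Dict[str, str] = CUSTOM_TYPES,
) -> List[Dict[str, str]]:
    """Return copies of the entities, replacing 'type' where the name matches."""
    corrected = []
    for entity in entities:
        fixed = dict(entity)  # copy so the caller's data is left untouched
        key = entity["name"].strip().lower()
        if key in overrides:
            fixed["type"] = overrides[key]
        corrected.append(fixed)
    return corrected


# Hand-written sample, assumed to be extracted from an entity analysis response.
sample = [
    {"name": "WordCamp", "type": "LOCATION"},
    {"name": "Congress", "type": "ORGANIZATION"},
]

if __name__ == "__main__":
    for entity in apply_overrides(sample):
        print(entity["name"], entity["type"])
```

An exact-name lookup like this only handles names you already know about. It does not address the "Congress inside an event name" case, which would need matching on the full event name rather than on the single word.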