
Watson Knowledge Studio cuts words


After successfully importing documents into WKS, some words are split incorrectly when I create an annotator and select these documents. This happens with German words: for example, "widerspruchslos" is displayed as "widerspruch los", and "Warenverkehrsbescheinigung" as "Warenverkehr bescheinigung". This affects the annotation process and, later on, the generated model. How can I avoid this issue?


Solution

  • German compound words are split into fragments by the sentence tokenizer in WKS. This behavior is by design.

    If you would like to extract "Warenverkehrsbescheinigung" as a single mention, select both tokens, "Warenverkehr" and "bescheinigung", and annotate them together as one entity mention.
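
To illustrate why annotating both fragments together recovers the whole word, here is a minimal Python sketch. It is not WKS code and does not use any WKS API: the `Token` class, the `mention_span` helper, and the split position are all hypothetical, chosen only to show that a mention covering two adjacent fragments spans the full compound in the original text.

```python
from dataclasses import dataclass


@dataclass
class Token:
    begin: int  # inclusive character offset into the document text
    end: int    # exclusive character offset into the document text


def mention_span(tokens):
    """Return the (begin, end) offsets covering a run of adjacent tokens."""
    if not tokens:
        raise ValueError("at least one token is required")
    # Fragments of one compound word have no gap between them.
    for left, right in zip(tokens, tokens[1:]):
        if left.end != right.begin:
            raise ValueError("tokens are not adjacent")
    return tokens[0].begin, tokens[-1].end


text = "Warenverkehrsbescheinigung"
# Assumed split position, for illustration only; the boundary that the
# real tokenizer chooses may differ.
split = len("Warenverkehrs")
tokens = [Token(0, split), Token(split, len(text))]

begin, end = mention_span(tokens)
print(text[begin:end])
```

Because the mention is defined by the offsets of its first and last token, the text it covers is the unsplit compound, regardless of where the tokenizer placed the boundary.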