
Solr Japanese tokenizer not working for katakana


I am using Solr 6.2.0 with the fieldType text_ja.
I am having a problem with JapaneseTokenizer. It tokenizes the following properly:
ドラゴンボールヒーロー


"ドラゴン"      "ドラゴンボールヒーロー"      "ボール" "ヒーロー"



But it fails to tokenize ドラゴンボールヒーローズ properly:

"ドラゴン"       "ドラゴンボールヒーローズ"      "ボールヒーローズ"

Hence searching for ドラゴンボール does not return a hit in the latter case, because ボール is never emitted as a separate token there.
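To make the mismatch concrete, here is a minimal sketch in plain Python (not Solr itself). The token lists are copied from the output above. The way the query ドラゴンボール is split into terms, and the rule that every query term must appear as an exact index token, are assumptions used only for illustration. The names `matches` and `missing_terms` are hypothetical helpers.

```python
# Token lists copied from the analysis output shown in the question.
INDEX_TOKENS = {
    "ドラゴンボールヒーロー": ["ドラゴン", "ドラゴンボールヒーロー", "ボール", "ヒーロー"],
    "ドラゴンボールヒーローズ": ["ドラゴン", "ドラゴンボールヒーローズ", "ボールヒーローズ"],
}

# ASSUMPTION: the query ドラゴンボール is analyzed into these two terms.
QUERY_TERMS = ["ドラゴン", "ボール"]


def missing_terms(query_terms, index_tokens):
    """Return the query terms that do not occur as exact tokens in the index."""
    indexed = set(index_tokens)
    return [term for term in query_terms if term not in indexed]


def matches(query_terms, index_tokens):
    """Simplified model: a document matches only if no query term is missing."""
    return not missing_terms(query_terms, index_tokens)


if __name__ == "__main__":
    for text, tokens in INDEX_TOKENS.items():
        print(text, matches(QUERY_TERMS, tokens), missing_terms(QUERY_TERMS, tokens))
```

Under these assumptions the first text matches, while the second does not because its index only contains ボールヒーローズ and never the bare token ボール.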

Also, it doesn't separate ディズニーランド into two words.


Solution

  • I was able to solve this by using the lucene-gosen Sen tokenizer
    and compiling the IPADIC dictionary with custom rules and word weights.