javalucenestandardanalyzer

what is the appropriate lucene analyzer to use?


i have problems with regards to indexing item names with numbers and symbols. a sample of my data is shown below:

ANGLE BARS   ORANGE - 4.0MM 2 - 1/2"
B.I SQUARE TUBING     2" X 3"
B.I. PIPE S-40   10MM 3/8"
B.I SQUARE TUBING     1" X 2"
PLYWOOD   MARINE 3/4X4X8
PLYWOOD   STA. CLARA 1/8X4X8
PLYWOOD   STA. CLARA 3/16X4X8

i want to tokenize my data in white or trailing spaces without dropping the symbols because these symbols are very essential. so that whenever i search for "plywood sta. clara", "b.i square 2" X 3"", or "angle orange 2 - 1/2" will give me a result. i tried to used whitespace analyzer but the symbols are dropped. i also tried standardanalyzer but stop words and symbols are also dropped. what is the best analyzer to use instead?


Solution

  • You can use PatternAnalyzer by writing regular expression or create Custom Analyzer.