I'am searching for the technical word "AN-XYZ99". So I use
SELECT *
FROM foo
WHERE CONTAINS(bar, 'AN{-}XYZ99') > 0
but I get also results like "FO-XYZ99" or "BAR-XYZ99". What can I do to ensure the expected result?
I used
BEGIN
CTX_DDL.CREATE_PREFERENCE('FOO','BASIC_LEXER');
CTX_DDL.SET_ATTRIBUTE('FOO', 'ALTERNATE_SPELLING', 'GERMAN');
CTX_DDL.SET_ATTRIBUTE('FOO', 'COMPOSITE', 'GERMAN');
CTX_DDL.SET_ATTRIBUTE('FOO', 'MIXED_CASE', 'NO');
END;
Sample data from column "bar" (VARCHAR2(4000)):
"unbekannt Stadt Text: AN-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 foo"
"unbekannt Stadt Text: FO-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 bar"
"unbekannt Stadt Text: BAR-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 bla"
With the Statement above I would like the first row as output but I get the second and third row as well.
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
First you must define hyphen as a printjoin
in your lexer.
check it with
select IXV_ATTRIBUTE, IXV_VALUE from CTXSYS.CTX_INDEX_VALUES where IXV_CLASS = 'LEXER';
IXV_ATTRIBUTE IXV_VALUE
-----------------------------------------
PRINTJOINS _$%&-
NUMJOIN .
NUMGROUP .
WHITESPACE ,=
Then you may (after re-creating index with this lexer) validate that the tokens are as expected: (your table would vary based on the index name; check all tables like 'DR$%$I')
select TOKEN_TEXT from DR$TEXTIDX_IDX$I where TOKEN_TEXT like '%-XYZ99';
TOKEN_TEXT
----------------------------------------------------------------
AN-XYZ99
BAR-XYZ99
FO-XYZ99
Now you may query for the search string.
Aparently you must escape the hyphen as BAR-XYZ99
will find rows with BAR
not containing XYZ99
; although the documentation of hyphen with no space
is a bit different.
SELECT SCORE(1),txt
FROM textidx
WHERE CONTAINS(txt, 'BAR-XYZ99',1) > 0;
SCORE(1) TXT
---------- ------------------------------------------------------------------------------------
4 unbekannt Stadt Text: FO-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 bar
For some reason (I'm on 11.2.0.2.0) the escaping with curly braces doesn't work (returns no match), but using backslash is fine.
SELECT SCORE(1),txt
FROM textidx
WHERE CONTAINS(txt, 'BAR\-XYZ99',1) > 0;
SCORE(1) TXT
---------- ------------------------------------------------------------------------------------
4 unbekannt Stadt Text: BAR-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 bla