sql, oracle-database, oracle-text

Query with wildcard and dot not matching data with Oracle Text index


When using the wildcard character in combination with a dot in a text search, my query does not find the matching row.

For example:

CREATE TABLE MY_TABLE( ITEM_NUMBER VARCHAR2(50 BYTE) NOT NULL);
INSERT INTO MY_TABLE (ITEM_NUMBER) VALUES ('1234.1234');
create index TIX_ITEMNO on MY_TABLE(ITEM_NUMBER) indextype is ctxsys.context;

I want to find the row in MY_TABLE where the ITEM_NUMBER column is '1234.1234'.

This does find the row:

SELECT * FROM MY_TABLE
WHERE CONTAINS(ITEM_NUMBER, '%1234') > 0;

This does not find the row:

SELECT * FROM MY_TABLE
WHERE CONTAINS(ITEM_NUMBER, '%.1234') > 0;

I do not understand why, since according to the Oracle documentation the dot is not a special character that has to be escaped.

How should I handle this situation?


Solution

  • This happens because your default lexer treats the period as a word separator, so the period is not kept as part of any indexed token.

    Initial setup:

    create table my_table(item_number varchar2(50 byte) not null);
    
    insert into my_table values ('1234.1234');
    
    create index my_index on my_table (item_number) 
    indextype is ctxsys.context;
    

    This reproduces the behaviour you see:

    SELECT * FROM MY_TABLE
    WHERE CONTAINS(ITEM_NUMBER, '%1234') > 0;
    
    ITEM_NUMBER
    --------------------------------------------------
    1234.1234
    
    SELECT * FROM MY_TABLE
    WHERE CONTAINS(ITEM_NUMBER, '%.1234') > 0;
    
    no rows selected
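    
    To see how the lexer split the value, you can list the tokens stored in the index. This is a sketch that assumes the index is named my_index, as in the setup above, so that Oracle Text keeps its tokens in the table DR$MY_INDEX$I:

    -- list the tokens Oracle Text stored for my_index
    select token_text
    from   dr$my_index$i
    order  by token_text;

    If the period is acting as a separator, the listing should contain 1234 but no token that includes a period, so a pattern containing a period has nothing to match.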
    

    If you add a lexer that defines PRINTJOINS to include the period:

    drop index my_index;
    
    begin 
      ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER'); 
      ctx_ddl.set_attribute('my_lexer', 'PRINTJOINS', '.');
    end;
    /
    
    create index my_index on my_table (item_number) 
    indextype is ctxsys.context
    parameters ('lexer my_lexer');
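    
    If you want to confirm that the attribute was stored, the preference can be checked in the CTX_USER_PREFERENCE_VALUES view. This is a sketch that assumes the preference name my_lexer used above; the upper() call is only there so the comparison does not depend on how the name was stored:

    -- show the attributes recorded for the custom lexer
    select prv_preference, prv_attribute, prv_value
    from   ctx_user_preference_values
    where  upper(prv_preference) = 'MY_LEXER';

    -- if create_preference fails because my_lexer is left over
    -- from an earlier attempt, drop it and run the block again
    begin
      ctx_ddl.drop_preference('my_lexer');
    end;
    /

    Both the create and the drop call assume your user is allowed to execute CTX_DDL, which is typically granted through the CTXAPP role.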
    

    then it behaves the way you want:

    SELECT * FROM MY_TABLE
    WHERE CONTAINS(ITEM_NUMBER, '%.1234') > 0;
    
    ITEM_NUMBER
    --------------------------------------------------
    1234.1234
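    
    One side effect is worth checking, on the assumption that 1234.1234 is now indexed as a single token: a search for the bare term 1234 would no longer match, because there is no standalone 1234 token. The following queries are a sketch for verifying this against the rebuilt index:

    -- exact term, period included
    select * from my_table
    where contains(item_number, '1234.1234') > 0;

    -- leading part plus wildcard
    select * from my_table
    where contains(item_number, '1234.%') > 0;

    -- bare term; expected to return no rows once the period is a printjoin
    select * from my_table
    where contains(item_number, '1234') > 0;

    If the bare-term search still matters for your application, keep using a wildcard such as '%1234' for those queries.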
    

    Read more in the "Oracle Text Indexing Elements" chapter of the Oracle Text Reference.