Tags: full-text-search, reserved-words, stop-words, oracle-xe, oracle-ucm

Where can I find a list of 'Stop' words for Oracle full-text search?


I have a client testing full-text search (example below) on a new Oracle UCM site. The random text string they chose to test was 'test only', which failed. From my testing, it seems 'only' is a reserved word: it is never returned from a full-text search, although it is returned from metadata searches.

I've spent the morning searching oracle.com and found this, which seems pretty comprehensive, yet it does not include 'only'.

So my question is: is 'only' a reserved word? And where can I find a complete list of reserved words for Oracle full-text search (10g)?

Full-text search string example:

(<ftx>test only</ftx>)


Update: I have done some more testing. It seems to ignore words that indicate places or times: only, some, until, when, while, where, there, here, near, that, who, about, this, them.

Can anyone confirm this? I can't find it documented anywhere on Oracle's site.


Update 2 (after the answer): I should have been looking for 'stop' words, not 'reserved' words. I have updated the question title and tags to reflect this.


Solution

  • I bet the system is trying to automatically ignore frequently occurring words. That would explain why you cannot find 'only' but 'onnly' can be found. Can you search for 'a', 'an', ...

    The words you listed that do not work are all very common words that are rarely the primary words in a sentence. Given this, they are unlikely to be words you would search for in a full-text search.

    What are the odds that you are looking for an article that includes the word 'that' and the inclusion of that word is the only fact you have on the article?
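    To see why 'only' disappears while a typo such as 'onnly' is still found, here is a toy Python model of an inverted index that drops stopwords at index time. It is a sketch of the general technique, not Oracle Text's implementation: the STOPWORDS set is a small hypothetical subset, and build_index/search are illustrative names.

```python
import re

# Hypothetical subset of a stoplist, for illustration only.
STOPWORDS = {"a", "an", "the", "only", "that", "this", "some", "when"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs, stopwords):
    """Map each non-stopword token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token in stopwords:
                continue  # stopwords never get an index entry
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (naive AND)."""
    results = None
    for term in tokenize(query):
        hits = index.get(term, set())
        results = hits if results is None else results & hits
    return results or set()


docs = {1: "This is a test only document", 2: "A test with a typo: onnly"}
index = build_index(docs, STOPWORDS)
print(search(index, "only"))       # set(): 'only' was never indexed
print(search(index, "onnly"))      # {2}: the misspelling is an ordinary word
print(search(index, "test only"))  # set() in this naive model only
```

    The last query fails here because a naive AND with a term that was never indexed matches nothing; real engines may instead drop stopwords from the query, so the exact behaviour depends on the product.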

    I think I found your list. Ironically, it is from the wiki page of the last company I started: http://www.sugarcrm.com/wiki/index.php?title=Overview_of_Full_Text_Stop_Words#Default_Stop_Words_.28for_English.29

    2.10.3 Modifying the Default Stoplist

    The default stoplist is always named CTXSYS.DEFAULT_STOPLIST. You can use the following procedures to modify this stoplist:
     • CTX_DDL.ADD_STOPWORD
     • CTX_DDL.REMOVE_STOPWORD
     • CTX_DDL.ADD_STOPTHEME
     • CTX_DDL.ADD_STOPCLASS
     When you modify CTXSYS.DEFAULT_STOPLIST with the CTX_DDL package, you must re-create your index for the changes to take effect.
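    The need to re-create the index follows from when stopwords are applied: they are filtered out while the index is built, so editing the stoplist afterwards cannot bring back entries that were never stored. The Python toy below illustrates this under that assumption; ToyIndex and rebuild are hypothetical names that model the idea, not the CTX_DDL API.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class ToyIndex:
    """Hypothetical index that applies its stoplist only at build time."""

    def __init__(self, docs, stoplist):
        self.docs = docs
        self.stoplist = stoplist  # shared set: callers may mutate it later
        self.rebuild()

    def rebuild(self):
        """Re-create the postings using the stoplist as it is right now."""
        self.postings = {}
        for doc_id, text in self.docs.items():
            for token in tokenize(text):
                if token not in self.stoplist:
                    self.postings.setdefault(token, set()).add(doc_id)

    def search(self, term):
        """Return ids of documents whose postings contain the term."""
        return self.postings.get(term.lower(), set())


stoplist = {"a", "the", "only"}
idx = ToyIndex({1: "test only"}, stoplist)
print(idx.search("only"))  # set(): filtered out when the index was built

stoplist.discard("only")   # similar in spirit to removing a stopword
print(idx.search("only"))  # still set(): existing postings are unchanged

idx.rebuild()              # similar in spirit to re-creating the index
print(idx.search("only"))  # {1}
```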
    

    Default stopword list:

    a, about, after, all, also, an, and, any, are, as, at
    be, because, been, but, by
    can, co, corp, could
    for, from
    had, has, have, he, her, his
    if, in, inc, into, is, it, its
    last
    more, most, mr, mrs, ms, mz
    no, not
    of, on, one, only, or, other, out, over
    s, says, she, so, some, such
    than, that, the, their, there, they, this, to
    up
    was, we, were, when, which, who, will, with, would
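    As a quick cross-check, this Python snippet compares the words reported in the question's update with the default list as reproduced in this answer (copied by hand, not fetched from Oracle).

```python
# Default English stoplist as reproduced in this answer.
DEFAULT_STOPWORDS = set("""
a about after all also an and any are as at
be because been but by can co corp could for from
had has have he her his if in inc into is it its last
more most mr mrs ms mz no not of on one only or other out over
s says she so some such than that the their there they this to up
was we were when which who will with would
""".split())

# Words the question reports as never matching a full-text search.
REPORTED = ["only", "some", "until", "when", "while", "where", "there",
            "here", "near", "that", "who", "about", "this", "them"]

in_list = [w for w in REPORTED if w in DEFAULT_STOPWORDS]
not_in_list = [w for w in REPORTED if w not in DEFAULT_STOPWORDS]
print(len(DEFAULT_STOPWORDS))  # 76
print(in_list)      # ['only', 'some', 'when', 'there', 'that', 'who', 'about', 'this']
print(not_in_list)  # ['until', 'while', 'where', 'here', 'near', 'them']
```

    Eight of the fourteen reported words appear in this list; the other six do not, so this list alone may not explain every word in the question (the site could, for example, be using a different or customized stoplist).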
    

    Update: a good Oracle whitepaper that explains how full-text searching works can be downloaded from: http://www.oracle.com/technology/products/text/pdf/text_techwp.pdf. It mentions stopwords and the existence of a default list, but does not list the words themselves.