I'm using Oracle Text on Oracle 10g. I have created a multicolumn CONTEXT index as below:
My table is
CREATE TABLE WEB_RES
(
"ID" NUMBER(10,0),
"TITLE" VARCHAR2(256 BYTE),
"DESCRIPTION" VARCHAR2(1024 BYTE),
"CONTENT" BLOB,
"CATEGORY" VARCHAR2(64 BYTE),
...
CONSTRAINT "PK_WEB_RES" PRIMARY KEY ("ID")
}
My preference and index are:
execute ctx_ddl.create_preference('my_multi', 'MULTI_COLUMN_DATASTORE');
execute ctx_ddl.set_attribute('my_multi', 'columns', 'title, description, category');
execute ctx_ddl.create_preference( 'my_lexer', 'BASIC_LEXER' );
execute ctx_ddl.set_attribute('my_lexer', 'index_stems', '1');
create index myTitleIndex on web_res(title)
indextype is ctxsys.context
parameters ('DATASTORE my_multi lexer my_lexer SYNC(ON COMMIT)');
I have a row in my table as following:
ID Title Description Category
--- ----------- -------------- ------------
1 Superannuation Contributions Splitting Test Test
English is not my native language but shouldn't this query return a result?
SELECT * FROM web_res WHERE CONTAINS(title, '$contribute', 1) > 0;
I expect to find a title with 'Contribution' when searching for '$Contribute' or find a title with 'approval' when searching for '$approve'. Isn't this what stemming is about?
Edit: I also tried the Basic_Wordlist as below, but still got no rows back for my query:
execute ctx_ddl.create_preference( 'my_wordlist', 'BASIC_WORDLIST' );
execute ctx_ddl.set_attribute('my_wordlist', 'stemmer', 'ENGLISH');
create index myTitleIndex on web_res(title)
indextype is ctxsys.context
parameters ('DATASTORE my_multi wordlist my_wordlist SYNC(ON COMMIT)');
can you please try to create a BASIC_WORDLIST with a stemmer preference set to english?
http://docs.oracle.com/cd/B19306_01/text.102/b14218/cdatadic.htm#i1008929
OK, I have researched it in more depth for you, and I don't think you'll like he reuslts... I used the following schema:
execute ctxsys.ctx_ddl.create_preference('my_multi','MULTI_COLUMN_DATASTORE');
execute ctxsys.ctx_ddl.set_attribute('my_multi','columns','a,b,c');
execute ctxsys.ctx_ddl.create_preference('my_lexer','BASIC_LEXER');
execute ctxsys.ctx_ddl.set_attribute('my_lexer','index_stems','1');
execute ctxsys.ctx_ddl.create_preference('my_wordlist','BASIC_WORDLIST');
execute ctxsys.ctx_ddl.set_attribute('my_wordlist','stemmer','ENGLISH');
create index tt on test(a) indextype is ctxsys.context parameters('DATASTORE my_multi WORDLIST my_wordlist LEXER my_lexer SYNC(on commit)');
insert into test values ('Superannuation Contributions Splitting','test','test');
insert into test values ('commit','Superannuation Contributions Splitting','test');
insert into test values ('commit','test','Superannuation Contributions Splitting');
commit;
So after creating the schema - I went to the DR$INDEX_NAME$I table, and looked at the data there. it contains : B,C,commit,contribution, contributions,split,splitting,superannuation,test. It means that these are the list of "roots" you can search for.
Since "contribute" is not here, you cannot "get to it".
But, if you'll search for '$contribution' you will find it, or even for "$committing" since the root of it, commit, is in the list.
Bottom line - Oracle does not know that contribute and contribution are related. This is either a bug in the engine or a linguistic property (you can never know).
BUT - I can assure you the stemming works - since the search of '$contribution' and '$split' will return the results...