I have a requirement to search multiple columns in database and I found this oracle text functionality. After reading all the documentation, I did everything right and it works well too. But I think this functionality is also indexing the actual name of column as well. Here is how you can reproduce the same...
--Creating a sample table
CREATE TABLE SAMPLE_TABLE
(
EMP_NO VARCHAR2(100)
,EMP_NAME VARCHAR2(100)
,EMP_DESIGNATION VARCHAR2(1000)
);
Once table is created, we insert some sample data
INSERT INTO SAMPLE_TABLE VALUES ('1','John Doe','Consultant');
INSERT INTO SAMPLE_TABLE VALUES ('2','Jane Doe','Contractor');
INSERT INTO SAMPLE_TABLE VALUES ('3','John Smith','Manager');
COMMIT;
After this, we will create the datastore so that I can simultaneously search multiple columns in single query..
begin
ctx_ddl.drop_preference ( 'MY_DATASTORE' );
ctx_ddl.create_preference('MY_DATASTORE','MULTI_COLUMN_DATASTORE');
ctx_ddl.set_attribute('MY_DATASTORE', 'columns', 'EMP_NO, EMP_NAME, EMP_DESIGNATION');
end;
Create index based on above datastore
CREATE INDEX SAMPLE_TABLE_IDX1 ON SAMPLE_TABLE (EMP_NO)
INDEXTYPE IS ctxsys.context PARAMETERS ('datastore MY_DATASTORE SYNC (ON COMMIT)');
After this, I tried to run a sample query to see if index is working (checked using explain plan)..
SELECT *
FROM sample_table
WHERE 1=1
AND contains(emp_no,'%Do%') > 0;
This works fine and as expected, I get 2 records with emp_name as "John Doe" and "Jane Doe". All good till here.
Now I tried this...
SELECT *
FROM sample_table
WHERE 1=1
AND contains(emp_no,'%emp%') > 0;
This gave me all the 3 records. This is happening because the word "emp" is at column name level. To verify my doubts, I just changed "emp" to "empx" as below..
SELECT *
FROM sample_table
WHERE 1=1
AND contains(emp_no,'%empx%') > 0;
This gives 0 records.
So, does this mean that the column names too are indexed? I checked the index tables created say "DR$SAMPLE_TABLE_IDX1$I" etc and it does have the column names mentioned inside it which means that is indexed too. I checked the data in the TOKEN_TEXT and it as below...
1
2
3
CONSULTANT
CONTRACTOR
DESIGNATION
DOE
EMP
JANE
JOHN
MANAGER
NAME
SMITH
How do i avoid indexing of the column names too?
Yes - this is expected behaviour. This is because (as you've identified) the column names are treated as part of the document text.
You can changed this by setting the delimiter
attribute of the data store to newline
:
CREATE TABLE SAMPLE_TABLE
(
EMP_NO VARCHAR2(100)
,EMP_NAME VARCHAR2(100)
,EMP_DESIGNATION VARCHAR2(1000)
);
INSERT INTO SAMPLE_TABLE VALUES ('1','John Doe','Consultant');
INSERT INTO SAMPLE_TABLE VALUES ('2','Jane Doe','Contractor');
INSERT INTO SAMPLE_TABLE VALUES ('3','John Smith','Manager');
COMMIT;
begin
ctx_ddl.drop_preference ( 'MY_DATASTORE' );
ctx_ddl.create_preference('MY_DATASTORE','MULTI_COLUMN_DATASTORE');
ctx_ddl.set_attribute('MY_DATASTORE', 'columns', 'EMP_NO, EMP_NAME, EMP_DESIGNATION');
-- add this to exclude column names
ctx_ddl.set_attribute('MY_DATASTORE','delimiter','NEWLINE');
end;
/
CREATE INDEX SAMPLE_TABLE_IDX1 ON SAMPLE_TABLE (EMP_NO)
INDEXTYPE IS ctxsys.context PARAMETERS ('datastore MY_DATASTORE SYNC (ON COMMIT)');
SELECT *
FROM sample_table
WHERE 1=1
AND contains(emp_no,'%Do%') > 0;
EMP_NO EMP_NAME EMP_DESIGNATION
1 John Doe Consultant
2 Jane Doe Contractor
SELECT *
FROM sample_table
WHERE 1=1
AND contains(emp_no,'%emp%') > 0;
no rows selected
See MOS note 1364654.1 for more details.