I want to implement an information retrieval system that uses the vector space model, but with multi-term tokens and a custom term-weighting function.
I am considering building my inverted index in PostgreSQL instead of on the file system. I read that a GIN index builds such an index on a tsvector column.
Can I build tsvector values manually, without calling the to_tsvector function, so that I can build my "custom" vector with custom tokens and custom weights?
You can write tsvector values by hand. As far as I know, though, you can only assign four different weights: A, B, C, or D. Multi-word tokens have to be enclosed in single quotes to keep them together as one token.
select $$'two words':1c oneword$$::tsvector;
tsvector
--------------------------
'oneword' 'two words':1C
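
To show how hand-built values like the one above could be stored, indexed, and queried, here is a minimal sketch. The table name docs, the column vec, the index name, and the sample rows are all made up for illustration. The numbers passed to ts_rank are arbitrary placeholders, not a recommendation.

```sql
-- Hypothetical table holding one hand-built tsvector per document.
CREATE TABLE docs (
    id  serial PRIMARY KEY,
    vec tsvector NOT NULL
);

-- A direct cast to tsvector does no normalization, so tokens are stored as written.
INSERT INTO docs (vec) VALUES
    ($$'two words':1C oneword$$::tsvector),
    ($$'another phrase':1A 'two words':2B$$::tsvector);

-- GIN index over the hand-built vectors.
CREATE INDEX docs_vec_gin ON docs USING gin (vec);

-- Match the multi-word token; it must be quoted in the tsquery as well.
SELECT id FROM docs WHERE vec @@ $$'two words'$$::tsquery;

-- Restrict the match to occurrences labelled with weight C.
SELECT id FROM docs WHERE vec @@ $$'two words':C$$::tsquery;

-- Map the four labels to numbers at ranking time; the array order is {D, C, B, A}.
SELECT id,
       ts_rank(ARRAY[0.1, 0.2, 0.4, 1.0]::float4[], vec, $$'two words'$$::tsquery) AS score
FROM docs
WHERE vec @@ $$'two words'$$::tsquery
ORDER BY score DESC;
```

The labels stay limited to A through D, so a truly custom per-term weight would still have to be bucketed into four classes or kept in a separate column. As far as I know, GIN indexes store only the lexemes and not the weight labels, so a weight-restricted query such as the second one needs a recheck against the table row.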