sqlite peewee fts5

How do I use the trigram tokenizer/similarity option with Peewee and SQLite's FTS5?


This question concerns how to use FTS5's trigram tokenizer with Peewee.

  1. The official SQLite documentation for FTS5 describes a trigram tokenizer:

     > The experimental trigram tokenizer extends FTS5 to
     > support substring matching in general, instead of the
     > usual token matching. When using the trigram tokenizer,
     > a query or phrase token may match any sequence of
     > characters within a row, not just a complete token.
     > 
     > CREATE VIRTUAL TABLE tri USING fts5(a, tokenize="trigram");
     > INSERT INTO tri VALUES('abcdefghij KLMNOPQRST uvwxyz');
    
  2. I set up an FTS-based model class with Peewee and changed its options to use the trigram tokenizer:

     class Meta:
         db_table = 'fts_test_db'
         database = test_db
         options = {'tokenize': 'trigram', 'content': PrecedentPW}
    
  3. When I attempt to create a table with those options, this error is raised:

     _db.create_tables([_fts], )
    
     >> peewee.OperationalError: no such tokenizer: trigram
    
  4. But if I change the tokenizer options to use something else (e.g. 'porter'), no errors are raised.

How can I use the trigram tokenizer with Peewee?


Solution

  • The error comes from the SQLite library itself, not from Peewee. You may need to compile the tokenizer yourself or make sure you are running a new enough version of SQLite. The trigram tokenizer was not included by default until SQLite 3.34.0: https://www.sqlite.org/releaselog/3_34_0.html
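
    One way to check which SQLite version Python is actually linked against, and whether it accepts the trigram tokenizer, is sketched below. It assumes Peewee is using the standard-library sqlite3 module (if your setup uses a different driver, check that one instead). The helper name trigram_available is made up for this illustration.

```python
import sqlite3


def trigram_available() -> bool:
    """Return True if the linked SQLite can create an FTS5 trigram table.

    Hypothetical helper for illustration; it is not part of Peewee or sqlite3.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Same statement as in the FTS5 documentation quoted in the question.
        conn.execute('CREATE VIRTUAL TABLE tri USING fts5(a, tokenize="trigram")')
        return True
    except sqlite3.OperationalError:
        # Raised when the tokenizer is unknown or FTS5 is not compiled in.
        return False
    finally:
        conn.close()


# Version of the SQLite library, not of the Python sqlite3 module.
print("SQLite library version:", sqlite3.sqlite_version)
print("trigram tokenizer available:", trigram_available())
```

    If the reported version is below 3.34.0, or the second line prints False, the Peewee options in the question will keep failing until Python is linked against a newer SQLite build.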