My goal is a sort of wiki-like notebook using SQLite as the data store. The idea is that there is one big pages table where every revision of every page gets stored. Every page has only three properties:
(It will be very rare that a page is deleted, but I'd like to allow for it, if possible.)
I only care about indexing and searching the most recent revision of each page, so only the most recent revision of each page should be in the FTS5 index. I'm attempting to use TRIGGERs to manage the index. And yet, I'm getting results for all revisions, or previous revisions.
Test case:
-- The pages table
CREATE TABLE IF NOT EXISTS pages (
id INTEGER PRIMARY KEY,
revision TEXT NOT NULL,
title TEXT NOT NULL,
body TEXT
);
-- The FTS5 table
CREATE virtual TABLE pages_fts USING FTS5(
id,
title,
body,
content='pages',
content_rowid=id
);
-- Before a page is inserted into the pages table,
-- delete the old matching entry in the FTS table
CREATE TRIGGER pages_before_insert BEFORE INSERT ON pages
BEGIN
INSERT INTO pages_fts (pages_fts, id, title, body)
SELECT 'delete', id, title, body
FROM pages
WHERE title = new.title
ORDER BY revision
DESC
LIMIT 1;
END;
-- After a page is inserted into the pages table,
-- insert a matching entry into the FTS table
CREATE TRIGGER pages_after_insert AFTER INSERT ON pages
BEGIN
INSERT INTO pages_fts (id, title, body)
VALUES (new.id, new.title, new.body);
END;
-- After a page is deleted from the pages table,
-- delete the matching entry from the FTS table
CREATE TRIGGER pages_after_delete AFTER DELETE ON pages
BEGIN
INSERT INTO pages_fts (pages_fts, id, title, body)
VALUES ('delete', old.id, old.title, old.body);
END;
INSERT INTO pages (revision, title, body) VALUES ('2023-04-26T22:48:35.582797', 'home', 'body version one');
INSERT INTO pages (revision, title, body) VALUES ('2023-04-26T22:48:40.250981', 'home', 'body version two');
INSERT INTO pages (revision, title, body) VALUES ('2023-04-26T22:48:46.205782', 'home', 'body version one');
-- should return no errors
INSERT INTO pages_fts(pages_fts, rank) VALUES ('integrity-check', 1);
-- title search
-- should return only one result
SELECT title, snippet(pages_fts, 1, '>', '<', '...', 10)
AS snippet
FROM pages_fts
WHERE pages_fts
MATCH 'title:home'
ORDER BY rank
LIMIT 50;
-- body search
-- should only return one result
SELECT title, snippet(pages_fts, 2, '>', '<', '...', 64)
AS snippet
FROM pages_fts
WHERE pages_fts
MATCH 'body:version'
ORDER BY rank
LIMIT 50;
Each of the last two searches should only show one result each, but I'm getting multiple results as shown below:
Output:
home|>home<
home|>home<
home|>home<
home|body >version< one
home|body >version< two
home|body >version< one
I had previously posted this question under the wrong account and have since deleted it. SO still thinks it is a duplicate, so I have to add this text in order to post the question.
It turns out that when linking an index to external content (with content=<table_name>
), you cannot pick and choose which rows of the content table you want to index. With an external content index, you must always keep the index and the external table in sync, or the result is a corrupted database.
To do this, just omit the content
and content_rowid
columns from the FTS5 table definitions. This allows you to build the search index however you like, using regular SQL statements. (The SQLite docs spend quite a lot of time talking about external content and contentless mode but do not name or even seem to define this mode of operation, I just call it "normal" mode.)
CREATE TABLE IF NOT EXISTS pages (
revision TEXT NOT NULL,
title TEXT NOT NULL,
body TEXT
);
CREATE VIRTUAL TABLE pages_fts USING FTS5(
title,
body
);
CREATE TRIGGER pages_after_insert AFTER INSERT ON pages
BEGIN
DELETE FROM pages_fts WHERE title = new.title;
INSERT INTO pages_fts (title, body)
VALUES (new.title, new.body);
END;