sqlite

SQLite not using index for delete


Given a schema like

CREATE TABLE `EventStatus` 
(`id` varchar(1024) NOT NULL,
 `node` varchar(255) DEFAULT NULL,
 `lastUpdate` bigint DEFAULT NULL,
 `deleted` integer DEFAULT 0, 
 `owner` varchar(1024) DEFAULT '', 
 `body` varchar(10240) DEFAULT NULL, 
 PRIMARY KEY (`id`, `node`) );

CREATE INDEX EventStatus_lastUpdate 
 on EventStatus (deleted, lastUpdate);

SQLite3 will use the index to count rows where deleted, and delete rows where deleted = 1, but not delete rows where deleted. Why is this? Using indexed by gives the "No query solution" error.

Using v3.31.1 CLI.

sqlite> delete from eventstatus  where deleted='1';
QUERY PLAN
`--SEARCH TABLE eventstatus USING INDEX EventStatus_lastUpdate (deleted=?)
sqlite> select count() from eventupdates  where deleted;
QUERY PLAN
`--SCAN TABLE eventupdates USING COVERING INDEX EventUpdates_lastUpdate
0
sqlite> delete from eventupdates  where deleted;
QUERY PLAN
`--SCAN TABLE eventupdates

Solution

  • Using indexed by gives the "No query solution" error.

    This is a clue. In short the specified index cannot be used (if forced).

    This due to:-

    To be usable by an index a term must usually be of one of the following forms:

      column = expression
      column IS expression
      column > expression
      column >= expression
      column < expression
      column <= expression
      expression = column
      expression IS column
      expression > column
      expression >= column
      expression < column
      expression <= column
      column IN (expression-list)
      column IN (subquery)
      column IS NULL
      column LIKE pattern
      column GLOB pattern
    

    i.e. deleted is an inequality as opposed to deleted=1 or deleted='1' to force the use of the index (using INDEXED BY ....) is then an error (as it would if the EXPLAIN QUERY PLAN were not used).

    In the absence of the INDEXED BY forcing the use of the index, then the query optimiser will decide how to locate the rows.

    I believe the count() function comes into play as

    EXPLAIN QUERY PLAN SELECT count() FROM eventstatus WHERE deleted:-
    

    results in SCAN TABLE eventstatus USING COVERING INDEX EventStatus_lastUpdate

    Whilst:-

    EXPLAIN QUERY PLAN SELECT * FROM eventstatus /*INDEXED BY eventstatus_lastupdate <<<<< OUCH*/ WHERE deleted;
    

    results in SCAN TABLE eventstatus

    You may wish to have a read of:-

    In addition to the previously linked document.

    You may also wish to consider using ANALYZE (if you feel that after reading the suggested links that you want the deleted column to be the initial column of an index) according to the following snippets:-

    Avoid creating low-quality indexes.. A low-quality index (for the purpose of this checklist) is one where there are more than 10 or 20 rows in the table that have the same value for the left-most column of the index. In particular, avoid using boolean or "enum" columns as the left-most columns of your indexes.

    and

    If you must use a low-quality index, be sure to run ANALYZE. Low-quality indexes will not confuse the query planner as long as the query planner knows that the indexes are of low quality. And the way the query planner knows this is by the content of the SQLITE_STAT1 table, which is computed by the ANALYZE command.

    both from section 5 of the last linked document.

    Some considerations re deleted v deleted=1 or deleted='1'

    IF the deleted column can only contain 0 (not deleted) or 1 (deleted) then the index would fit into the low quality basket so see above.

    OTHERWISE

    What if the value in deleted can be other than 0 or 1?

    Demo

    Consider the following:-

    DROP TABLE IF EXISTS `EventStatus`;
    CREATE TABLE IF NOT EXISTS `EventStatus` (`id` varchar(1024) NOT NULL,`node` varchar(255) DEFAULT NULL,`lastUpdate` bigint DEFAULT NULL,`deleted` integer DEFAULT 0, `owner` varchar(1024) DEFAULT '', `body` varchar(10240) DEFAULT NULL, PRIMARY KEY (`id`, `node`) );
    
    WITH
        cte_counter(i,r) AS (SELECT 1,abs(random() % 7) UNION ALL SELECT i+1, abs(random() % 7) FROM cte_counter LIMIT 1000000)
    INSERT OR IGNORE INTO eventstatus SELECT 
        'V'||i, /* ID column Vn WHERE n is 1 greater than the previous row */
        'node'||(abs(random()) % 30) /* NODE column is based upon a partially random set*/,
        abs(random()), /* LASTUPDATE column is just a positive random integer (as per SQLLite so 64bit signed) */
        (abs(random()) % 10), /* DELETED column is randomly 0-9*/
        'owner'||(abs(random()) % 200), /**/
        'body'||abs(random()) FROM cte_counter
    ;   
    
    CREATE INDEX EventStatus_lastUpdate on EventStatus (deleted, lastUpdate);
    
    SELECT 'a1',count(*) FROM eventstatus WHERE deleted;
    SELECT 'a2',count(*) FROM eventstatus WHERE deleted='1';
    SELECT 'a3',count(*) FROM eventstatus WHERE deleted=1;
    SELECT 'a4',count(*) FROM eventstatus WHERE deleted>=1;
    SELECT * FROM eventstatus WHERE deleted;
    SELECT * FROM eventstatus /*WHERE deleted to show ALL rows */;
    
    
    DROP TABLE IF EXISTS `EventStatus`;
    

    A million rows are inserted according to the comments.

    e.g. the rows will be like for example (last SELECT):-

    Last select output

    However:-

    1. the count (in this case) of the first SELECT is 899900 (approx 9/10ths that have 1,2,3 .... 9 aka true as opposed to 0 aka false)
    2. the count of the 2nd SELECT using, delete=1, is 99718 (not that far of 10%). Because only about 1 in 10 rows have 1 9/10ths will have either 0, 2,3 ... 9
    3. likewise delete='1' returns a count of 99718
    4. like 1 delete>=1 returns a count of 899900.

    Timings written to the message log, very much confirm when the index can be usefully/efficiently used:-

    SELECT 'a1',count(*) FROM eventstatus WHERE deleted
    > OK
    > Time: 0.048s
    
    
    SELECT 'a2',count(*) FROM eventstatus WHERE deleted='1'
    > OK
    > Time: 0.004s
    
    
    SELECT 'a3',count(*) FROM eventstatus WHERE deleted=1
    > OK
    > Time: 0.003s
    
    
    SELECT 'a4',count(*) FROM eventstatus WHERE deleted>=1
    > OK
    > Time: 0.022s
    
    
    SELECT * FROM eventstatus WHERE deleted
    > OK
    > Time: 1.624s
    
    
    SELECT * FROM eventstatus /*WHERE deleted*/
    > OK
    > Time: 1.833s
    

    NOTE! using

    EXPLAIN QUERY PLAN SELECT * FROM eventstatus WHERE deleted>=1;
    

    As perhaps expected from the timings, (0.022) as opposed to (0.048 for WHERE deleted), uses the index as >= is not an inequality as per:-

    SEARCH TABLE eventstatus USING INDEX EventStatus_lastUpdate (deleted>?)