databasesqlite

How to force SQLite to use an index when joining a temporary table instead of scanning a large table?


I have a SQLite database with a table containing a billion rows and a temporary table used for row selection. The tables are created as follows:

CREATE TABLE [obj2](
  [id] integer PRIMARY KEY ASC,
  [name] text,
  [height] integer,
  [identification] text,
  [width] integer
);

CREATE TEMPORARY TABLE [__tmp_selection] (id INTEGER PRIMARY KEY);

If I use the IN clause with specific row IDs, SQLite efficiently uses the index on obj2.id:

EXPLAIN QUERY PLAN
SELECT obj2.* FROM obj2 WHERE rowid IN (10,20,30,40,50);

Query Plan Output:

id parent notused detail
2 0 0 SEARCH obj2 USING INTEGER PRIMARY KEY (rowid=?)

This shows that SQLite properly uses the index on obj2.id.


However, when I insert the same IDs into the temporary table and join it with obj2, SQLite instead performs a full table scan on obj2, which is inefficient:

INSERT INTO __tmp_selection(id) VALUES (10);
INSERT INTO __tmp_selection(id) VALUES (20);
INSERT INTO __tmp_selection(id) VALUES (30);
INSERT INTO __tmp_selection(id) VALUES (40);
INSERT INTO __tmp_selection(id) VALUES (50);

EXPLAIN QUERY PLAN
SELECT obj2.* FROM obj2
JOIN __tmp_selection ON obj2.id = __tmp_selection.id;

Query Plan Output:

id parent notused detail
3 0 0 SCAN obj2
5 0 0 SEARCH __tmp_selection USING INTEGER PRIMARY KEY (rowid=?)

This suggests that SQLite is scanning the entire obj2 table, even though the join should be able to use the index on obj2.id.


How can I force SQLite to scan __tmp_selection (which contains only a few rows) instead of performing a full table scan on obj2?


Solution

  • Reverse the order of the tables and use a CROSS JOIN:

    sqlite> EXPLAIN QUERY PLAN
       ...> SELECT obj2.* FROM __tmp_selection
       ...> CROSS JOIN obj2 ON obj2.id = __tmp_selection.id;
    QUERY PLAN
    |--SCAN __tmp_selection
    `--SEARCH obj2 USING INTEGER PRIMARY KEY (rowid=?)
    

    Using a CROSS JOIN instead of JOIN forces the query planner to always use that order of tables.

    or, better option, run ANALYZE on the tables to give the query planner more information about the costs of operations on them so it can pick a more efficient order itself; see also PRAGMA optimize:

    sqlite> ANALYZE __tmp_selection;
    sqlite> ANALYZE obj2; -- assuming this table is populated
    sqlite> EXPLAIN QUERY PLAN
       ...> SELECT obj2.* FROM obj2
       ...> JOIN __tmp_selection ON obj2.id = __tmp_selection.id;
    QUERY PLAN
    |--SCAN __tmp_selection
    `--SEARCH obj2 USING INTEGER PRIMARY KEY (rowid=?)
    

    (All testing done with Sqlite 3.49.1)