I have a SQLite database with a table containing a billion rows and a temporary table used for row selection. The tables are created as follows:
CREATE TABLE [obj2](
[id] integer PRIMARY KEY ASC,
[name] text,
[height] integer,
[identification] text,
[width] integer
);
CREATE TEMPORARY TABLE [__tmp_selection] (id INTEGER PRIMARY KEY);
If I use the IN
clause with specific row IDs, SQLite efficiently uses the index on obj2.id
:
EXPLAIN QUERY PLAN
SELECT obj2.* FROM obj2 WHERE rowid IN (10,20,30,40,50);
id | parent | notused | detail |
---|---|---|---|
2 | 0 | 0 | SEARCH obj2 USING INTEGER PRIMARY KEY (rowid=?) |
This shows that SQLite properly uses the index on obj2.id
.
However, when I insert the same IDs into the temporary table and join it with obj2
, SQLite instead performs a full table scan on obj2
, which is inefficient:
INSERT INTO __tmp_selection(id) VALUES (10);
INSERT INTO __tmp_selection(id) VALUES (20);
INSERT INTO __tmp_selection(id) VALUES (30);
INSERT INTO __tmp_selection(id) VALUES (40);
INSERT INTO __tmp_selection(id) VALUES (50);
EXPLAIN QUERY PLAN
SELECT obj2.* FROM obj2
JOIN __tmp_selection ON obj2.id = __tmp_selection.id;
id | parent | notused | detail |
---|---|---|---|
3 | 0 | 0 | SCAN obj2 |
5 | 0 | 0 | SEARCH __tmp_selection USING INTEGER PRIMARY KEY (rowid=?) |
This suggests that SQLite is scanning the entire obj2
table, even though the join should be able to use the index on obj2.id
.
How can I force SQLite to scan __tmp_selection
(which contains only a few rows) instead of performing a full table scan on obj2
?
Reverse the order of the tables and use a CROSS JOIN
:
sqlite> EXPLAIN QUERY PLAN
...> SELECT obj2.* FROM __tmp_selection
...> CROSS JOIN obj2 ON obj2.id = __tmp_selection.id;
QUERY PLAN
|--SCAN __tmp_selection
`--SEARCH obj2 USING INTEGER PRIMARY KEY (rowid=?)
Using a CROSS JOIN
instead of JOIN
forces the query planner to always use that order of tables.
or, better option, run ANALYZE
on the tables to give the query planner more information about the costs of operations on them so it can pick a more efficient order itself; see also PRAGMA optimize
:
sqlite> ANALYZE __tmp_selection;
sqlite> ANALYZE obj2; -- assuming this table is populated
sqlite> EXPLAIN QUERY PLAN
...> SELECT obj2.* FROM obj2
...> JOIN __tmp_selection ON obj2.id = __tmp_selection.id;
QUERY PLAN
|--SCAN __tmp_selection
`--SEARCH obj2 USING INTEGER PRIMARY KEY (rowid=?)
(All testing done with Sqlite 3.49.1)