I am working in the spatialite-gui with a sqlite db. I have a large table (80,000,000 records) and I want to make a table that includes a code from another table, which has about 48,000 records.
I wrote the following code and tested it on the first 80k records of my big table; it took about 1:25 to run. When I ran it on the big table it went for 100 hours and I ended up cancelling with no result.
What can I do to improve query performance? I have an index on longitude but not on latitude; neither field is unique. Both lut.pointgeom and lut.gridgeom are BLOB fields and not strictly necessary.
CREATE TABLE policy_gcode AS
SELECT
p.*,
lut.gcode,
lut.pointgeom,
lut.gridgeom
FROM
allpol AS p
JOIN policylutgrid AS lut
ON p.latitude = lut.latitude
AND p.longitude = lut.longitude;
Before all, you want a sqlite multicolumn indice on each table with both latitude and longitude fields, ordered in the same way as you join them. This should allow your database engine to optimize the JOIN operation that occurs within your query.
CREATE INDEX allpol_idx ON allpol(latitude, longitude);
CREATE INDEX policylutgrid_idx ON policylutgrid(latitude, longitude);
It is also a good idea to only SELECT the fields that you really need : if you don’t need the BLOBs (or any other field), don’t ask for it. This will reduce the amount of data that your database must handle.
If theses measures do not solve the issue, then you have to look at other parts of the picture :