selecth2defragmentation

H2 Optimize select statement / shutdown defrag


Test Case:

drop table master;
create table master(id int primary key, fk1 int, fk2 int, fk3 int, dataS  varchar(255),    data1 int, data2 int, data3 int, data4 int,data5  int,data6 int,data7 int,data8 int,data9 int,b1 boolean,b2 boolean,b3 boolean,b4 boolean,b5 boolean,b6 boolean,b7 boolean,b8 boolean,b9 boolean,b10 boolean,b11 boolean,b12 boolean,b13 boolean,b14 boolean,b15 boolean,b16 boolean,b17 boolean,b18 boolean,b19 boolean,b20 boolean,b21 boolean,b22 boolean,b23 boolean,b24 boolean,b25 boolean,b26 boolean,b27 boolean,b28 boolean,b29 boolean,b30 boolean,b31 boolean,b32 boolean,b33 boolean,b34 boolean,b35 boolean,b36 boolean,b37 boolean,b38 boolean,b39 boolean,b40 boolean,b41 boolean,b42 boolean,b43 boolean,b44 boolean,b45 boolean,b46 boolean,b47 boolean,b48 boolean,b49 boolean,b50 boolean);

create index idx_comp on master(fk1,fk2,fk3);
@loop 5000000 insert into master values(?, mod(?,100), mod(?,5), ?,'Hello World Hello World Hello World',?, ?, ?,?, ?, ?, ?, ?, ?,true,true,true,true,true,true,false,false,false,true,true,true,true,true,true,true,false,false,false,true,true,true,true,true,true,true,false,false,false,true,true,true,true,true,true,true,false,false,false,true,true,true,true,true,true,true,false,false,false,true);

1.The following select statement takes up to 30seconds. Is there a way to optimize the response time?

SELECT count(*), SUM(CONVERT(b1,INT)) ,SUM(CONVERT(b2,INT)),SUM(CONVERT(b3,INT)),SUM(CONVERT(b4,INT)),SUM(CONVERT(b5,INT)),SUM(CONVERT(b6,INT)),SUM(CONVERT(b7,INT)),SUM(CONVERT(b8,INT)),SUM(CONVERT(b9,INT)),SUM(CONVERT(b10,INT)),SUM(CONVERT(b11,INT)),SUM(CONVERT(b12,INT)),SUM(CONVERT(b13,INT)),SUM(CONVERT(b14,INT)),SUM(CONVERT(b15,INT)),SUM(CONVERT(b16,INT))
FROM  master
WHERE fk1=53 AND fk2=3

2.I tried shutdown defrag. But this statement took about 40min for my test case. After shutdown defrag the select takes up to 15seconds. If i execute the statement again it takes under 1sec. Even if stop and start the server, the statement takes about 1sec. Has H2 a persistent Cache?

Infrastructure: WebBrowser <-> H2 Console Server <-> H2 DB: h2 1.3.158


Solution

  • According to the profiler output, the main problem (93%) is reading from the disk. I ran this in the H2 Console:

    @prof_start;
    SELECT ... FROM  master WHERE fk1=53 AND fk2=3;
    @prof_stop;
    

    and got:

    Profiler: top 3 stack trace(s) of 48039 ms [build-158]:
    4084/4376 (93%):
    at java.io.RandomAccessFile.readBytes(Native Method)
    at java.io.RandomAccessFile.read(RandomAccessFile.java:338)
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:397)
    at org.h2.store.FileStore.readFully(FileStore.java:285)
    at org.h2.store.PageStore.readPage(PageStore.java:1253)
    at org.h2.store.PageStore.getPage(PageStore.java:707)
    at org.h2.index.PageDataIndex.getPage(PageDataIndex.java:225)
    at org.h2.index.PageDataNode.getRowWithKey(PageDataNode.java:269)
    at org.h2.index.PageDataNode.getRowWithKey(PageDataNode.java:270)
    

    According to EXPLAIN ANALYZE SELECT it's reading over 55'000 pages from the disk (2 KB each page; 110 MB) for this query. I'm not sure how other databases perform for such a query. But I guess if possible the query should be changed so that it reads less data.