I have a table in pg like so:
CREATE TABLE t (
a BIGSERIAL NOT NULL, -- 8 b
b SMALLINT, -- 2 b
c SMALLINT, -- 2 b
d REAL, -- 4 b
e REAL, -- 4 b
f REAL, -- 4 b
g INTEGER, -- 4 b
h REAL, -- 4 b
i REAL, -- 4 b
j SMALLINT, -- 2 b
k INTEGER, -- 4 b
l INTEGER, -- 4 b
m REAL, -- 4 b
CONSTRAINT a_pkey PRIMARY KEY (a)
);
The above adds up to 50 bytes per row. In my experience I need another 40% to 50% for system overhead, even without any user-created indexes. So, about 75 bytes per row. I will have many, many rows in the table, potentially upward of 145 billion, so the table is going to be pushing 13-14 terabytes. What tricks, if any, could I use to compact this table? My possible ideas are below.
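To sanity-check the overhead estimate, I could load synthetic rows into a throwaway copy and measure (t_sample is just a scratch name):

-- Throwaway copy of t; INCLUDING ALL also copies the primary key
-- (and the column default, so it shares t's sequence -- fine for a test):
CREATE TABLE t_sample (LIKE t INCLUDING ALL);

INSERT INTO t_sample (b, c, d, e, f, g, h, i, j, k, l, m)
SELECT 1, 2, 3.5, 4.5, 5.5, 6, 7.5, 8.5, 9, 10, 11, 12.5
FROM generate_series(1, 1000000);

-- Heap bytes per row, and heap-plus-primary-key bytes per row:
SELECT pg_relation_size('t_sample')       / 1e6 AS heap_bytes_per_row,
       pg_total_relation_size('t_sample') / 1e6 AS total_bytes_per_row;

DROP TABLE t_sample;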
Convert the REAL values to INTEGER. If they can be stored as SMALLINT, that is a saving of 2 bytes per field.
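Something like this for one column, assuming (hypothetically) that d always carries two decimal places and stays within ±327.67, so a scale factor of 100 keeps it inside the SMALLINT range:

-- Scale by 100 so two decimal places fit in -32768..32767:
ALTER TABLE t
    ALTER COLUMN d TYPE SMALLINT
    USING round(d * 100)::SMALLINT;

-- Undo the scale factor when reading:
SELECT a, d / 100.0 AS d_value FROM t;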
Convert the columns b .. m into an array. I don't need to search on those columns, but I do need to be able to return one column's value at a time. So, if I need column g, I could do something like
SELECT a, arr[6] FROM t; -- g is the 6th of b..m, and arrays are 1-based
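For concreteness, the array version might look like this (t_arr is a hypothetical name; an array forces all twelve values into a single element type):

CREATE TABLE t_arr (
    a   BIGSERIAL NOT NULL,
    arr REAL[], -- b..m packed together; the SMALLINT and INTEGER values
                -- widen to REAL, and INTEGERs beyond 2^24 lose exactness
    CONSTRAINT t_arr_pkey PRIMARY KEY (a)
);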
Would I save space with the array option? Would there be a speed penalty?
Any other ideas?
I see nothing to gain (and something to lose) in storing several numeric fields in an array: a one-dimensional array carries about 24 bytes of header of its own, and every element must have the same type, so your SMALLINT columns would widen to match.
The size of each numeric type is clearly documented; simply use the smallest type compatible with your desired range and resolution, and that's about all you can do.
PostgreSQL does impose byte-alignment requirements on the columns along a row (SMALLINT is aligned to 2 bytes, INTEGER and REAL to 4, BIGINT to 8), so reordering the columns can alter the space used. In your layout, j sitting between i and k forces 2 bytes of padding before k.
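The alignment rules are visible in the catalogs if you want to check (typalign 's' means 2-byte, 'i' 4-byte, 'd' 8-byte alignment):

-- Each column's declared length and alignment requirement:
SELECT a.attname, t.typname, t.typlen, t.typalign
FROM   pg_attribute a
JOIN   pg_type t ON t.oid = a.atttypid
WHERE  a.attrelid = 't'::regclass
AND    a.attnum > 0
ORDER  BY a.attnum;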
BTW, there is a fixed overhead per row: the tuple header alone is about 23 bytes.
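You can also measure the all-in row size directly; passing the whole row to pg_column_size includes that header:

-- Per-row size with header and alignment padding included:
SELECT pg_column_size(t.*) AS row_bytes FROM t LIMIT 1;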