Given that the size is variable, I am trying to calculate the average cost of a numeric(p, s)
attribute. In my case, numeric(10, 2)
.
From what I understand of the Postgres documentation, this means I have a total of 10 digits where 2 of them are exclusive of the decimal part, hence, the maximum value I can store is 99999999,99
.
What I do not understand is this part of the documentation:
The actual storage requirement is two bytes for each group of four decimal digits, plus three to eight bytes overhead.
My two questions here are:
I ran a query with an average of pg_column_size
that gave me a total of 6.95
bytes. However, I want to know if there is any way I can calculate the approximate value before checking the actual one. (Same with other variable types in Postgres)
If I have two decimal digits, that means the storage requirement for them would be one byte?
No, a numeric
with two digits occupies 5 bytes on disk and 8 bytes in RAM.
If I have three to eight bytes overhead, how can I know which value corresponds to my case?
As the manual says:
two bytes for each group of four decimal digits
For each started group of four decimal digits, to be precise. So we can compute:
ceil(digits / 4.0) * 2
plus three to eight bytes overhead.
The minimum overhead of 3 bytes only applies to '0' in "packed" format on disk. Else:
2 bytes overhead for any digits at all (except special case 0)
Or + 4 bytes overhead for more than 256 digits
3 bytes overhead in "packed" format, which is in use for storage on disk
Or + 6 bytes overhead when storage exceeds 128 bytes or for storage in RAM
Sign, decimal point, and white space of the number are insignificant for storage size.
Type modifiers (precision and scale) of the numeric
type are also insignificant.
Here is a demo with formulas for the complete computation:
SELECT id, digits
, pg_column_size(n) AS disk_size
, CASE
WHEN n = 0 THEN 3 -- efficient storage of 0
WHEN digits <= 248 THEN 3 + ceil(digits / 4.0) * 2 -- "packed" format for size <= 128 bytes
WHEN digits <= 256 THEN 6 + ceil(digits / 4.0) * 2 -- effecitve size > 128 bytes
ELSE 8 + ceil(digits / 4.0) * 2 -- more than 256 digits
END AS expected_disk_size
, pg_column_size(n + 0) AS ram_size
, CASE
WHEN n = 0 THEN 6 -- efficient storage of 0
WHEN digits <= 256 THEN 6 + ceil(digits / 4.0) * 2 -- effecitve size > 128 bytes
ELSE 8 + ceil(digits / 4.0) * 2 -- more than 256 digits
END AS expected_ram_size
FROM tbl
, length(translate(n::text, '-. ', '')) AS digits; -- only digits matter
Note that the no-op n + 0
forces the stored value to be unpacked, so we get RAM size. Related:
Also note that storage of null values is different. See: