database, system-design

What happens when a B-tree database has data larger than its page size?


In the book Designing Data-Intensive Applications, it says:

By contrast, B-trees break the database down into fixed-size blocks or pages, traditionally 4 KB in size (sometimes bigger), and read or write one page at a time.

If the size of each page is fixed, what does the database do if we want to insert data larger than the page size? Does it maintain another index to track how many pages each piece of data spans?


Solution

  • For SQL Server, the row basically contains a pointer to the "overflow" data (see the SQL Server sketch after this answer). From the Large row support section of the docs:

    Rows can't span pages; however, portions of the row can be moved off the row's page, so the row can be very large. The maximum amount of data and overhead that is contained in a single row on a page is 8,060 bytes. This doesn't include the data stored in the text/image page type.

    This restriction is relaxed for tables that contain varchar, nvarchar, varbinary, or sql_variant columns. When the total row size of all fixed and variable columns in a table exceeds the 8,060-byte limitation, SQL Server dynamically moves one or more variable length columns to pages in the ROW_OVERFLOW_DATA allocation unit, starting with the column with the largest width.

    This is done whenever an insert or update operation increases the total size of the row beyond the 8,060-byte limit. When a column is moved to a page in the ROW_OVERFLOW_DATA allocation unit, a 24-byte pointer on the original page in the IN_ROW_DATA allocation unit is maintained. If a subsequent operation reduces the row size, SQL Server dynamically moves the columns back to the original data page.

  • Postgres uses TOAST (The Oversized-Attribute Storage Technique), which, as far as I understand, combines compression and pointers to out-of-line storage (see the Postgres sketch below). From the PostgreSQL docs:

    PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately known as TOAST (or “the best thing since sliced bread”). The TOAST infrastructure is also used to improve handling of large data values in-memory.
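A minimal sketch of how you might observe the SQL Server behaviour yourself, assuming a scratch database where you can create a throwaway table (the table name OverflowDemo is made up for illustration). Once the variable-length columns of a row exceed the 8,060-byte limit, the ROW_OVERFLOW_DATA allocation unit should show used pages:

    -- Hypothetical scratch table; two wide varchar columns that together can
    -- exceed the 8,060-byte in-row limit.
    CREATE TABLE dbo.OverflowDemo (
        Id   INT IDENTITY PRIMARY KEY,
        ColA VARCHAR(8000),
        ColB VARCHAR(8000)
    );

    -- Roughly 10,000 bytes of variable-length data: more than fits in-row, so
    -- SQL Server moves one column off the page and keeps a 24-byte pointer in
    -- the IN_ROW_DATA page.
    INSERT INTO dbo.OverflowDemo (ColA, ColB)
    VALUES (REPLICATE('a', 5000), REPLICATE('b', 5000));

    -- Inspect the allocation units: a ROW_OVERFLOW_DATA unit with used pages
    -- should now appear alongside the IN_ROW_DATA unit.
    SELECT au.type_desc, au.total_pages, au.used_pages
    FROM sys.allocation_units AS au
    JOIN sys.partitions AS p
        ON au.container_id = p.partition_id
    WHERE p.object_id = OBJECT_ID('dbo.OverflowDemo');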
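Likewise, a minimal sketch for Postgres, again assuming a scratch database (the table name toast_demo is made up for illustration). Every table with TOAST-able columns gets a companion TOAST table, and pg_column_size() reports the stored, possibly compressed, size of a value:

    -- Hypothetical scratch table; the text column is TOAST-able.
    CREATE TABLE toast_demo (
        id   serial PRIMARY KEY,
        body text
    );

    -- A value far larger than the 8 kB page: it is compressed and, if still
    -- too large, stored out of line in the table's companion TOAST table.
    INSERT INTO toast_demo (body) VALUES (repeat('x', 100000));

    -- pg_class.reltoastrelid points at the companion TOAST table.
    SELECT reltoastrelid::regclass AS toast_table
    FROM pg_class
    WHERE relname = 'toast_demo';

    -- octet_length() is the logical size of the value; pg_column_size() is
    -- the size actually stored, which can be much smaller after compression.
    SELECT octet_length(body) AS logical_bytes,
           pg_column_size(body) AS stored_bytes
    FROM toast_demo;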