performancefirebirdclustered-indexdefragmentationmvcc

does Firebird defrag? If so, like a clustered index?


I've seen a few (literally, only a few) links and nothing in the documentation that talks about clustering with Firebird, that it can be done.

Then, I shot for the moon on this question CLUSTER command for Firebird?, but answerer told me that Firebird doesn't even have clustered indexes at all, so now I'm really confused.

Does Firebird physically order data at all? If so, can it be ordered by any key, not just primary, and can the clustering/defragging be turned on and off so that it only does it during downtime?

If not, isn't this a hit to performance since it will take the disk longer to put together disparate rows that naturally should be right next to each other?

(DB noob)

MVCC

I found out that Firebird is based upon MVCC, so old data actually isn't overwritten until a "sweep". I like that a lot!

Again, I can't find much, but it seems like a real shame that data wouldn't be defragged according to a key.

This says that database pages are defragmented but provides no further explanation.


Solution

  • Firebird does not cluster records. It was designed to avoid the problems that require clustering and the fragmentation problems that come with clustered indexes. Indexes and data are stored separately, on different types of pages. Each data page contains data from only one table. Records are stored in the order they were inserted, give or take concurrent inserts, which generally go on separate pages. When old records are removed, new records will be stored in their place, so new records sometimes appear on the same page as older ones.

    Many tables use an artificial primary key, generally ascending, which might be a database generated sequence or a timestamp. That practice causes records to be stored in key order, but that order is by no means guaranteed. Nor is it very interesting. When the primary key is artificial, most queries that return groups of related records are done on secondary indexes. That's a performance hit for records that are clustered because look-ups on secondary indexes require traversing two indexes because the secondary index provides only the key to the primary index, which must be traversed to find the data.

    On the larger issue of defragmentation and space usage, Firebird tracks the free space on pages so new records will be inserted on pages that have had records removed. If a page becomes completely empty, it will be reallocated. This space management is done as the database runs. As you know, Firebird uses Multi-Version Concurrency Control, so when a record is updated or deleted, Firebird creates a new record version, but keeps the old version around. When all transactions that were running before the change was committed have ended, the old record version no longer serves any purposes, and Firebird will remove it. In many applications, old versions are removed in the normal course of running the database. When a transaction touches a record with old versions, Firebird checks the state of the old versions and removes them if no running transaction can read them. There is a function called "Sweep" that systematically removes unneeded old record versions. Sweep can run concurrently with other database activity, though it's better to schedule it when the database load is low. So no, it's not true that nothing is removed until you run a sweep.

    Best regards,

    Ann Harrison

    who's worked with Firebird and it's predecessors for an embarassingly long time

    BTW - as the first person to answer mentioned, Firebird does leave space on pages so that the old version of a record stays on the same page as the newer version. It's not a fixed percentage of the space, but 16 bytes per record stored on the page, so pages of tables with very short records have more free space and tables that have long records have less.