mysql, innodb, information-schema

Not able to calculate (DATA_LENGTH + INDEX_LENGTH) correctly in MySQL InnoDB


I'm using MySQL InnoDB for storing data. I need to restrict the size of a particular table to some limit. So after inserting data into the table, I check (DATA_LENGTH + INDEX_LENGTH) to see whether it exceeds the limit. If it does, I delete some old data until (DATA_LENGTH + INDEX_LENGTH) drops below the limit.
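
Roughly, the logic I have in mind is the following (createdTime here is just a placeholder for whatever column identifies the oldest rows):

-- check the current allocated size of the table, in MB
SELECT ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS SIZE_MB
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'tableTime';

-- if SIZE_MB is over the limit, delete a batch of the oldest rows and re-check
DELETE FROM tableTime
ORDER BY createdTime
LIMIT 100000;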

But I'm getting an unexpected (DATA_LENGTH + INDEX_LENGTH) value, so I'm not able to proceed. (I have explained the situation below.)

Initially I have a table tableTime with (DATA_LENGTH + INDEX_LENGTH) of around 15GB and DATA_FREE of 0GB. It has 2,000,000 rows.

DELETE FROM tableTime
LIMIT 1000000;

I deleted 1,000,000 rows (using the above query), then waited for mysqld.exe to finish its disk I/O (watching it manually through Task Manager) and also rebooted the system.

ANALYZE TABLE tableTime;
SELECT ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS SIZE_MB
FROM information_schema.TABLES
WHERE TABLE_NAME = 'tableTime';

Now I checked (DATA_LENGTH + INDEX_LENGTH) again (using the above query) and found it to be ~13GB, with DATA_FREE at ~2GB.

OPTIMIZE TABLE tableTime;
SELECT ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS SIZE_MB
FROM information_schema.TABLES
WHERE TABLE_NAME = 'tableTime';

Then I tried OPTIMIZE and got the expected (DATA_LENGTH + INDEX_LENGTH) of around 8-9GB.

But since the tables are big, I can't run OPTIMIZE every time just to get the table size. Why is ANALYZE so ineffective here?


Solution

  • Data_length, index_length, and data_free are not measuring what you think they're measuring.

    InnoDB stores data and indexes on pages, which are 16KB by default and are uniform in size. Pages are grouped into extents of 1MB, i.e. 64 pages by default.
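
    If you want to check the page size on your own server, it's exposed as a system variable:

    -- 16384 (16KB) by default; with the default page size an extent is 1MB = 64 pages
    SELECT @@innodb_page_size AS page_size_bytes;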

    When you insert or update data, InnoDB may need to expand the tablespace. The first few expansions are done in small amounts; after that the tablespace grows 4MB at a time (4 extents).

    Each page stores at least 2 rows, and stores as many rows as fit in the page (long varchar/text/blob columns are stored on separate pages).

    Updating or deleting rows can leave gaps of unused space within a page. Later inserts and updates attempt to reuse these gaps where possible, but in practice pages tend to become fragmented, with lots of small pockets of wasted space.

    Here's a thing you must understand: data_free measures only whole extents (64 pages) of unused space. This does not include the small pockets of wasted space. It doesn't even include empty pages, unless there are 64 contiguous empty pages. You might notice that data_free is always a multiple of 1MB.
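
    You can verify this with a quick query (adjust the table name; per the above, the remainder should come back as 0):

    -- DATA_FREE is reported in bytes
    SELECT DATA_FREE,
           ROUND(DATA_FREE / 1048576) AS free_mb,
           DATA_FREE % 1048576        AS remainder_bytes
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'tableTime';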

    The data_length and index_length measure the total size of the pages that are at least partially filled, not the size of the data in them.

    This means that the actual size of data might be significantly less than what is reported by data_length + index_length.
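
    For example, a single query can lay these numbers side by side. This is only a sketch: approx_file_mb is a rough estimate of the on-disk footprint and assumes a file-per-table tablespace:

    -- allocated_mb counts partially filled pages at their full size;
    -- free_extents_mb counts only whole free extents;
    -- the small in-page gaps are invisible in both columns.
    SELECT TABLE_NAME,
           ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS allocated_mb,
           ROUND(DATA_FREE / 1024 / 1024) AS free_extents_mb,
           ROUND((DATA_LENGTH + INDEX_LENGTH + DATA_FREE) / 1024 / 1024) AS approx_file_mb
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'tableTime';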

    So how can you get a precise measure of the actual space occupied by data and indexes?

    There's no command to measure precisely the size of data in an InnoDB table.

    It's a difficult value to pin down anyway, because of transactions and multi-versioning. If you delete a row but haven't committed that transaction yet, should you count the space? Likewise, if you've inserted a row but not committed, does that space count? What about multiple versions of a given row? What about rows that exist only in modified pages in the buffer pool and haven't been flushed to disk yet?

    Waiting for I/O to settle isn't reliable either, because InnoDB continues to do gradual updates, things like purging old row versions and merging the change buffer into secondary indexes. This can even continue after a reboot.

    Even if you could measure the size of data that precisely, the measurement would be outdated in a fraction of a second, because many databases have frequent updates.