I'm testing a proprietary tool that dumps a table in a MySQL RDS to the parquet format, and then restores it into another MySQL RDS.
Both tables have the same amount of rows:
mysql> SELECT COUNT(*) FROM fox_owners;
+----------+
| COUNT(*) |
+----------+
| 118950 |
+----------+
The table itself is configured the same way in both cases:
mysql> SHOW CREATE TABLE fox_owners;
+------------+-------------------------------------------------------+
| Table | Create Table |
+------------+-------------------------------------------------------+
| fox_owners | CREATE TABLE `fox_owners` (
`name` mediumtext,
`owner_id` bigint NOT NULL,
PRIMARY KEY (`owner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+------------+-------------------------------------------------------+
So far so good, right?
However, the table sizes are different. The original:
+----------+----------------------+------------+
| Database | Table | Size in MB |
+----------+----------------------+------------+
| stam_db | fox_owners | 5582.52 |
The restored one:
+----------+----------------------+------------+
| Database | Table | Size in MB |
+----------+----------------------+------------+
| stam_db | fox_owners | 5584.52 |
The restored one is 2MB bigger! However, what's really bugging me, is the change in cardinality of the indexes between the 2 tables. Original:
mysql> SHOW INDEX FROM fox_owners;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| fox_owners | 0 | PRIMARY | 1 | owner_id | A | 118728 | NULL | NULL | | BTREE | | | YES | NULL |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Restored:
mysql> SHOW INDEX FROM fox_owners;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| fox_owners | 0 | PRIMARY | 1 | owner_id | A | 117518 | NULL | NULL | | BTREE | | | YES | NULL |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Why would cardinality drop from 118728 to 117518? If the restored table is less unique than the original - isn't this a clear sign that this table is different? How can I verify that these 2 tables in separate RDS databases have identical content?
And shouldn't the cardinality be 118950 in both of them anyway, since for a table with a single primary key column, the cardinality must be equal to the number of rows in the table?
I've ran ANALYZE TABLE on both tables, the values didn't change.
No Problem.
The "cardinality" is determined by making a small number of 'random' probes into the table. This leads to estimates. Sometimes the estimates are off by a factor of two or even more. 118728 and 117518 are unusually close to each other.
When loading/copying/altering a table, the BTrees are rebuilt. This leads to a likely variation of how the blocks of the BTree are laid out. So, it is normal to see the size (on disk) of a table change. A change of a factor of 2 is rare for this.