mysqlamazon-web-servicesamazon-rdsschema-migration

Ghost/copy table is many times bigger than original table


Info: I am using AWS RDS Mysql 5.6.34 (500GB) instance (Without replica, just the master)

Note: Binlog is enabled & set to Row

Target: Modify a column field_1 from enum to tinyint

Extra info: I am using Rails application. So everytime I wanted to add a value to the enum, I need to write a migration. So converting the enum field to tinyint so I can add or delete a enum value without writing a migration using Active Enum

Other info: I also tried LHM but the RDS instance went out of memory at 93%

Database info before running gh-ost:

mysql> select table_schema, sum((data_length+index_length)/1024/1024) AS MB from information_schema.tables group by 1;
+--------------------+-----------------+
| table_schema       | MB              |
+--------------------+-----------------+
| information_schema |      0.00976560 |
| mysql              |      5.96277428 |
| performance_schema |      0.00000000 |
| my_app_db          | 223941.79882818 |
+--------------------+-----------------+

Size of the original table before gh-ost: (showing only the table needed to be modified from the list)

mysql> SELECT table_name AS `Table`, round(((data_length + index_length) / 1024 / 1024), 2) `Size (MB)` FROM information_schema.TABLES WHERE table_schema = "my_app_db";
+----------------------------------------+-----------+
| Table                                  | Size (MB) |
+----------------------------------------+-----------+
| table_abc                              |     70.41 |
| my_table                               |  86058.73 |

Started the migration:

gh-ost \
--user="user" \
--password="password" \
--host="my-endpoint.rds.amazonaws.com" \
--database="my_app_db" \
--table="my_table" \
--alter="MODIFY field_1 TINYINT(2) DEFAULT 1 NOT NULL" \
--assume-rbr \
--allow-on-master \
--verbose \
--execute

When the migration was nearly 93% complete, the RDS free-memory went down to 20GB. So I stopped gh-ost.

Database info after stopping gh-ost:

mysql> select table_schema, sum((data_length+index_length)/1024/1024) AS MB from information_schema.tables group by 1;
+--------------------+-----------------+
| table_schema       | MB              |
+--------------------+-----------------+
| information_schema |      0.00976560 |
| mysql              |      5.96277428 |
| performance_schema |      0.00000000 |
| my_app_db          | 446299.17968756 |
+--------------------+-----------------+

Size of the original table after stopping gh-ost:

mysql> SELECT table_name AS `Table`, round(((data_length + index_length) / 1024 / 1024), 2) `Size (MB)` FROM information_schema.TABLES WHERE table_schema = "my_app_db";
+----------------------------------------+-----------+
| Table                                  | Size (MB) |
+----------------------------------------+-----------+
| _my_table_ghc                       |      0.41 |
| _my_table_gho                       | 273157.00 |
| my_table                            |  85011.62 |

Question:

Why is the gh-ost table many times larger than the original table?

If more info is required about the table, index or database, I can provide :)

This is the link to the issue created in gh-ost repo: https://github.com/github/gh-ost/issues/890


Solution

  • I created the mysql DB from a backup of production db.

    Production had innodb_file_format parameter as Barracuda

    The new environment had innodb_file_format parameter as Antelope

    The ROW_FORMAT for the table in production was COMPRESSED

    Unfortunately Antelope db doesn't support ROW_FORMAT as COMPRESSED

    If I had looked more into the details of information_schema, I could have resolved it eatlier!