mysqlenumsalter

Mysql RDS alter table enum disrupted my database connectivity briefly


CONTEXT

We are on RDS mysql 8.0.

We have a high throughput write/read table. Yesterday we did issue an alter table statement to modify the enum value. We added one more value to list of enums at the end of the list. We did it with no lock and inplace algorithm. Now in past, we used to do it, and it used to work perfectly fine, finish in mili seconds. But yesterday it took upwards of an hour. Table size where it was done was 42 gb. And when this operation finished, I could see that we got a lot of Lock wait timeout exceeded; try restarting transaction'.

Now this hints me towards some kind of table locking happened towards the end stage of this operation briefly. I am not able to put my finger on it. What exactly happened here.

Investigation

This link talks about

Modifying the definition of an ENUM or SET column by adding new enumeration or set members to the end of the list of valid member values, as long as the storage size of the data type does not change. For example, adding a member to a SET column that has 8 members changes the required storage per value from 1 byte to 2 bytes; this requires a table copy. Adding members in the middle of the list causes renumbering of existing members, which requires a table copy.

This is what RDS Performance insight is showing for this transaction at that the time

enter image description here

Another wait snippets during this period from RDS Performance insights

enter image description here

Column collation is utf8_unicode_ci now based on above, I tried to see what are the storage requirements for the values in our ENUM values. It comes to this table

enter image description here

Now last row in the image above is the one that we had added. So I can see storage requirements did indeed change here. So we would need at least 42 bytes for this enum now, if I am comprehending it right. So as per mysql documentation, copy table might trigger, which might explain the behaviour.

Question Now my question is, if indeed above explanation is correct, then adding a value which is longer (in terms of character count in string value) then the last one (basically longer then longest of the current set) must reproduce this behaviour. But if I add one more character to this, then it's not reproducible. What am I missing ? How do I explain lock waiting ?


Solution

  • Modifying a column will also trigger a table copy if you change a NOT NULL column to NULL or vice-versa. I see in your alter command, you set the column to nullable. Was it NOT NULL before you altered it?

    Indeed the column was NOT NULL before, but it was made to NULL in this change!

    If you need to make an ALTER TABLE in production that will incur a table-copy on a large table, I recommend to use one of the open-source online schema change tools. This means the table copy runs in the background, and you can continue to read and write the table in the meantime.

    Of course, as with any new tool, experiment with it in a test environment first — not in production — until you're confident using it.

    At a previous job I had, we uses pt-online-schema-change to run hundreds of schema changes in production every week, without blocking any applications.