mysqldatabasecharacter-encodingcollationcharacter-set

Will existing indexes be affected when changing character_set and collation of MySQL db


  1. We have an database where the default character set for tables and columns is set to utf8 encoding
  2. But with character set encoding of utf8 , we are unable to save emojis
  3. To support saving of emojis, a) We had to change the character set of table and columns to utf8mb4 b) We had to change the collation of table and columns to utf8mb4_unicode_ci c) Update our JDBC driver so that it supports the unicode encoding

With the above changes we are able to save the emoji in our columns.

Question: 1) Do I need to delete existing indexes (varchar columns) and recreate the indexes as earlier with utf8 each character used to take 3 bytes and now with utf8mb4 encoding it will occupy 4 bytes ?


Solution

  • An index is an ordered list of pointers to the rows of the table. The ordering is based on both the CHARACTER SET and COLLATION of the column(s) of the index. If you change either, the index must be rebuilt. A "pointer" (in this context) is a copy of the PRIMARY KEY.

    You should do one or the other of

    ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4 COLLATE ...,;
    

    which converts all the text columns in the table. Or, if you need to leave some with their current charset/collation, then change each column:

    ALTER TABLE tbl MODIFY col_name ... CHARACTER SET utf8mb4 COLLATE ...;
    

    where the first '...' is the rest of the column definition (VARCHAR, NOT NULL, whatever).

    Any indexes that involve the columns being changed will be rebuilt. In particular, note that a VARCHAR PRIMARY KEY is effectively in each secondary index.

    The collation utf8mb4_unicode_ci is rather old; you might prefer utf8mb4_unicode_520_ci, especially since it handles Emoji as being distinct rather than lumped together (IIRC).

    The fact that utf8 is a subset of utf8mb4 is not relevant; MySQL sees it as a change and fails to take any short cuts.