With the above changes we are able to save the emoji in our columns.
Question: 1) Do I need to delete existing indexes (varchar columns) and recreate the indexes as earlier with utf8 each character used to take 3 bytes and now with utf8mb4 encoding it will occupy 4 bytes ?
An index is an ordered list of pointers to the rows of the table. The ordering is based on both the CHARACTER SET
and COLLATION
of the column(s) of the index. If you change either, the index must be rebuilt. A "pointer" (in this context) is a copy of the PRIMARY KEY
.
You should do one or the other of
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4 COLLATE ...,;
which converts all the text columns in the table. Or, if you need to leave some with their current charset/collation, then change each column:
ALTER TABLE tbl MODIFY col_name ... CHARACTER SET utf8mb4 COLLATE ...;
where the first '...' is the rest of the column definition (VARCHAR, NOT NULL, whatever).
Any indexes that involve the columns being changed will be rebuilt. In particular, note that a VARCHAR
PRIMARY KEY
is effectively in each secondary index.
The collation utf8mb4_unicode_ci
is rather old; you might prefer utf8mb4_unicode_520_ci
, especially since it handles Emoji as being distinct rather than lumped together (IIRC).
The fact that utf8 is a subset of utf8mb4 is not relevant; MySQL sees it as a change and fails to take any short cuts.