I'm looking for the most secure way to preserve my database data in a .sql backup.
This:
mysqldump -u root -p DBName > backupName.sql
outputs also these lines for my database:
DROP TABLE IF EXISTS `tableName`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!50503 SET character_set_client = utf8mb4 */;
CREATE TABLE `tableName` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`col1` int(11) unsigned NOT NULL,
`col2` int(11) unsigned NOT NULL,
...
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
/*!40101 SET character_set_client = @saved_cs_client */;
How does this line work about the encoding?
/*!50503 SET character_set_client = utf8mb4 */;
I remember that those data were saved with some utf8
encoding but not with utf8mb4
, maybe utf8mb4
can handle correctly all sub-set like utf8
and utf8_general_ci
and utf8_unicode_ci
?
(I'm using Ubuntu with MySQL 8)
Yes, utf8mb4
is a superset of utf8
.
utf8
supports only the Basic Multilingual Plane of the UTF-8 standard — i.e. 1-byte, 2-byte, and 3-byte code points.
utf8mb4
supports everything utf8
does, and in addition, supports the Supplemental Multilingual Plane of the UTF-8 standard.
As of MySQL 8.0.28, utf8
is now known as utf8mb3
. It has been documented that a future release of MySQL will repurpose the utf8
alias to the utf8mb4
character set.
The character_set_client
only describes the character set used by the client to encode character data it sends. This doesn't have to be the same as the character set used by each table, if there's a valid conversion path from the client character set into whatever is used by the respective table.
In other words, if you set the client character set to utf8mb4, and the table uses utf8 (a subset), it's fine as long as the client doesn't send 4-byte characters from the supplemental utf8 plane (this includes for example emojis).
utf8_general_ci
and utf8_unicode_ci
are not character sets, they are collations. This doesn't affect storage of strings at all, but it affects the sort order used as indexes are built, and it also affects character equivalence for unique constraints.