mysqlencodingutf-8cp1251

How to change encoding on fly in SELECT statement?


I have a table with a column, which has cp1251_general_ci collation. I don't want to change column collation, but I want to get data in utf8 encoding.

Is there a way to select any data somehow in a way that it looks just like a data with utf8_general_ci collation?

I.e. I need something like this

SELECT CONVERT_TO_UTF8(weirdColumn) FROM weirdTable

Solution

  • Here's a demo table using the cp1251 encoding. I'll insert some Cyrillic characters into it.

    mysql> CREATE TABLE weirdTable (weirdColumn text) ENGINE=InnoDB DEFAULT CHARSET=cp1251;
    
    mysql> insert into weirdTable values ('ЂЃЉЌ');
    
    mysql> select * from weirdTable;
    +-------------+
    | weirdColumn |
    +-------------+
    | ЂЃЉЌ        |
    +-------------+
    

    Use MySQL's CONVERT() function to force the characters to a different encoding:

    mysql> select convert(weirdColumn using utf8) as weirdColumnUtf8 from weirdTable;
    +-----------------+
    | weirdColumnUtf8 |
    +-----------------+
    | ЂЃЉЌ            |
    +-----------------+
    

    Here's proof that the result has been converted to utf8. I create a table using metadata from the query result:

    mysql> create table w2 
        as select convert(weirdColumn using utf8) as weirdColumnUtf8 from weirdTable;
    Query OK, 1 row affected (0.07 sec)
    Records: 1  Duplicates: 0  Warnings: 0
    
    mysql> show create table w2\G
    *************************** 1. row ***************************
           Table: w2
    Create Table: CREATE TABLE `w2` (
      `weirdColumnUtf8` longtext CHARACTER SET utf8
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
    1 row in set (0.00 sec)
    
    mysql> select * from w2;
    +-----------------+
    | weirdColumnUtf8 |
    +-----------------+
    | ЂЃЉЌ            |
    +-----------------+
    

    On my MySQL instance, utf8mb4 is the default character encoding. That's okay; it's a superset of utf8, and the utf8 encoding is enough to store these characters. However, I generally recommend if you use utf8, there's no reason not to use utf8mb4.


    If you change the character encoding, you cannot keep the cp1251 collation. Collations are specific to encodings. But you can use one of the collations associated with utf8 or utf8mb4. You can see the available collations for a given character encoding:

    mysql> SHOW COLLATION WHERE Charset = 'utf8';
    +--------------------------+---------+-----+---------+----------+---------+---------------+
    | Collation                | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
    +--------------------------+---------+-----+---------+----------+---------+---------------+
    ...
    | utf8_general_ci          | utf8    |  33 | Yes     | Yes      |       1 | PAD SPACE     |
    | utf8_general_mysql500_ci | utf8    | 223 |         | Yes      |       1 | PAD SPACE     |
    ...