I have read many articles, discussions and tutorials about using the utf-8 charset in MySQL. Several approaches are introduced, apparently for different cases (e.g. converting an existing database to utf-8). Which approaches are required for creating and using utf-8 MySQL databases? The methods I am aware of:
1. CHARACTER SET utf8 DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT COLLATE utf8_general_ci when creating the database.
2. DEFAULT CHARSET=utf8 COLLATE utf8_general_ci when creating a table.
3. mysql_set_charset('utf8',$con); after every MySQL connection.
4. default-character-set = utf8, collation-server = utf8_unicode_ci, init-connect='SET NAMES utf8', character-set-server = utf8 in the MySQL configuration file.

Are all of these actions needed to operate a MySQL database with the utf-8 charset? If not, which is the best approach?
When you specify a character encoding like utf8 for a column, it means that MySQL will use that encoding to store the text. When you specify a default character encoding for a database or a table, it means that their columns will have that encoding unless you say otherwise. The encoding affects how many bytes the data occupies on disk: in latin1 it's 1 byte per character, in sjis 1 or 2 bytes (2 for Japanese characters), and in utf8 it varies from 1 to 3 bytes (3 for most Japanese characters). If you are storing a lot of Japanese text you may want to use sjis instead of utf8.
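To get a feel for the size difference, here is a minimal Python sketch that counts encoded bytes. It assumes Python's latin-1, shift_jis and utf-8 codecs are close enough stand-ins for MySQL's latin1, sjis and utf8 character sets; they are not identical (for example, Python's utf-8 also produces 4-byte sequences that MySQL's utf8 cannot store). The helper name encoded_size is made up for this example.

```python
def encoded_size(text, encoding):
    """Return the number of bytes `text` takes in `encoding`,
    or None if the encoding cannot represent it."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        # e.g. latin-1 has no code points for Japanese characters
        return None


if __name__ == "__main__":
    # Python codecs used here as approximations of MySQL's latin1, sjis, utf8
    for sample in ("hello", "Ärger", "日本語"):
        sizes = {enc: encoded_size(sample, enc) for enc in ("latin-1", "shift_jis", "utf-8")}
        print(sample, sizes)
```

For "日本語" this reports 6 bytes in shift_jis and 9 in utf-8, while latin-1 cannot encode it at all.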
When you specify a collation like utf8_general_ci for a column, it determines how MySQL compares and sorts the data, for example in ORDER BY, in comparisons and in indexes. Cultures have different rules for sorting text: for example, in Swedish Ä is the second-to-last letter of the alphabet, while in English it's treated as equivalent to A. So with a Swedish collation you get a < b < ä, and with an English collation you get a = ä < b. Which collation you should use depends mostly on what your users expect to see.
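The two orderings can be modelled in a few lines of Python. This is only a sketch of the idea, not how MySQL implements collations: swedish_key places å, ä and ö after z, roughly like MySQL's utf8_swedish_ci, and english_key strips diacritics so that ä compares equal to a, roughly like utf8_general_ci. Both function names are hypothetical.

```python
import unicodedata

# Swedish alphabet order: å, ä and ö come after z (normalized to precomposed form).
SWEDISH_ALPHABET = unicodedata.normalize("NFC", "abcdefghijklmnopqrstuvwxyzåäö")


def swedish_key(word):
    """Sort key: each letter's position in the Swedish alphabet.
    Characters outside the alphabet all sort after ö."""
    word = unicodedata.normalize("NFC", word.lower())
    return [SWEDISH_ALPHABET.index(c) if c in SWEDISH_ALPHABET else len(SWEDISH_ALPHABET)
            for c in word]


def english_key(word):
    """Sort key that ignores diacritics: 'ä' is decomposed into 'a' plus a
    combining diaeresis, and the combining mark is dropped."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    words = ["b", "ä", "a"]
    print(sorted(words, key=swedish_key))  # a < b < ä
    print(sorted(words, key=english_key))  # a and ä tie, both before b
```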
Yet the way the MySQL server stores text doesn't affect the encoding in which it is returned to the client: every connection has its own character set. The server converts results to the connection's character set automatically, so you don't really have to care how the data is stored. In fact, you can select columns in different encodings and MySQL correctly converts everything to utf8 or whatever the connection uses.
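Here is a rough Python model of that conversion, assuming Python codec names stand in for MySQL character sets: each value is decoded using its column's character set and re-encoded in the connection's character set, so two columns with different storage encodings come back as identical bytes. The function name to_connection_charset is hypothetical; the real conversion happens inside the MySQL server.

```python
def to_connection_charset(stored, column_charset, connection_charset):
    """Decode bytes stored in a column's charset and re-encode them in the
    connection's charset (a model of what the server does before sending a result)."""
    return stored.decode(column_charset).encode(connection_charset)


if __name__ == "__main__":
    # The same word stored in a latin1-like column and in a utf8-like column.
    row = [("café".encode("latin-1"), "latin-1"), ("café".encode("utf-8"), "utf-8")]
    print([stored for stored, _ in row])  # different bytes on disk
    print([to_connection_charset(stored, cs, "utf-8") for stored, cs in row])  # same bytes
```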
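So what you really need to worry about is setting the connection encoding, which is what your #3 does. The SET NAMES in init-connect (#4) probably also works, but I wouldn't rely on it: init-connect is not executed for users with the SUPER privilege, and unlike mysql_set_charset, SET NAMES doesn't tell the client library which character set to use when escaping strings.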
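To see why the connection encoding is the part that matters, this Python sketch simulates a mismatch: the application sends UTF-8 bytes while the connection is declared as latin1, so each multi-byte character is read as two separate latin1 characters. The function name read_with_charset is hypothetical, and Python codecs stand in for MySQL's character sets.

```python
def read_with_charset(text, sent_as, declared_as):
    """Encode text the way the application actually sends it, then decode it
    using the charset the connection claims to use."""
    return text.encode(sent_as).decode(declared_as)


if __name__ == "__main__":
    print(read_with_charset("café", "utf-8", "utf-8"))    # charsets match: prints café
    print(read_with_charset("café", "utf-8", "latin-1"))  # mismatch: prints cafÃ©
```

Setting the connection charset, for example with mysql_set_charset, is what keeps the declared and actual encodings in agreement.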