Notes
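MySQL has had good Unicode support since version 4.1.
UTF-8 is called utf8 in MySQL. Note that utf8 stores at most three bytes per character, so it cannot hold characters outside the Basic Multilingual Plane; MySQL 5.5.3 and later offer utf8mb4 for full UTF-8.
A collation defines the sort and comparison rules for the data; it may be case sensitive or not.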
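As a rough illustration of what a collation decides, the following Python sketch sorts the same words two ways. It is only an analogy: byte-wise sorting stands in for a binary or case-sensitive collation, and case-folded sorting stands in for a case-insensitive (`_ci`) one. The word list and variable names are made up for the example, and real MySQL collations apply richer rules than this.

```python
# Hypothetical sample data, not taken from any real table.
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# Code-point ordering: every uppercase ASCII letter sorts before every lowercase one.
case_sensitive = sorted(words)

# Case-insensitive ordering: compare case-folded forms, keep original order for ties.
case_insensitive = sorted(words, key=str.casefold)

print(case_sensitive)    # ['Apple', 'Banana', 'apple', 'banana', 'cherry']
print(case_insensitive)  # ['Apple', 'apple', 'banana', 'Banana', 'cherry']
```

The same distinction affects comparisons: under a case-insensitive collation a lookup for 'apple' would also match 'Apple'.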
To find out your current setup:
SHOW VARIABLES LIKE 'character_set_database';
SHOW VARIABLES LIKE 'character_set_client';
To see available character sets and collations on your database:
SHOW CHARACTER SET;
SHOW COLLATION LIKE 'utf8%';
Character set and collation can be set per server, database, table, and connection:
Server (/etc/my.cnf):
[mysqld]
...
character-set-server=utf8
collation-server=utf8_general_ci
Database:
(CREATE | ALTER) DATABASE ... DEFAULT CHARACTER SET utf8
Table:
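(CREATE | ALTER) TABLE ... DEFAULT CHARACTER SET utf8
Note that ALTER TABLE ... DEFAULT CHARACTER SET only changes the default for columns added later; ALTER TABLE ... CONVERT TO CHARACTER SET utf8 also converts the existing text columns.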
Connection:
SET NAMES 'utf8';
A PHP mysql connection appears to default to latin1 (not fully confirmed, but see the tests below), so your first query after connecting should be:
mysql_query("SET NAMES 'utf8'");
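In PHP 5.2.3 and later, use mysql_set_charset() instead, which also tells the client library about the charset so that mysql_real_escape_string() escapes correctly:
mysql_set_charset('utf8', $conn);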
The CONVERT() function can convert between character sets, e.g.:
INSERT INTO utf8table (utf8column)
SELECT CONVERT(latin1field USING utf8)
FROM latin1table;
As mentioned in charsets, field widths may need to be increased to deal with multi-byte characters.
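To see why widths can grow, here is a small Python sketch that measures how many bytes individual characters occupy in UTF-8. The sample characters and the helper name `utf8_width` are chosen for illustration only. Whether a given limit is affected depends on whether it is counted in characters or in bytes, so treat this as a way to estimate the worst case rather than a rule.

```python
# Hypothetical sample characters: ASCII, Latin with diacritic, Cyrillic, Chinese.
samples = ["a", "ñ", "я", "中"]

def utf8_width(ch):
    """Return the number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for ch in samples:
    print(ch, utf8_width(ch))

# In latin1 every representable character is exactly one byte.
latin1_width = len("ñ".encode("latin-1"))
print(latin1_width)  # 1
```

A column that held 50 single-byte latin1 characters could therefore need up to 150 bytes once it holds three-byte UTF-8 characters.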
Code to generate a mass change of collations:
<?php
// This script outputs the queries needed to change all fields/tables to a different collation.
// It is HIGHLY suggested that you take a MySQL dump before running any of the generated queries.
// This code is provided as is and without any warranty.
// Note: the mysql_* functions used here were removed in PHP 7; on newer PHP, port to mysqli or PDO.
die("Make a backup of your MySQL database then remove this line");
set_time_limit(0);
// collation you want to change:
$convert_from = 'latin1_swedish_ci';
// collation you want to change it to:
$convert_to = 'utf8_general_ci';
// character set of new collation:
$character_set = 'utf8';
$show_alter_table = true;
$show_alter_field = true;
// DB login information
$username = 'user';
$password = 'pass';
$database = 'database';
$host = 'localhost';
mysql_connect($host, $username, $password) or die(mysql_error());
mysql_select_db($database) or die(mysql_error());
$rs_tables = mysql_query("SHOW TABLES") or die(mysql_error());
print '<pre>';
while ($row_tables = mysql_fetch_row($rs_tables)) {
    // Double any backticks so the name is safe inside a backtick-quoted identifier
    $table = str_replace('`', '``', $row_tables[0]);
    // Alter table collation
    // ALTER TABLE `account` DEFAULT CHARACTER SET utf8
    if ($show_alter_table) {
        echo "ALTER TABLE `$table` DEFAULT CHARACTER SET $character_set;\r\n";
    }
    $rs = mysql_query("SHOW FULL FIELDS FROM `$table`") or die(mysql_error());
    while ($row = mysql_fetch_assoc($rs)) {
        if ($row['Collation'] != $convert_from) {
            continue;
        }
        // Is the field allowed to be null?
        if ($row['Null'] == 'YES') {
            $nullable = ' NULL';
        } else {
            $nullable = ' NOT NULL';
        }
        // Does the field default to null, a string, or nothing?
        // SHOW FULL FIELDS returns PHP null (not the string 'NULL') when no default is set.
        if ($row['Default'] !== null) {
            $default = " DEFAULT '" . mysql_real_escape_string($row['Default']) . "'";
        } elseif ($row['Null'] == 'YES') {
            $default = ' DEFAULT NULL';
        } else {
            $default = '';
        }
        // Alter field collation:
        // ALTER TABLE `account` CHANGE `email` `email` VARCHAR( 50 ) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL
        if ($show_alter_field) {
            $field = str_replace('`', '``', $row['Field']);
            echo "ALTER TABLE `$table` CHANGE `$field` `$field` {$row['Type']} CHARACTER SET $character_set COLLATE $convert_to$nullable$default;\r\n";
        }
    }
}
?>
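The statement-building logic of the script above can also be sketched without a database, which makes it easier to check the generated SQL before running anything. The Python function below is a hypothetical illustration, not a port endorsed by the original author: the name `build_alter_statements` and the input shape (a list of dicts with the keys Field, Type, Collation, Null and Default, mimicking rows of SHOW FULL FIELDS) are assumptions made for this example.

```python
def build_alter_statements(table, columns, convert_from, convert_to, character_set):
    """Return ALTER TABLE statements for every column that uses `convert_from`.

    `columns` is assumed to look like rows of SHOW FULL FIELDS, with a Default
    of None meaning that no default value is set.
    """
    def quote_ident(name):
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + name.replace("`", "``") + "`"

    def quote_str(value):
        # Minimal string literal quoting for the generated SQL.
        return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"

    statements = [
        f"ALTER TABLE {quote_ident(table)} DEFAULT CHARACTER SET {character_set};"
    ]
    for col in columns:
        if col["Collation"] != convert_from:
            continue  # non-text columns and other collations are left alone
        nullable = "NULL" if col["Null"] == "YES" else "NOT NULL"
        if col["Default"] is not None:
            default = " DEFAULT " + quote_str(col["Default"])
        elif col["Null"] == "YES":
            default = " DEFAULT NULL"
        else:
            default = ""
        name = quote_ident(col["Field"])
        statements.append(
            f"ALTER TABLE {quote_ident(table)} CHANGE {name} {name} {col['Type']} "
            f"CHARACTER SET {character_set} COLLATE {convert_to} {nullable}{default};"
        )
    return statements


# Hypothetical column metadata for a table named `account`.
columns = [
    {"Field": "email", "Type": "varchar(50)", "Collation": "latin1_swedish_ci",
     "Null": "YES", "Default": None},
    {"Field": "id", "Type": "int(11)", "Collation": None,
     "Null": "NO", "Default": None},
]
result = build_alter_statements(
    "account", columns, "latin1_swedish_ci", "utf8_general_ci", "utf8"
)
for stmt in result:
    print(stmt)
```

Keeping the generation step as a pure function means the output can be reviewed, or compared against a previous run, before any ALTER is executed.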
MySQL tables, columns and connections with UTF-8
What happens when you INSERT a UTF-8 string into a MySQL database using PHP's MySQL extension? That depends on the string. If you use something like “Iñtërnâtiônàlizætiøn”, whose characters all exist in latin1, then just about anything you do in MySQL will be data-safe (but, of course, the collation will be incorrect). If you have characters that cannot be represented in latin1 (e.g. text in Chinese, Russian, ...), then the behaviour depends on both the table definition and the encoding of the connection to the database.
| Connection | Table or column charset: latin1 (MySQL default) | Table or column charset: utf8 |
|---|---|---|
| default connection, mysql_pconnect() | data is binary-safe but not encoded properly* | data is binary-safe but not encoded properly* |
| using SET NAMES 'utf8'; | data destroyed; string is converted to "????" | works fine! |
* MySQL appears to treat the data as a series of bytes, each one a character in the latin1 codepage. If you use the same code to read and write the data then it will round-trip fine (but MySQL's collation will obviously be wrong). If you use tools that know the data types used by MySQL, like phpMyAdmin or even mysqldump, then the wrong encoding is obvious.
(This test was performed with PHP versions 4.4.2 and 5.1.2 and MySQL 5.0.20.)
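The outcomes in the table above can be imitated in plain Python by encoding and decoding by hand. This is a sketch of the presumed mechanism, not a trace of what MySQL does internally: Python's latin-1 codec stands in for MySQL's latin1 (which is closer to Windows-1252), and the variable names are invented for the example.

```python
text = "Привет"  # Russian; none of these characters exist in latin1
utf8_bytes = text.encode("utf-8")

# Default (latin1) connection, latin1 column: each UTF-8 byte is taken to be
# one latin1 character, so the server sees twice as many "characters".
as_seen_by_server = utf8_bytes.decode("latin-1")

# Reading back over the same kind of connection returns the same bytes,
# which the application again interprets as UTF-8.
round_tripped = as_seen_by_server.encode("latin-1").decode("utf-8")

# Default (latin1) connection, utf8 column: the misread characters are
# converted to UTF-8 for storage (double encoding) and converted back on read.
double_encoded = as_seen_by_server.encode("utf-8")
double_round_trip = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")

# utf8 connection, latin1 column: the server must convert real Cyrillic
# characters to latin1, and anything unrepresentable becomes "?".
stored_in_latin1_column = text.encode("latin-1", errors="replace").decode("latin-1")

print(len(text), len(as_seen_by_server))  # 6 12
print(round_tripped == text)              # True
print(double_round_trip == text)          # True
print(stored_in_latin1_column)            # ??????
```

The round trips succeed only because the same wrong assumption is applied on the way in and on the way out; any tool that reads the stored data with the declared charset sees the garbled form.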
Source : http://www.phpwact.org/php/i18n/utf-8/mysql