最常见的问题就是数据库插入查看或导入导出出现了乱码,这应该是所有用过mysql的人都遇到过了吧。所以解决乱码问题,就应该理解其原因:就是客户端连接设置的编码与表编码不一致!

数据库字符集有多种,linux 系统默认是 utf-8 编码,而 windows 默认为 gbk 编码。一般我们常用的系统都是中英文命名和存储数据d系统,所以服务器和mysql数据库设置编码为 utf8。mysql 还有字符集 utf8mb4,兼容 utf8 ,可保持 emoji 表情,但比 utf8 多占用1个字节。


查看linux系统编码

shell> cat /etc/sysconfig/i18n
LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"

shell> echo $LANG
en_US.UTF-8

shell> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=


查看数据库可设置的字符集及校对规则

mysql> SHOW CHARACTER SET;
mysql> SELECT * FROM INFORMATION_SCHEMA.CHARACTER_SETS ORDER BY CHARACTER_SET_NAME;
+--------------------+----------------------+---------------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN |
+--------------------+----------------------+---------------------------------+--------+
| armscii8 | armscii8_general_ci | ARMSCII-8 Armenian | 1 |
| ascii | ascii_general_ci | US ASCII | 1 |
| big5 | big5_chinese_ci | Big5 Traditional Chinese | 2 |
| binary | binary | Binary pseudo charset | 1 |
| cp1250 | cp1250_general_ci | Windows Central European | 1 |
| cp1251 | cp1251_general_ci | Windows Cyrillic | 1 |
| cp1256 | cp1256_general_ci | Windows Arabic | 1 |
| cp1257 | cp1257_general_ci | Windows Baltic | 1 |
| cp850 | cp850_general_ci | DOS West European | 1 |
| cp852 | cp852_general_ci | DOS Central European | 1 |
| cp866 | cp866_general_ci | DOS Russian | 1 |
| cp932 | cp932_japanese_ci | SJIS for Windows Japanese | 2 |
| dec8 | dec8_swedish_ci | DEC West European | 1 |
| eucjpms | eucjpms_japanese_ci | UJIS for Windows Japanese | 3 |
| euckr | euckr_korean_ci | EUC-KR Korean | 2 |
| gb18030 | gb18030_chinese_ci | China National Standard GB18030 | 4 |
| gb2312 | gb2312_chinese_ci | GB2312 Simplified Chinese | 2 |
| gbk | gbk_chinese_ci | GBK Simplified Chinese | 2 |
| geostd8 | geostd8_general_ci | GEOSTD8 Georgian | 1 |
| greek | greek_general_ci | ISO 8859-7 Greek | 1 |
| hebrew | hebrew_general_ci | ISO 8859-8 Hebrew | 1 |
| hp8 | hp8_english_ci | HP West European | 1 |
| keybcs2 | keybcs2_general_ci | DOS Kamenicky Czech-Slovak | 1 |
| koi8r | koi8r_general_ci | KOI8-R Relcom Russian | 1 |
| koi8u | koi8u_general_ci | KOI8-U Ukrainian | 1 |
| latin1 | latin1_swedish_ci | cp1252 West European | 1 |
| latin2 | latin2_general_ci | ISO 8859-2 Central European | 1 |
| latin5 | latin5_turkish_ci | ISO 8859-9 Turkish | 1 |
| latin7 | latin7_general_ci | ISO 8859-13 Baltic | 1 |
| macce | macce_general_ci | Mac Central European | 1 |
| macroman | macroman_general_ci | Mac West European | 1 |
| sjis | sjis_japanese_ci | Shift-JIS Japanese | 2 |
| swe7 | swe7_swedish_ci | 7bit Swedish | 1 |
| tis620 | tis620_thai_ci | TIS620 Thai | 1 |
| ucs2 | ucs2_general_ci | UCS-2 Unicode | 2 |
| ujis | ujis_japanese_ci | EUC-JP Japanese | 3 |
| utf16 | utf16_general_ci | UTF-16 Unicode | 4 |
| utf16le | utf16le_general_ci | UTF-16LE Unicode | 4 |
| utf32 | utf32_general_ci | UTF-32 Unicode | 4 |
| utf8 | utf8_general_ci | UTF-8 Unicode | 3 |
| utf8mb4 | utf8mb4_general_ci | UTF-8 Unicode | 4 |
+--------------------+----------------------+---------------------------------+--------+


查看 mysql 数据库系统默认字符集

mysql> SHOW GLOBAL VARIABLES LIKE '%character_set%';
+--------------------------+-------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.7.16-linux-glibc2.5-i686/share/charsets/ |
+--------------------------+-------------------------------------------------------------+


说明:

character_set_client :客户端使用的字符集
character_set_results :服务器查询结果到客户端字符集
character_set_connection :客户端到服务器默认转换的编码
character_set_database :数据库创建时默认的字符集(MySQL 5.7.6 后不推荐使用)
character_set_server :数据库创建时默认的字符集
character_set_system :系统默认元数据字符集,总是为 “utf8”
character_set_filesystem :导入导出时文件名字符集,默认binary则保持与client设置一样。


更改系统字符集,改为 “gb2312” 

shell> vi /etc/sysconfig/i18n
LANG="zh_CN.GB2312"
#LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"

设置生效
shell> . /etc/sysconfig/i18n


改了系统编码后,再访问查看mysql字符集.字符集没有什么变化.

mysql> SHOW GLOBAL VARIABLES LIKE '%character_set%';
+--------------------------+-------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.7.16-linux-glibc2.5-i686/share/charsets/ |
+--------------------------+-------------------------------------------------------------+


在linux服务器端的终端访问mysql,再查看当前会话的字符集,发现客户端这几个字符集编码变为 “gb2312”!

mysql> SHOW VARIABLES LIKE '%character_set%';
+--------------------------+-------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------------------------+
| character_set_client | gb2312 |
| character_set_connection | gb2312 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | gb2312 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.7.16-linux-glibc2.5-i686/share/charsets/ |
+--------------------------+-------------------------------------------------------------+


数据库默认字符集为 "character_set_database = latin1",所以当我们创建数据库没有指定字符集时,则使用默认设置。

mysql> CREATE DATABASE test;
mysql> SHOW CREATE DATABASE test;
+----------+-----------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------+
| test | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+-----------------------------------------------------------------+


此外,我们创建表的默认字符集为当前数据库的字符集。如下,新创建的表字符集也为 “latin1”

mysql> USE test;
mysql> CREATE TABLE tab(id INT NOT NULL AUTO_INCREMENT,name VARCHAR(30) NOT NULL,PRIMARY KEY(iD));
mysql> SHOW CREATE TABLE tab\G;
*************************** 1. row ***************************
Table: tab
Create Table: CREATE TABLE `tab` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(30) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1


现在创建数据库,并且指定默认字符集为“utf8”,再创建一张未指定字符集的表和一张指定其他字符集的表。发现,未指定字符集的表默认字符集与数据库一致的。

mysql> CREATE DATABASE test2 CHARACTER SET utf8;
mysql> USE test2;
mysql> CREATE TABLE tab(id INT NOT NULL AUTO_INCREMENT,name VARCHAR(30) NOT NULL,PRIMARY KEY(iD));
mysql> CREATE TABLE tab1(id INT NOT NULL AUTO_INCREMENT,name VARCHAR(30) NOT NULL,PRIMARY KEY(iD)) DEFAULT CHARSET=utf8mb4;

mysql> SHOW CREATE TABLE tab\G;
*************************** 1. row ***************************
Table: tab
Create Table: CREATE TABLE `tab` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(30) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

mysql> SHOW CREATE TABLE tab1\G;
*************************** 1. row ***************************
Table: tab1
Create Table: CREATE TABLE `tab1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(30) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4


查看所有数据库字符集及排序规则

mysql> SELECT SCHEMA_NAME,DEFAULT_CHARACTER_SET_NAME,DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
+--------------------+----------------------------+------------------------+
| SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+--------------------+----------------------------+------------------------+
| information_schema | utf8 | utf8_general_ci |
| mysql | latin1 | latin1_swedish_ci |
| performance_schema | utf8 | utf8_general_ci |
| sys | utf8 | utf8_general_ci |
| test | latin1 | latin1_swedish_ci |
| test2 | utf8 | utf8_general_ci |
+--------------------+----------------------------+------------------------+


查看所有表的排序规则(字符集)

mysql> SELECT TABLE_SCHEMA,TABLE_NAME,TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA IN('test','test2');
+--------------+------------+--------------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_COLLATION |
+--------------+------------+--------------------+
| test | tab | latin1_swedish_ci |
| test2 | tab | utf8_general_ci |
| test2 | tab1 | utf8mb4_general_ci |
+--------------+------------+--------------------+


[navicat 测试]

现在开始插入表数据,我们插入数据到数据库表 test.tab,该数据库及表字符集都是 “latin1”。


当前我用 navicat 连接,该客户端默认设置的相关连接编码为:


character_set_client=utf8


character_set_connection=utf8


character_set_results=utf8


--插入英文字符,正常
mysql> use test;
mysql> insert into tab values(1,'123'),(2,'abc');

--插入中外字符,失败
mysql> insert into tab values(3,'文档');
1366 - Incorrect string value: '\xE6\x96\x87\xE6\xA1\xA3' for column 'name' at row 1

--怎么插入中文呢??现在将客户端会话编码设置为latin1,保持与数据库一致,再执行插入中文。
mysql> SET names latin1;
mysql> SHOW VARIABLES LIKE '%character_set%';
mysql> insert into tab values(3,'文档');
mysql> select * from tab;
+----+------+
| id | name |
+----+------+
| 1 | 123 |
| 2 | abc |
| 3 | 文档 |
+----+------+

这会儿就没有报错了!查询也正常!现在将连接会话改为 utf8 ,再次查询。
mysql> SET names utf8;
mysql> SHOW VARIABLES LIKE '%character_set%';
mysql> select * from tab;
+----+--------+
| id | name |
+----+--------+
| 1 | 123 |
| 2 | abc |
| 3 | ?–???£ |
+----+--------+


因为表字符编码(latin1)与当前客户端会话的字符编码(utf8)不一样,所以查询出来了乱码!!


为了防止乱码出现,插入数据时客户端要与表字符集一样,查询数据时,客户端也要与表编码一样。


以下更详细一步进行测试,再次确认,系统当前默认配置的字符编码如下:


+--------------------------+-------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
+--------------------------+-------------------------------------------------------------+



--  现将客户端字符集 utf8 改为 latin1 ,插入时中文不乱码(下一步有测试),查询时乱码!
mysql> set character_set_client = latin1;
mysql> insert into tab values(4,'测试');
mysql> select * from tab;
+----+--------+
| id | name |
+----+--------+
| 1 | 123 |
| 2 | abc |
| 3 | ?–???£ |
| 4 | ?μ?èˉ? |
+----+--------+

-- 单独将服务器查询结果字符集 utf8 改为 latin1,则查询结果正常;不改则乱码。
mysql> set character_set_results = latin1;
mysql> select * from tab;
+----+--------+
| id | name |
+----+--------+
| 1 | 123 |
| 2 | abc |
| 3 | 文档 |
| 4 | 测试 |
+----+--------+


客户端插入和查询都正常了,那么 character_set_connection 是干嘛用的呢?干嘛用的呢?



character_set_connection 只作为中间转换,如果字符串未设置字符集且设置了排序规则,默认字符集为character_set_connection设置的字符集,且会判断字符串显式设置的排序规则是否准确,不正确则出错。


+--------------------------------------------------------------------------------------+
| |
|character_set_client <------> character_set_connection <------> character_set_results |
| |
+--------------------------------------------------------------------------------------+


--设置 connection 字符集为 utf8,
mysql> set character_set_connection = utf8;

--只设置排序规则,但非当前 connection 对应的排序规则。错误!
mysql> select 'Müller' COLLATE latin1_german1_ci;
1253 - COLLATION 'latin1_german1_ci' is not valid for CHARACTER SET 'utf8'

--只设置排序规则,为当前connection对应的排序规则。正确!
mysql> select 'Müller' COLLATE utf8_general_ci;

--如果字符前面强制字符集 _latin1或_utf8 ,则不会进行connection转换。


【导出导入数据乱码问题】

数据库表字符集为 latin1 ,客户端字符集默认与linux操作系统设置一致为 utf8.导出数据时乱码!!

shell> mysqldump -uroot -p --opt -t test > test.sql
shell> egrep -v "#|\*|--|^$" test.sql
LOCK TABLES `tab` WRITE;
INSERT INTO `tab` VALUES (1,'123'),(2,'abc'),(3,'?–???£'),(4,'?μ?èˉ?');
UNLOCK TABLES;

现在设置导出时字符集为 latin1 ,与表一样。结果正常!!

shell> mysqldump -uroot -p --default-character-set=latin1 --opt -t test > test.sql
shell> egrep -v "#|\*|--|^$" test.sql
LOCK TABLES `tab` WRITE;
INSERT INTO `tab` VALUES (1,'123'),(2,'abc'),(3,'文档'),(4,'测试');
UNLOCK TABLES;

导入同理,若导入导出出现了乱码,那只要设置相同的字符集即可。



【更改字符集】

客户端的都设置,为会话级别:
mysql> SET names latin1;

客户端单个设置,为会话级别:
mysql> set character_set_connection = utf8;
mysql> set @@session.character_set_connection = utf8;

客户端单个设置,为全局设置:
mysql> set global character_set_client = gb2312;
mysql> set @@global.character_set_client = latin1;


永久设置,在配置文件中更改:
shell> vi /etc/my.cnf

[client]
default_character_set = utf8

[mysql] #客户端设置
default_character_set = utf8

[mysqld] #服务器设置
init-connect='SET NAMES utf8'
character-set-server = utf8
collation-server = utf8_general_ci
#default-character-set = utf8
skip-character-set-client-handshake#忽略客户端字符集,使用服务器设置



其他参考:​​MySQL字符集编码解析​