在计算机中只能存储二进制数据,那该怎么存储字符串呢?当然是建立字符与二进制数据的映射关系
了,建立这个关系最起码要搞清楚两件事:
- 界定清楚字符范围:需要把哪些字符映射成二进制数据?
- 编码与解码:也就是字符串与二进制数据如何映射。将一个字符映射成一个二进制数据的过程也叫做 编码 ,将一个二进制数据映射到一个字符的过程叫做 解码 。
常见字符集
ASCII 字符集
ASCII 字符集共收录128个字符,包括空格、标点符号、数字、大小写字母和一些不可见字符。由于总共才128个字符,所以可以使用1个字节来进行编码。
ISO 8859-1 字符集
ISO 8859-1 字符集共收录256个字符,是在 ASCII 字符集的基础上又扩充了128个西欧常用字符(包括德法两国的字母),也可以使用1个字节来进行编码。这个字符集也有一个别名 latin1 。
GB2312 字符集
GB2312 字符集收录了汉字以及拉丁字母、希腊字母、日文平假名及片假名字母、俄语西里尔字母。其中收录汉字6763个,其他文字符号682个。
同时这种字符集又兼容 ASCII 字符集,所以在编码方式上显得有些奇怪,如果该字符(ASCII 字符)在 ASCII 字符集中,则采用1字节编码,否则采用2字节编码。
这种表示一个字符需要的字节数可能不同的编码方式称为 变长编码方式 。
GBK 字符集
GBK 字符集只是在收录字符范围上对 GB2312 字符集作了扩充,编码方式上兼容 GB2312 。
utf8 字符集
收录地球上能想到的所有字符,而且还在不断扩充。utf8字符集兼容 ASCII 字符集,采用变长编码方式,编码一个字符需要使用1~4个字节。
utf8只是Unicode字符集的一种编码方案,Unicode字符集可以采用utf8、utf16、utf32这几种编码方案。
- utf8使用1~4个字节编码一个字符;
- utf16使用2个或4个字节编码一个字符;
- utf32使用4个字节编码一个字符。
MySQL 支持字符集与排序规则
查看MySQL支持的字符集
MySQL 支持好多好多种字符集,查看当前 MySQL 中支持的字符集可以用下边这个语句:
SHOW (CHARACTER SET|CHARSET) [LIKE 匹配的模式];
CHARACTER SET 和 CHARSET 是同义词,用任意一个都可以。
show character set ;
执行上面语句,输出如下:
Charset | Description | Default collation | Maxlen |
armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
ascii | US ASCII | ascii_general_ci | 1 |
big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
binary | Binary pseudo charset | binary | 1 |
cp1250 | Windows Central European | cp1250_general_ci | 1 |
cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
cp1256 | Windows Arabic | cp1256_general_ci | 1 |
cp1257 | Windows Baltic | cp1257_general_ci | 1 |
cp850 | DOS West European | cp850_general_ci | 1 |
cp852 | DOS Central European | cp852_general_ci | 1 |
cp866 | DOS Russian | cp866_general_ci | 1 |
cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
dec8 | DEC West European | dec8_swedish_ci | 1 |
eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
euckr | EUC-KR Korean | euckr_korean_ci | 2 |
gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 |
gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
greek | ISO 8859-7 Greek | greek_general_ci | 1 |
hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
hp8 | HP West European | hp8_english_ci | 1 |
keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
latin1 | cp1252 West European | latin1_swedish_ci | 1 |
latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
macce | Mac Central European | macce_general_ci | 1 |
macroman | Mac West European | macroman_general_ci | 1 |
sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
tis620 | TIS620 Thai | tis620_thai_ci | 1 |
ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
utf32 | UTF-32 Unicode | utf32_general_ci | 4 |
utf8mb3 | UTF-8 Unicode | utf8mb3_general_ci | 3 |
utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci | 4 |
列说明:
- Charset:字符集名称;
- Description:字符集描述;
- Default collation:字符集中一种默认的 比较规则;
- Maxlen:代表该种字符集表示一个字符最多需要几个字节。
utf字符集
在 MySQL 中 utf8 是 utf8mb3 的别名,所以之后在 MySQL 中提到 utf8 就意味着使用1~3个字节来表示一个字符,如果大家有使用4字节编码一个字符的情况,比如存储一些emoji表情啥的,那请使用 utf8mb4 。
utf8mb3与utf8mb4的区别:
- utf8mb3 :阉割过的 utf8 字符集,只使用1~3个字节表示字符。
- utf8mb4 :正宗的 utf8 字符集,使用1~4个字节表示字符。
为什么需要utf8mb3和utf8mb4呢?
在 MySQL 中字符集表示一个字符所用最大字节长度在某些方面会影响系统的存储和性能,常用的一些字符使用1~3个字节就可以表示了,所以可以使用utf8mb3编码存储和性能更佳。
查看MySQL比较规则
查看 MySQL 中支持的比较规则的命令如下:
SHOW COLLATION [LIKE 匹配的模式];
show collation ;
执行上面语句,有286个比较规则,挑选常见的输出如下:
Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
ascii_bin | ascii | 65 | Yes | 1 | PAD SPACE | |
ascii_general_ci | ascii | 11 | Yes | Yes | 1 | PAD SPACE |
gb18030_bin | gb18030 | 249 | Yes | 1 | PAD SPACE | |
gb18030_chinese_ci | gb18030 | 248 | Yes | Yes | 2 | PAD SPACE |
gb18030_unicode_520_ci | gb18030 | 250 | Yes | 8 | PAD SPACE | |
gb2312_bin | gb2312 | 86 | Yes | 1 | PAD SPACE | |
gb2312_chinese_ci | gb2312 | 24 | Yes | Yes | 1 | PAD SPACE |
gbk_bin | gbk | 87 | Yes | 1 | PAD SPACE | |
gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | PAD SPACE |
latin1_bin | latin1 | 47 | Yes | 1 | PAD SPACE | |
latin1_danish_ci | latin1 | 15 | Yes | 1 | PAD SPACE | |
latin1_general_ci | latin1 | 48 | Yes | 1 | PAD SPACE | |
latin1_general_cs | latin1 | 49 | Yes | 1 | PAD SPACE | |
latin1_german1_ci | latin1 | 5 | Yes | 1 | PAD SPACE | |
latin1_german2_ci | latin1 | 31 | Yes | 2 | PAD SPACE | |
latin1_spanish_ci | latin1 | 94 | Yes | 1 | PAD SPACE | |
latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 | PAD SPACE |
latin2_bin | latin2 | 77 | Yes | 1 | PAD SPACE | |
latin2_croatian_ci | latin2 | 27 | Yes | 1 | PAD SPACE | |
latin2_czech_cs | latin2 | 2 | Yes | 4 | PAD SPACE | |
latin2_general_ci | latin2 | 9 | Yes | Yes | 1 | PAD SPACE |
latin2_hungarian_ci | latin2 | 21 | Yes | 1 | PAD SPACE | |
latin5_bin | latin5 | 78 | Yes | 1 | PAD SPACE | |
latin5_turkish_ci | latin5 | 30 | Yes | Yes | 1 | PAD SPACE |
latin7_bin | latin7 | 79 | Yes | 1 | PAD SPACE | |
latin7_estonian_cs | latin7 | 20 | Yes | 1 | PAD SPACE | |
latin7_general_ci | latin7 | 41 | Yes | Yes | 1 | PAD SPACE |
utf16le_bin | utf16le | 62 | Yes | 1 | PAD SPACE | |
utf16le_general_ci | utf16le | 56 | Yes | Yes | 1 | PAD SPACE |
utf16_bin | utf16 | 55 | Yes | 1 | PAD SPACE | |
utf16_croatian_ci | utf16 | 122 | Yes | 8 | PAD SPACE | |
utf16_czech_ci | utf16 | 111 | Yes | 8 | PAD SPACE | |
utf16_danish_ci | utf16 | 112 | Yes | 8 | PAD SPACE | |
utf16_esperanto_ci | utf16 | 118 | Yes | 8 | PAD SPACE | |
utf16_estonian_ci | utf16 | 107 | Yes | 8 | PAD SPACE | |
utf16_general_ci | utf16 | 54 | Yes | Yes | 1 | PAD SPACE |
utf16_german2_ci | utf16 | 121 | Yes | 8 | PAD SPACE | |
utf16_hungarian_ci | utf16 | 119 | Yes | 8 | PAD SPACE | |
utf16_icelandic_ci | utf16 | 102 | Yes | 8 | PAD SPACE | |
utf16_latvian_ci | utf16 | 103 | Yes | 8 | PAD SPACE | |
utf16_lithuanian_ci | utf16 | 113 | Yes | 8 | PAD SPACE | |
utf16_persian_ci | utf16 | 117 | Yes | 8 | PAD SPACE | |
utf16_polish_ci | utf16 | 106 | Yes | 8 | PAD SPACE | |
utf16_romanian_ci | utf16 | 104 | Yes | 8 | PAD SPACE | |
utf16_roman_ci | utf16 | 116 | Yes | 8 | PAD SPACE | |
utf16_sinhala_ci | utf16 | 120 | Yes | 8 | PAD SPACE | |
utf16_slovak_ci | utf16 | 114 | Yes | 8 | PAD SPACE | |
utf16_slovenian_ci | utf16 | 105 | Yes | 8 | PAD SPACE | |
utf16_spanish2_ci | utf16 | 115 | Yes | 8 | PAD SPACE | |
utf16_spanish_ci | utf16 | 108 | Yes | 8 | PAD SPACE | |
utf16_swedish_ci | utf16 | 109 | Yes | 8 | PAD SPACE | |
utf16_turkish_ci | utf16 | 110 | Yes | 8 | PAD SPACE | |
utf16_unicode_520_ci | utf16 | 123 | Yes | 8 | PAD SPACE | |
utf16_unicode_ci | utf16 | 101 | Yes | 8 | PAD SPACE | |
utf16_vietnamese_ci | utf16 | 124 | Yes | 8 | PAD SPACE | |
utf32_bin | utf32 | 61 | Yes | 1 | PAD SPACE | |
utf32_croatian_ci | utf32 | 181 | Yes | 8 | PAD SPACE | |
utf32_czech_ci | utf32 | 170 | Yes | 8 | PAD SPACE | |
utf32_danish_ci | utf32 | 171 | Yes | 8 | PAD SPACE | |
utf32_esperanto_ci | utf32 | 177 | Yes | 8 | PAD SPACE | |
utf32_estonian_ci | utf32 | 166 | Yes | 8 | PAD SPACE | |
utf32_general_ci | utf32 | 60 | Yes | Yes | 1 | PAD SPACE |
utf32_german2_ci | utf32 | 180 | Yes | 8 | PAD SPACE | |
utf32_hungarian_ci | utf32 | 178 | Yes | 8 | PAD SPACE | |
utf32_icelandic_ci | utf32 | 161 | Yes | 8 | PAD SPACE | |
utf32_latvian_ci | utf32 | 162 | Yes | 8 | PAD SPACE | |
utf32_lithuanian_ci | utf32 | 172 | Yes | 8 | PAD SPACE | |
utf32_persian_ci | utf32 | 176 | Yes | 8 | PAD SPACE | |
utf32_polish_ci | utf32 | 165 | Yes | 8 | PAD SPACE | |
utf32_romanian_ci | utf32 | 163 | Yes | 8 | PAD SPACE | |
utf32_roman_ci | utf32 | 175 | Yes | 8 | PAD SPACE | |
utf32_sinhala_ci | utf32 | 179 | Yes | 8 | PAD SPACE | |
utf32_slovak_ci | utf32 | 173 | Yes | 8 | PAD SPACE | |
utf32_slovenian_ci | utf32 | 164 | Yes | 8 | PAD SPACE | |
utf32_spanish2_ci | utf32 | 174 | Yes | 8 | PAD SPACE | |
utf32_spanish_ci | utf32 | 167 | Yes | 8 | PAD SPACE | |
utf32_swedish_ci | utf32 | 168 | Yes | 8 | PAD SPACE | |
utf32_turkish_ci | utf32 | 169 | Yes | 8 | PAD SPACE | |
utf32_unicode_520_ci | utf32 | 182 | Yes | 8 | PAD SPACE | |
utf32_unicode_ci | utf32 | 160 | Yes | 8 | PAD SPACE | |
utf32_vietnamese_ci | utf32 | 183 | Yes | 8 | PAD SPACE | |
utf8mb3_bin | utf8mb3 | 83 | Yes | 1 | PAD SPACE | |
utf8mb3_croatian_ci | utf8mb3 | 213 | Yes | 8 | PAD SPACE | |
utf8mb3_czech_ci | utf8mb3 | 202 | Yes | 8 | PAD SPACE | |
utf8mb3_danish_ci | utf8mb3 | 203 | Yes | 8 | PAD SPACE | |
utf8mb3_esperanto_ci | utf8mb3 | 209 | Yes | 8 | PAD SPACE | |
utf8mb3_estonian_ci | utf8mb3 | 198 | Yes | 8 | PAD SPACE | |
utf8mb3_general_ci | utf8mb3 | 33 | Yes | Yes | 1 | PAD SPACE |
utf8mb3_general_mysql500_ci | utf8mb3 | 223 | Yes | 1 | PAD SPACE | |
utf8mb3_german2_ci | utf8mb3 | 212 | Yes | 8 | PAD SPACE | |
utf8mb3_hungarian_ci | utf8mb3 | 210 | Yes | 8 | PAD SPACE | |
utf8mb3_icelandic_ci | utf8mb3 | 193 | Yes | 8 | PAD SPACE | |
utf8mb3_latvian_ci | utf8mb3 | 194 | Yes | 8 | PAD SPACE | |
utf8mb3_lithuanian_ci | utf8mb3 | 204 | Yes | 8 | PAD SPACE | |
utf8mb3_persian_ci | utf8mb3 | 208 | Yes | 8 | PAD SPACE | |
utf8mb3_polish_ci | utf8mb3 | 197 | Yes | 8 | PAD SPACE | |
utf8mb3_romanian_ci | utf8mb3 | 195 | Yes | 8 | PAD SPACE | |
utf8mb3_roman_ci | utf8mb3 | 207 | Yes | 8 | PAD SPACE | |
utf8mb3_sinhala_ci | utf8mb3 | 211 | Yes | 8 | PAD SPACE | |
utf8mb3_slovak_ci | utf8mb3 | 205 | Yes | 8 | PAD SPACE | |
utf8mb3_slovenian_ci | utf8mb3 | 196 | Yes | 8 | PAD SPACE | |
utf8mb3_spanish2_ci | utf8mb3 | 206 | Yes | 8 | PAD SPACE | |
utf8mb3_spanish_ci | utf8mb3 | 199 | Yes | 8 | PAD SPACE | |
utf8mb3_swedish_ci | utf8mb3 | 200 | Yes | 8 | PAD SPACE | |
utf8mb3_tolower_ci | utf8mb3 | 76 | Yes | 1 | PAD SPACE | |
utf8mb3_turkish_ci | utf8mb3 | 201 | Yes | 8 | PAD SPACE | |
utf8mb3_unicode_520_ci | utf8mb3 | 214 | Yes | 8 | PAD SPACE | |
utf8mb3_unicode_ci | utf8mb3 | 192 | Yes | 8 | PAD SPACE | |
utf8mb3_vietnamese_ci | utf8mb3 | 215 | Yes | 8 | PAD SPACE | |
utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes | Yes | 0 | NO PAD |
utf8mb4_0900_as_ci | utf8mb4 | 305 | Yes | 0 | NO PAD | |
utf8mb4_0900_as_cs | utf8mb4 | 278 | Yes | 0 | NO PAD | |
utf8mb4_0900_bin | utf8mb4 | 309 | Yes | 1 | NO PAD | |
utf8mb4_bg_0900_ai_ci | utf8mb4 | 318 | Yes | 0 | NO PAD | |
utf8mb4_bg_0900_as_cs | utf8mb4 | 319 | Yes | 0 | NO PAD | |
utf8mb4_bin | utf8mb4 | 46 | Yes | 1 | PAD SPACE | |
utf8mb4_bs_0900_ai_ci | utf8mb4 | 316 | Yes | 0 | NO PAD | |
utf8mb4_bs_0900_as_cs | utf8mb4 | 317 | Yes | 0 | NO PAD | |
utf8mb4_croatian_ci | utf8mb4 | 245 | Yes | 8 | PAD SPACE | |
utf8mb4_cs_0900_ai_ci | utf8mb4 | 266 | Yes | 0 | NO PAD | |
utf8mb4_cs_0900_as_cs | utf8mb4 | 289 | Yes | 0 | NO PAD | |
utf8mb4_czech_ci | utf8mb4 | 234 | Yes | 8 | PAD SPACE | |
utf8mb4_danish_ci | utf8mb4 | 235 | Yes | 8 | PAD SPACE | |
utf8mb4_da_0900_ai_ci | utf8mb4 | 267 | Yes | 0 | NO PAD | |
utf8mb4_da_0900_as_cs | utf8mb4 | 290 | Yes | 0 | NO PAD | |
utf8mb4_de_pb_0900_ai_ci | utf8mb4 | 256 | Yes | 0 | NO PAD | |
utf8mb4_de_pb_0900_as_cs | utf8mb4 | 279 | Yes | 0 | NO PAD | |
utf8mb4_eo_0900_ai_ci | utf8mb4 | 273 | Yes | 0 | NO PAD | |
utf8mb4_eo_0900_as_cs | utf8mb4 | 296 | Yes | 0 | NO PAD | |
utf8mb4_esperanto_ci | utf8mb4 | 241 | Yes | 8 | PAD SPACE | |
utf8mb4_estonian_ci | utf8mb4 | 230 | Yes | 8 | PAD SPACE | |
utf8mb4_es_0900_ai_ci | utf8mb4 | 263 | Yes | 0 | NO PAD | |
utf8mb4_es_0900_as_cs | utf8mb4 | 286 | Yes | 0 | NO PAD | |
utf8mb4_es_trad_0900_ai_ci | utf8mb4 | 270 | Yes | 0 | NO PAD | |
utf8mb4_es_trad_0900_as_cs | utf8mb4 | 293 | Yes | 0 | NO PAD | |
utf8mb4_et_0900_ai_ci | utf8mb4 | 262 | Yes | 0 | NO PAD | |
utf8mb4_et_0900_as_cs | utf8mb4 | 285 | Yes | 0 | NO PAD | |
utf8mb4_general_ci | utf8mb4 | 45 | Yes | 1 | PAD SPACE | |
utf8mb4_german2_ci | utf8mb4 | 244 | Yes | 8 | PAD SPACE | |
utf8mb4_gl_0900_ai_ci | utf8mb4 | 320 | Yes | 0 | NO PAD | |
utf8mb4_gl_0900_as_cs | utf8mb4 | 321 | Yes | 0 | NO PAD | |
utf8mb4_hr_0900_ai_ci | utf8mb4 | 275 | Yes | 0 | NO PAD | |
utf8mb4_hr_0900_as_cs | utf8mb4 | 298 | Yes | 0 | NO PAD | |
utf8mb4_hungarian_ci | utf8mb4 | 242 | Yes | 8 | PAD SPACE | |
utf8mb4_hu_0900_ai_ci | utf8mb4 | 274 | Yes | 0 | NO PAD | |
utf8mb4_hu_0900_as_cs | utf8mb4 | 297 | Yes | 0 | NO PAD | |
utf8mb4_icelandic_ci | utf8mb4 | 225 | Yes | 8 | PAD SPACE | |
utf8mb4_is_0900_ai_ci | utf8mb4 | 257 | Yes | 0 | NO PAD | |
utf8mb4_is_0900_as_cs | utf8mb4 | 280 | Yes | 0 | NO PAD | |
utf8mb4_ja_0900_as_cs | utf8mb4 | 303 | Yes | 0 | NO PAD | |
utf8mb4_ja_0900_as_cs_ks | utf8mb4 | 304 | Yes | 24 | NO PAD | |
utf8mb4_latvian_ci | utf8mb4 | 226 | Yes | 8 | PAD SPACE | |
utf8mb4_la_0900_ai_ci | utf8mb4 | 271 | Yes | 0 | NO PAD | |
utf8mb4_la_0900_as_cs | utf8mb4 | 294 | Yes | 0 | NO PAD | |
utf8mb4_lithuanian_ci | utf8mb4 | 236 | Yes | 8 | PAD SPACE | |
utf8mb4_lt_0900_ai_ci | utf8mb4 | 268 | Yes | 0 | NO PAD | |
utf8mb4_lt_0900_as_cs | utf8mb4 | 291 | Yes | 0 | NO PAD | |
utf8mb4_lv_0900_ai_ci | utf8mb4 | 258 | Yes | 0 | NO PAD | |
utf8mb4_lv_0900_as_cs | utf8mb4 | 281 | Yes | 0 | NO PAD | |
utf8mb4_mn_cyrl_0900_ai_ci | utf8mb4 | 322 | Yes | 0 | NO PAD | |
utf8mb4_mn_cyrl_0900_as_cs | utf8mb4 | 323 | Yes | 0 | NO PAD | |
utf8mb4_nb_0900_ai_ci | utf8mb4 | 310 | Yes | 0 | NO PAD | |
utf8mb4_nb_0900_as_cs | utf8mb4 | 311 | Yes | 0 | NO PAD | |
utf8mb4_nn_0900_ai_ci | utf8mb4 | 312 | Yes | 0 | NO PAD | |
utf8mb4_nn_0900_as_cs | utf8mb4 | 313 | Yes | 0 | NO PAD | |
utf8mb4_persian_ci | utf8mb4 | 240 | Yes | 8 | PAD SPACE | |
utf8mb4_pl_0900_ai_ci | utf8mb4 | 261 | Yes | 0 | NO PAD | |
utf8mb4_pl_0900_as_cs | utf8mb4 | 284 | Yes | 0 | NO PAD | |
utf8mb4_polish_ci | utf8mb4 | 229 | Yes | 8 | PAD SPACE | |
utf8mb4_romanian_ci | utf8mb4 | 227 | Yes | 8 | PAD SPACE | |
utf8mb4_roman_ci | utf8mb4 | 239 | Yes | 8 | PAD SPACE | |
utf8mb4_ro_0900_ai_ci | utf8mb4 | 259 | Yes | 0 | NO PAD | |
utf8mb4_ro_0900_as_cs | utf8mb4 | 282 | Yes | 0 | NO PAD | |
utf8mb4_ru_0900_ai_ci | utf8mb4 | 306 | Yes | 0 | NO PAD | |
utf8mb4_ru_0900_as_cs | utf8mb4 | 307 | Yes | 0 | NO PAD | |
utf8mb4_sinhala_ci | utf8mb4 | 243 | Yes | 8 | PAD SPACE | |
utf8mb4_sk_0900_ai_ci | utf8mb4 | 269 | Yes | 0 | NO PAD | |
utf8mb4_sk_0900_as_cs | utf8mb4 | 292 | Yes | 0 | NO PAD | |
utf8mb4_slovak_ci | utf8mb4 | 237 | Yes | 8 | PAD SPACE | |
utf8mb4_slovenian_ci | utf8mb4 | 228 | Yes | 8 | PAD SPACE | |
utf8mb4_sl_0900_ai_ci | utf8mb4 | 260 | Yes | 0 | NO PAD | |
utf8mb4_sl_0900_as_cs | utf8mb4 | 283 | Yes | 0 | NO PAD | |
utf8mb4_spanish2_ci | utf8mb4 | 238 | Yes | 8 | PAD SPACE | |
utf8mb4_spanish_ci | utf8mb4 | 231 | Yes | 8 | PAD SPACE | |
utf8mb4_sr_latn_0900_ai_ci | utf8mb4 | 314 | Yes | 0 | NO PAD | |
utf8mb4_sr_latn_0900_as_cs | utf8mb4 | 315 | Yes | 0 | NO PAD | |
utf8mb4_sv_0900_ai_ci | utf8mb4 | 264 | Yes | 0 | NO PAD | |
utf8mb4_sv_0900_as_cs | utf8mb4 | 287 | Yes | 0 | NO PAD | |
utf8mb4_swedish_ci | utf8mb4 | 232 | Yes | 8 | PAD SPACE | |
utf8mb4_tr_0900_ai_ci | utf8mb4 | 265 | Yes | 0 | NO PAD | |
utf8mb4_tr_0900_as_cs | utf8mb4 | 288 | Yes | 0 | NO PAD | |
utf8mb4_turkish_ci | utf8mb4 | 233 | Yes | 8 | PAD SPACE | |
utf8mb4_unicode_520_ci | utf8mb4 | 246 | Yes | 8 | PAD SPACE | |
utf8mb4_unicode_ci | utf8mb4 | 224 | Yes | 8 | PAD SPACE | |
utf8mb4_vietnamese_ci | utf8mb4 | 247 | Yes | 8 | PAD SPACE | |
utf8mb4_vi_0900_ai_ci | utf8mb4 | 277 | Yes | 0 | NO PAD | |
utf8mb4_vi_0900_as_cs | utf8mb4 | 300 | Yes | 0 | NO PAD | |
utf8mb4_zh_0900_as_cs | utf8mb4 | 308 | Yes | 0 | NO PAD |
列说明:
- Collation:比较规则名称,命名方式 字符集名称_语言_后缀。
后缀 | 英文释义 | 描述 |
_ai | accent insensitive | 不区分重音 |
_as | accent sensitive | 区分重音 |
_ci | case insensitive | 不区分大小写 |
_cs | case sensitive | 区分大小写 |
_bin | binary | 以二进制方式比较 |
- Charset: 比较规则所属字符集。
- Id:比较规则id。
- Default:YES 的就是该字符集的默认比较规则。
- Compiled:是否将字符集编译到服务器中。
- Sortlen:这与对字符集中表示的字符串进行排序所需的内存量有关。
字符集的应用
MySQL 有4个级别的字符集和比较规则,分别是:
- 服务器级别
- 数据库级别
- 表级别
- 列级别
由于字符集和比较规则是互相有联系的,如果我们只修改了字符集,比较规则也会跟着变化,如果只修改了比较规则,字符集也会跟着变化,具体规则如下:
- 只修改字符集,则比较规则将变为修改后的字符集默认的比较规则。
- 只修改比较规则,则字符集将变为修改后的比较规则对应的字符集。
服务器级别
MySQL 提供了两个系统变量来表示服务器级别的字符集和比较规则。
系统变量 | 描述 | 查看系统变量 | 系统变量值 |
character_set_server | 服务器级别的字符集 | SHOW VARIABLES LIKE ‘character_set_server’; | utf8mb3 |
collation_server | 服务器级别的比较规则 | SHOW VARIABLES LIKE ‘collation_server’; | utf8mb3_general_ci |
可以看到在我的计算机中服务器级别默认的字符集是 utf8 ,默认的比较规则是 utf8_general_ci 。
默认字符集和比较规则设置,修改my.ini
:
character_set_server=utf8mb4
collation_server=utf8mb4_general_ci
数据库级别
在创建和修改数据库的时候可以指定该数据库的字符集和比较规则,具体语法如下:
CREATE DATABASE 数据库名
[[DEFAULT] CHARACTER SET 字符集名称]
[[DEFAULT] COLLATE 比较规则名称];
ALTER DATABASE 数据库名
[[DEFAULT] CHARACTER SET 字符集名称]
[[DEFAULT] COLLATE 比较规则名称];
查看数据库字符集和比较规则:
select SCHEMA_NAME,DEFAULT_CHARACTER_SET_NAME,DEFAULT_COLLATION_NAME,SQL_PATH
from information_schema.SCHEMATA;
表级别
在创建和修改表的时候指定表的字符集和比较规则,语法如下:
CREATE TABLE 表名 (列的信息)
[[DEFAULT] CHARACTER SET 字符集名称]
[COLLATE 比较规则名称]]
ALTER TABLE 表名
[[DEFAULT] CHARACTER SET 字符集名称]
[COLLATE 比较规则名称]
查询MySQL表字符集和比较规则:
show table status from spring_security like 'user';
#或者
select * from information_schema.TABLES
where TABLE_SCHEMA='spring_security' and TABLE_NAME='user';
spring_security是数据库名称,user为表名称。输出结果格式化如下:
show create table user;
字符集输出:
CREATE TABLE `user` (
`id` bigint NOT NULL AUTO_INCREMENT,
`created_time` datetime DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`password` varchar(255) DEFAULT NULL,
`updated_time` datetime DEFAULT NULL,
`user_id` bigint DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb3
列级别
对于存储字符串的列,同一个表中的不同的列也可以有不同的字符集和比较规则。我们在创建和
修改列定义的时候可以指定该列的字符集和比较规则,语法如下:
CREATE TABLE 表名(
列名 字符串类型 [CHARACTER SET 字符集名称] [COLLATE 比较规则名称],
其他列...
);
ALTER TABLE 表名 MODIFY 列名 字符串类型 [CHARACTER SET 字符集名称] [COLLATE 比较规则名称];
客户端和服务器通信中的字符集
从客户端发往服务器的请求本质上就是一个字符串,服务器向客户端返回的结果本质上也是一个字符
串,而字符串其实是使用某种字符集编码的二进制数据。这个字符串可不是使用一种字符集的编码方式一成不变,从发送请求到返回结果这个过程中伴随着多次字符集的转换,在这个过程中会用到3个系统变量。
系统变量 | 描述 |
character_set_client | 服务器解码请求时使用的字符集 |
character_set_connection | 服务器处理请求时会把请求字符串从 character_set_client 转为 character_set_connection |
character_set_results | 服务器向客户端返回数据时使用的字符集 |
查看MySQL字符集配置:
show variables like 'character%'
这几个变量与客户端和服务端的交互关系:
MySQL 中从发送请求到返回结果过程里发生的各种字符集转换,非常麻烦。所以我们通常都把 character_set_client
、character_set_connection
、character_set_results
这三个系统变量设置成和客户端使用的字符集一致的情况,这样减少了很多无谓的字符集转换。
MySQL 提供了一条非常简便的语句:
SET NAMES 字符集名;
上面语句产生的效果与下面一样:
SET character_set_client = 字符集名;
SET character_set_connection = 字符集名;
SET character_set_results = 字符集名;
另外,如果想在启动客户端的时候就把 character_set_client
、 character_set_connection
、character_set_results
这三个系统变量的值设置成一样的,那我们可以在启动客户端的时候指定一个叫default-character-set
的启动选项。
在my.ini
文件配置:
default-character-set=utf8