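UTF-8 is loosely similar to Huffman coding in that it is a variable-length code: lower code points get shorter byte sequences. It encodes Unicode as follows:
code points 0x00-0x7F use a single byte;
code points 0x80-0x7FF use two bytes;
code points 0x800-0xFFFF use three bytes;
code points 0x10000-0x10FFFF, which lie beyond the table below, use four bytes.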
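The byte-length rules above can be sketched as a small hand-written encoder. This is only an illustration: the function name `utf8_encode` is made up for this example, the four-byte branch covers code points above 0xFFFF that the list above does not mention, and in real code you would simply call `str.encode("utf-8")`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper, not a library API)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Out-of-range values and UTF-16 surrogates are not valid scalar values.
        raise ValueError(f"cannot encode code point {cp:#x}")
    if cp <= 0x7F:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    for ch in ("A", "é", "中"):
        encoded = utf8_encode(ord(ch))
        # The hand-written encoder should agree with Python's built-in codec.
        assert encoded == ch.encode("utf-8")
        print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")
```

For example, `中` (U+4E2D) falls in the 0x800-0xFFFF range and comes out as the three bytes `e4 b8 ad`.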
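① Digits occupy the Unicode range 0x0030~0x0039.
② English letters occupy these Unicode ranges:
Uppercase A to Z (Latin letters): 0x0041~0x005A
Lowercase a to z (Latin letters): 0x0061~0x007A
③ Chinese characters occupy the Unicode range 0x4E00~0x9FA5.
This range actually belongs to the CJK Unified Ideographs block, so it covers the Han characters shared by Chinese, Japanese, and Korean, not only those used in Chinese.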
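As a rough sketch of how these ranges are typically used, the function below classifies a single character by comparing its code point against the three ranges listed above. The name `classify` and the returned labels are hypothetical, chosen only for this example.

```python
def classify(ch: str) -> str:
    """Classify one character using the ranges listed above (illustrative helper)."""
    cp = ord(ch)
    if 0x0030 <= cp <= 0x0039:
        return "digit"
    if 0x0041 <= cp <= 0x005A:
        return "uppercase"
    if 0x0061 <= cp <= 0x007A:
        return "lowercase"
    if 0x4E00 <= cp <= 0x9FA5:
        # Han characters shared by Chinese, Japanese, and Korean.
        return "cjk"
    return "other"


if __name__ == "__main__":
    for ch in "7Qz汉あ":
        print(f"{ch!r} U+{ord(ch):04X} -> {classify(ch)}")
```

Note that Japanese kana such as `あ` (U+3042) fall outside 0x4E00~0x9FA5, so a check built on this range alone matches Han characters but not every character of a CJK text.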
Unicode character code table | Unicode range for Chinese characters: 0x4E00→0x9FA5 (reposted)
Decimal start | Decimal end | Hex start | Hex end | Character count | Block name (English) | |
0 | 127 | 0000 | 007F | 128 | C0 Controls and Basic Latin | |
128 | 255 | 0080 | 00FF | 128 | C1 Controls and Latin-1 Supplement | |
256 | 383 | 0100 | 017F | 128 | Latin Extended-A | |
384 | 591 | 0180 | 024F | 208 | Latin Extended-B | |
592 | 687 | 0250 | 02AF | 96 | IPA Extensions | |
688 | 767 | 02B0 | 02FF | 80 | Spacing Modifier Letters | |
768 | 879 | 0300 | 036F | 112 | Combining Diacritical Marks | |
880 | 1023 | 0370 | 03FF | 144 | Greek and Coptic | |
1024 | 1279 | 0400 | 04FF | 256 | Cyrillic | |
1280 | 1327 | 0500 | 052F | 48 | Cyrillic Supplement | |
1328 | 1423 | 0530 | 058F | 96 | Armenian | |
1424 | 1535 | 0590 | 05FF | 112 | Hebrew | |
1536 | 1791 | 0600 | 06FF | 256 | Arabic | |
1792 | 1871 | 0700 | 074F | 80 | Syriac | |
1872 | 1919 | 0750 | 077F | 48 | Arabic Supplement | |
1920 | 1983 | 0780 | 07BF | 64 | Thaana | |
1984 | 2047 | 07C0 | 07FF | 64 | N'Ko | |
2048 | 2143 | 0800 | 085F | 96 | Avestan and Pahlavi | |
2144 | 2175 | 0860 | 087F | 32 | Mandaic | |
2176 | 2223 | 0880 | 08AF | 48 | Samaritan | |
2304 | 2431 | 0900 | 097F | 128 | Devanagari | |
2432 | 2559 | 0980 | 09FF | 128 | Bengali | |
2560 | 2687 | 0A00 | 0A7F | 128 | Gurmukhi | |
2688 | 2815 | 0A80 | 0AFF | 128 | Gujarati | |
2816 | 2943 | 0B00 | 0B7F | 128 | Oriya | |
2944 | 3071 | 0B80 | 0BFF | 128 | Tamil | |
3072 | 3199 | 0C00 | 0C7F | 128 | Telugu | |
3200 | 3327 | 0C80 | 0CFF | 128 | Kannada | |
3328 | 3455 | 0D00 | 0D7F | 128 | Malayalam | |
3456 | 3583 | 0D80 | 0DFF | 128 | Sinhala | |
3584 | 3711 | 0E00 | 0E7F | 128 | Thai | |
3712 | 3839 | 0E80 | 0EFF | 128 | Lao | |
3840 | 4095 | 0F00 | 0FFF | 256 | Tibetan | |
4096 | 4255 | 1000 | 109F | 160 | Myanmar | |
4256 | 4351 | 10A0 | 10FF | 96 | Georgian | |
4352 | 4607 | 1100 | 11FF | 256 | Hangul Jamo | |
4608 | 4991 | 1200 | 137F | 384 | Ethiopic | |
4992 | 5023 | 1380 | 139F | 32 | Ethiopic Supplement | |
5024 | 5119 | 13A0 | 13FF | 96 | Cherokee | |
5120 | 5759 | 1400 | 167F | 640 | Unified Canadian Aboriginal Syllabics | |
5760 | 5791 | 1680 | 169F | 32 | Ogham | |
5792 | 5887 | 16A0 | 16FF | 96 | Runic | |
5888 | 5919 | 1700 | 171F | 32 | Tagalog | |
5920 | 5951 | 1720 | 173F | 32 | Hanunóo | |
5952 | 5983 | 1740 | 175F | 32 | Buhid | |
5984 | 6015 | 1760 | 177F | 32 | Tagbanwa | |
6016 | 6143 | 1780 | 17FF | 128 | Khmer | |
6144 | 6319 | 1800 | 18AF | 176 | Mongolian | |
6320 | 6399 | 18B0 | 18FF | 80 | Cham | |
6400 | 6479 | 1900 | 194F | 80 | Limbu | |
6480 | 6527 | 1950 | 197F | 48 | Tai Le | |
6528 | 6623 | 1980 | 19DF | 96 | New Tai Lue | |
6624 | 6655 | 19E0 | 19FF | 32 | Khmer Symbols | |
6656 | 6687 | 1A00 | 1A1F | 32 | Buginese | |
6688 | 6751 | 1A20 | 1A5F | 64 | Batak | |
6784 | 6895 | 1A80 | 1AEF | 112 | Lanna | |
6912 | 7039 | 1B00 | 1B7F | 128 | Balinese | |
7040 | 7088 | 1B80 | 1BB0 | 49 | Sundanese | |
7104 | 7167 | 1BC0 | 1BFF | 64 | Pahawh Hmong | |
7168 | 7247 | 1C00 | 1C4F | 80 | Lepcha | |
7248 | 7295 | 1C50 | 1C7F | 48 | Ol Chiki | |
7296 | 7391 | 1C80 | 1CDF | 96 | Meithei/Manipuri | |
7424 | 7551 | 1D00 | 1D7F | 128 | Phonetic Extensions | |
7552 | 7615 | 1D80 | 1DBF | 64 | Phonetic Extensions Supplement | |
7616 | 7679 | 1DC0 | 1DFF | 64 | Combining Diacritical Marks Supplement | |
7680 | 7935 | 1E00 | 1EFF | 256 | Latin Extended Additional | |
7936 | 8191 | 1F00 | 1FFF | 256 | Greek Extended | |
8192 | 8303 | 2000 | 206F | 112 | General Punctuation | |
8304 | 8351 | 2070 | 209F | 48 | Superscripts and Subscripts | |
8352 | 8399 | 20A0 | 20CF | 48 | Currency Symbols | |
8400 | 8447 | 20D0 | 20FF | 48 | Combining Diacritical Marks for Symbols | |
8448 | 8527 | 2100 | 214F | 80 | Letterlike Symbols | |
8528 | 8591 | 2150 | 218F | 64 | Number Forms | |
8592 | 8703 | 2190 | 21FF | 112 | Arrows | |
8704 | 8959 | 2200 | 22FF | 256 | Mathematical Operators | |
8960 | 9215 | 2300 | 23FF | 256 | Miscellaneous Technical | |
9216 | 9279 | 2400 | 243F | 64 | Control Pictures | |
9280 | 9311 | 2440 | 245F | 32 | Optical Character Recognition | |
9312 | 9471 | 2460 | 24FF | 160 | Enclosed Alphanumerics | |
9472 | 9599 | 2500 | 257F | 128 | Box Drawing | |
9600 | 9631 | 2580 | 259F | 32 | Block Elements | |
9632 | 9727 | 25A0 | 25FF | 96 | Geometric Shapes | |
9728 | 9983 | 2600 | 26FF | 256 | Miscellaneous Symbols | |
9984 | 10175 | 2700 | 27BF | 192 | Dingbats | |
10176 | 10223 | 27C0 | 27EF | 48 | Miscellaneous Mathematical Symbols-A | |
10224 | 10239 | 27F0 | 27FF | 16 | Supplemental Arrows-A | |
10240 | 10495 | 2800 | 28FF | 256 | Braille Patterns | |
10496 | 10623 | 2900 | 297F | 128 | Supplemental Arrows-B | |
10624 | 10751 | 2980 | 29FF | 128 | Miscellaneous Mathematical Symbols-B | |
10752 | 11007 | 2A00 | 2AFF | 256 | Supplemental Mathematical Operators | |
11008 | 11263 | 2B00 | 2BFF | 256 | Miscellaneous Symbols and Arrows | |
11264 | 11359 | 2C00 | 2C5F | 96 | Glagolitic | |
11360 | 11391 | 2C60 | 2C7F | 32 | Latin Extended-C | |
11392 | 11519 | 2C80 | 2CFF | 128 | Coptic | |
11520 | 11567 | 2D00 | 2D2F | 48 | Georgian Supplement | |
11568 | 11647 | 2D30 | 2D7F | 80 | Tifinagh | |
11648 | 11743 | 2D80 | 2DDF | 96 | Ethiopic Extended | |
11776 | 11903 | 2E00 | 2E7F | 128 | Supplemental Punctuation | |
11904 | 12031 | 2E80 | 2EFF | 128 | CJK Radicals Supplement | |
12032 | 12255 | 2F00 | 2FDF | 224 | Kangxi Radicals | |
12272 | 12287 | 2FF0 | 2FFF | 16 | Ideographic Description Characters | |
12288 | 12351 | 3000 | 303F | 64 | CJK Symbols and Punctuation | |
12352 | 12447 | 3040 | 309F | 96 | Hiragana | |
12448 | 12543 | 30A0 | 30FF | 96 | Katakana | |
12544 | 12591 | 3100 | 312F | 48 | Bopomofo | |
12592 | 12687 | 3130 | 318F | 96 | Hangul Compatibility Jamo | |
12688 | 12703 | 3190 | 319F | 16 | Kanbun | |
12704 | 12735 | 31A0 | 31BF | 32 | Bopomofo Extended | |
12736 | 12783 | 31C0 | 31EF | 48 | CJK Strokes | |
12784 | 12799 | 31F0 | 31FF | 16 | Katakana Phonetic Extensions | |
12800 | 13055 | 3200 | 32FF | 256 | Enclosed CJK Letters and Months | |
13056 | 13311 | 3300 | 33FF | 256 | CJK Compatibility | |
13312 | 19903 | 3400 | 4DBF | 6592 | CJK Unified Ideographs Extension A | |
19904 | 19967 | 4DC0 | 4DFF | 64 | Yijing Hexagram Symbols | |
19968 | 40895 | 4E00 | 9FBF | 20928 | CJK Unified Ideographs | |
40960 | 42127 | A000 | A48F | 1168 | Yi Syllables | |
42128 | 42191 | A490 | A4CF | 64 | Yi Radicals | |
42240 | 42527 | A500 | A61F | 288 | Vai | |
42592 | 42751 | A660 | A6FF | 160 | Unified Canadian Aboriginal Syllabics Supplement | |
42752 | 42783 | A700 | A71F | 32 | Modifier Tone Letters | |
42784 | 43007 | A720 | A7FF | 224 | Latin Extended-D | |
43008 | 43055 | A800 | A82F | 48 | Syloti Nagri | |
43072 | 43135 | A840 | A87F | 64 | Phags-pa | |
43136 | 43231 | A880 | A8DF | 96 | Saurashtra | |
43264 | 43391 | A900 | A97F | 128 | Javanese | |
43392 | 43487 | A980 | A9DF | 96 | Chakma | |
43520 | 43583 | AA00 | AA3F | 64 | Varang Kshiti | |
43584 | 43631 | AA40 | AA6F | 48 | Sorang Sompeng | |
43648 | 43743 | AA80 | AADF | 96 | Newari | |
43776 | 43871 | AB00 | AB5F | 96 | Việt Thái | |
43904 | 43936 | AB80 | ABA0 | 33 | Kayah Li | |
44032 | 55215 | AC00 | D7AF | 11184 | Hangul Syllables | |
55296 | 56319 | D800 | DBFF | 1024 | High Surrogates (UTF-16) | |
56320 | 57343 | DC00 | DFFF | 1024 | Low Surrogates (UTF-16) | |
57344 | 63743 | E000 | F8FF | 6400 | Private Use Area | |
63744 | 64255 | F900 | FAFF | 512 | CJK Compatibility Ideographs | |
64256 | 64335 | FB00 | FB4F | 80 | Alphabetic Presentation Forms | |
64336 | 65023 | FB50 | FDFF | 688 | Arabic Presentation Forms-A | |
65024 | 65039 | FE00 | FE0F | 16 | Variation Selectors | |
65040 | 65055 | FE10 | FE1F | 16 | Vertical Forms | |
65056 | 65071 | FE20 | FE2F | 16 | Combining Half Marks | |
65072 | 65103 | FE30 | FE4F | 32 | CJK Compatibility Forms | |
65104 | 65135 | FE50 | FE6F | 32 | Small Form Variants | |
65136 | 65279 | FE70 | FEFF | 144 | Arabic Presentation Forms-B | |
65280 | 65519 | FF00 | FFEF | 240 | Halfwidth and Fullwidth Forms | |
65520 | 65535 | FFF0 | FFFF | 16 | Specials |
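
A table like this lends itself to a binary-search lookup from code point to block. The sketch below copies only a handful of rows from the table above as sample data; the names `BLOCKS` and `find_block` are hypothetical, and a real program would more likely load the complete block list from the Unicode Character Database. It also checks that each character count equals end - start + 1.

```python
from bisect import bisect_right

# (start, end, character count, block name): a few rows copied from the table above,
# kept sorted by start so that binary search works.
BLOCKS = [
    (0x0370, 0x03FF, 144, "Greek and Coptic"),
    (0x0400, 0x04FF, 256, "Cyrillic"),
    (0x3040, 0x309F, 96, "Hiragana"),
    (0x30A0, 0x30FF, 96, "Katakana"),
    (0x4E00, 0x9FBF, 20928, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, 11184, "Hangul Syllables"),
]
STARTS = [start for start, _, _, _ in BLOCKS]


def find_block(cp: int):
    """Return the block name containing code point cp, or None if it is not in BLOCKS."""
    i = bisect_right(STARTS, cp) - 1  # index of the last block starting at or before cp
    if i >= 0 and BLOCKS[i][0] <= cp <= BLOCKS[i][1]:
        return BLOCKS[i][3]
    return None


if __name__ == "__main__":
    # The count column is simply the size of the inclusive range.
    for start, end, count, name in BLOCKS:
        assert end - start + 1 == count, name
    for ch in "яあ中한A":
        print(f"{ch!r} U+{ord(ch):04X} -> {find_block(ord(ch))}")
```

Because the sample list is deliberately incomplete, `find_block` returns `None` for characters such as `A` whose block was not copied over.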