What's the difference between encoding and charset?

Reposted

mb5fd86ddc9c8d5 2021-04-09 17:50:00

I am confused about text encoding and charset. For several reasons, I have to learn non-Unicode, non-UTF-8 material for my upcoming work.

I see the word "charset" in email headers, with values such as "ISO-2022-JP", but there is no such encoding in text editors. (I looked through several different text editors.)

What's the difference between text encoding and charset? I'd appreciate it if you could show me some use case examples.

Answer:

Basically:

charset is the set of characters you can use
encoding is the way these characters are stored in memory, that is, as bytes
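
To make the distinction concrete, here is a minimal Python sketch. It assumes Python 3 and its built-in codecs; the helper name can_encode is hypothetical and exists only for this illustration. It shows that a character has an identity (its code point in the Unicode character set) that is separate from the bytes an encoding stores for it, and that a narrower charset simply cannot represent some characters.

```python
# Illustrative sketch: a character's identity in a charset vs. its stored bytes.
ch = "é"

# In the Unicode character set, "é" is code point U+00E9 (decimal 233).
code_point = ord(ch)

# The encoding decides which bytes represent that character.
latin1_bytes = ch.encode("latin-1")  # one byte
utf8_bytes = ch.encode("utf-8")      # two bytes


def can_encode(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if every character of text fits the encoding's charset."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(code_point, latin1_bytes, utf8_bytes)
# "あ" is not in the Latin-1 character set, but it is in Unicode.
print(can_encode("あ", "latin-1"), can_encode("あ", "utf-8"))
```

The same character yields different byte sequences under different encodings, and asking for a character outside the charset fails instead of producing bytes.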

Answer 2:

Every encoding has a particular charset associated with it, but there can be more than one encoding for a given charset. A charset is simply what it sounds like, a set of characters. There are a large number of charsets, including many that are intended for particular scripts or languages.
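
As a rough illustration of "more than one encoding for a given charset", consider the Japanese case from the question. The sketch below assumes Python 3 and its standard codec names (iso2022_jp, shift_jis, euc_jp). All three can encode characters from the JIS X 0208 character set, yet each produces different bytes for the same character. The choice of "あ" as the sample character is arbitrary.

```python
# One character from the JIS X 0208 set, three different encodings of it.
ch = "あ"  # HIRAGANA LETTER A

encoded = {name: ch.encode(name) for name in ("iso2022_jp", "shift_jis", "euc_jp")}

for name, data in encoded.items():
    print(f"{name:>10}: {data!r}")
    # Each encoding decodes back to the same character.
    assert data.decode(name) == ch
```

ISO-2022-JP is a 7-bit encoding that switches character sets with escape sequences, which is one reason it has traditionally appeared in email headers more often than in text editors' encoding menus.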

We are, however, well along in the transition to Unicode, which includes a character set capable of representing almost all the world's scripts. Even so, there are multiple encodings for Unicode. An encoding is a way of mapping a string of characters to a string of bytes. Examples of Unicode encodings include UTF-8, UTF-16 BE, and UTF-16 LE. Each of these has advantages for particular applications or machine architectures.
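
A short sketch of the Unicode encodings named above, assuming Python 3 and its built-in codec names (utf-8, utf-16-be, utf-16-le). The euro sign is an arbitrary sample character. The code point stays the same while the byte sequence changes with the encoding; the two UTF-16 variants differ only in byte order.

```python
# One Unicode code point, three Unicode encodings.
ch = "€"  # U+20AC EURO SIGN

utf8 = ch.encode("utf-8")          # variable width, 3 bytes for this character
utf16_be = ch.encode("utf-16-be")  # 2 bytes, most significant byte first
utf16_le = ch.encode("utf-16-le")  # 2 bytes, least significant byte first

print(f"U+{ord(ch):04X}")
print("UTF-8    :", utf8.hex())
print("UTF-16 BE:", utf16_be.hex())
print("UTF-16 LE:", utf16_le.hex())
```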