Python Text Encoding and Character Encoding

Reposted

互联网小墨风 2023-06-19 13:22:40


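In Python 2, a plain string (str) is a sequence of 8-bit bytes, while a Unicode string (unicode) stores Unicode code points, held internally as 16-bit or 32-bit units depending on how the interpreter was built, so it can represent far more characters. A Unicode literal is written by putting the prefix u in front of the string, for example u'中文'.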

In Python 3, every string (str) is a Unicode string; encoded binary data is held in the separate bytes type.
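To make the distinction concrete, here is a minimal sketch, assuming Python 3; the variable names are illustrative only. It shows that a str is measured in characters, while its encoded form is measured in bytes.

```python
text = "中华人民共和国"          # str: a sequence of Unicode code points
data = text.encode("utf-8")     # bytes: the encoded form of the same text

char_count = len(text)   # counts characters
byte_count = len(data)   # counts bytes

print(type(text).__name__, char_count)   # str 7
print(type(data).__name__, byte_count)   # bytes 21
```

Each of the seven characters takes three bytes in UTF-8, which is why the two lengths differ.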

1. String encoding

str.encode(encoding='utf-8', errors='strict')

Encodes a str and returns the result as a bytes object.
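The errors parameter decides what happens when a character cannot be represented in the target encoding. The sketch below assumes Python 3 and uses the ASCII codec purely as an illustration; the sample text and variable names are made up for this example.

```python
text = "Python 编码"

# errors='strict' (the default) raises UnicodeEncodeError on the first
# character that the target encoding cannot represent.
strict_failed = False
try:
    text.encode("ascii")
except UnicodeEncodeError:
    strict_failed = True

# errors='ignore' silently drops the unencodable characters.
ignored = text.encode("ascii", errors="ignore")

# errors='replace' substitutes '?' for each unencodable character.
replaced = text.encode("ascii", errors="replace")

print(strict_failed)  # True
print(ignored)        # b'Python '
print(replaced)       # b'Python ??'
```

Both 'ignore' and 'replace' lose information, so the default 'strict' is usually the safer choice unless the loss is acceptable.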

2. String decoding

bytes.decode(encoding='utf-8', errors='strict')

In Python 3, str has no decode method. Decoding is done with the decode() method of a bytes object, and such a bytes object can be obtained from str.encode().
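Decoding only works when the codec matches the one used for encoding. The following sketch, assuming Python 3 and illustrative variable names, decodes GBK bytes with the wrong codec and then with the right one.

```python
data = "中华人民共和国".encode("gbk")

# Decoding GBK bytes as UTF-8 with the default errors='strict' fails.
strict_failed = False
try:
    data.decode("utf-8")
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' does not raise, but inserts U+FFFD replacement
# characters wherever the bytes are not valid UTF-8.
replaced = data.decode("utf-8", errors="replace")

# Decoding with the codec that was used for encoding restores the text.
round_trip = data.decode("gbk")

print(strict_failed)           # True
print("\ufffd" in replaced)    # True
print(round_trip)              # 中华人民共和国
```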

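3. Detecting a string's encoding

import chardet  # third-party package

# data must be a bytes object, for example the value returned by str.encode()
result = chardet.detect(data)

The result is a dict: 'encoding' holds the detected encoding and 'confidence' holds the estimated probability that the detection is correct. For example:

{'encoding': 'Windows-1252', 'confidence': 0.45127272727272727, 'language': ''}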
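Detection is a statistical guess, as the low confidence value in the sample result suggests. When the set of possible encodings is known in advance, a simpler alternative is to try each one in turn. The sketch below is not chardet and not part of the original article; decode_with_candidates is a hypothetical helper that uses only the standard library, and the candidate list is an assumption.

```python
def decode_with_candidates(data, candidates=("utf-8", "gbk")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper for illustration; the candidate order matters,
    because a permissive codec may accept bytes it was not meant for.
    """
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


utf8_result = decode_with_candidates("中华人民共和国".encode("utf-8"))
gbk_result = decode_with_candidates("中华人民共和国".encode("gbk"))

print(utf8_result)  # ('中华人民共和国', 'utf-8')
print(gbk_result)   # ('中华人民共和国', 'gbk')
```

UTF-8 is tried first because it rejects most byte sequences that are not valid UTF-8, which makes a false match less likely than with a legacy double-byte codec.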

text = '中华人民共和国'
# Encode the string; the result is of type bytes
text_utf = text.encode('utf-8')
text_gbk = text.encode('gbk')
print('utf', type(text_utf), text_utf)
#utf <class 'bytes'> b'\xe4\xb8\xad\xe5\x8d\x8e\xe4\xba\xba\xe6\xb0\x91\xe5\x85\xb1\xe5\x92\x8c\xe5\x9b\xbd'
print('gbk', type(text_gbk), text_gbk)
#gbk <class 'bytes'> b'\xd6\xd0\xbb\xaa\xc8\xcb\xc3\xf1\xb9\xb2\xba\xcd\xb9\xfa'

# Decode the bytes objects back into str
# Method 1: the decode() method
text_utf_decode = text_utf.decode(encoding='utf-8')
text_gbk_decode = text_gbk.decode(encoding='gbk')
print('utf', type(text_utf_decode), text_utf_decode)
#utf <class 'str'> 中华人民共和国
print('gbk', type(text_gbk_decode), text_gbk_decode)
#gbk <class 'str'> 中华人民共和国
# Method 2: the str() constructor
text_utf_dec = str(text_utf, encoding='utf-8')
print('utf', type(text_utf_dec), text_utf_dec)
#utf <class 'str'> 中华人民共和国
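Encoding and decoding can be combined to convert bytes from one encoding to another: decode with the source codec, then encode with the target codec. This is a minimal sketch, assuming Python 3; the variable names are illustrative and not taken from the listing above.

```python
original = "中华人民共和国"
gbk_bytes = original.encode("gbk")

# Step 1: decode with the codec the bytes were written in.
as_text = gbk_bytes.decode("gbk")

# Step 2: encode the resulting str with the target codec.
utf8_bytes = as_text.encode("utf-8")

print(len(gbk_bytes), len(utf8_bytes))           # 14 21
print(utf8_bytes == original.encode("utf-8"))    # True
```

There is no direct bytes-to-bytes shortcut here; the str in the middle is what makes the conversion independent of either encoding.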
