Garbled Chinese text in Python strings

Reposted

mob6454cc78b025 2023-06-17 19:25:11

Tags: garbled Chinese text in Python strings, python, garbled text, Python. Category: Python, backend development

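How Python encoding works

Like Java, Python represents text as Unicode internally. In Python 2, which the examples in this article use, str and unicode are both subclasses of basestring. Unicode, sometimes called the universal character set, uses one unified character set to cover the writing systems of every country. A str can be understood as the result of encoding unicode characters with a specific encoding, such as the common utf-8, gbk, gb2312, or gb18030. In other words, a str holds encoded bytes, while a unicode object holds the characters themselves.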
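The relationship between text and its encoded forms can be sketched as follows. This is a minimal illustration written for Python 3 (an assumption; there the Unicode text type is `str` and the encoded form is `bytes`, which correspond to `unicode` and `str` in Python 2). The helper name `encoded_forms` is hypothetical.

```python
def encoded_forms(text, encodings=("utf-8", "gbk", "gb2312", "gb18030")):
    """Return a dict mapping each encoding name to the bytes it produces for text."""
    return {name: text.encode(name) for name in encodings}


forms = encoded_forms("你好")
for name, data in forms.items():
    # The same two characters become different byte sequences per encoding.
    print(name, len(data), data)
```

For these two characters, UTF-8 uses three bytes each while the GB family uses two bytes each, so the byte strings are not interchangeable even though they represent the same text.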

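Encoding conversion in Python

With that background, let's look at encoding conversion in Python. You have probably suffered from garbled Chinese text yourself, so let's first see how it arises.
Taking Java and Python as examples, garbled text usually appears because the encoding used to encode Unicode into a byte string differs from the one used to decode that byte string back into Unicode. For example: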

    s = u'你好'
    s = s.encode('utf-8')    # unicode -> UTF-8 encoded str (Python 2)
    print s.decode('gbk')    # decoded with a different encoding, so the output is garbled
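A mismatch does not always produce garbled characters. In the opposite direction (GBK bytes read as UTF-8) decoding typically fails outright, because the bytes are not a valid UTF-8 sequence. The sketch below assumes Python 3; the variable names are illustrative only.

```python
text = "张三"
gbk_bytes = text.encode("gbk")

try:
    gbk_bytes.decode("utf-8")  # wrong codec: these bytes are not valid UTF-8
    outcome = "decoded"
except UnicodeDecodeError:
    outcome = "error"

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising,
# but the original characters are lost.
lossy = gbk_bytes.decode("utf-8", errors="replace")
print(outcome, lossy)
```

Whether a wrong codec yields garbled text or an exception depends on whether the byte sequence happens to be valid in that codec.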

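In Python 2, both str and unicode objects provide encode and decode methods, which make all kinds of encoding conversions straightforward: decode turns an encoded str into unicode, and encode turns unicode into a str in the given encoding.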
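A conversion between two encodings is a decode followed by an encode. The following is a minimal sketch assuming Python 3, where the encoded data is `bytes`; the function name `transcode` is hypothetical and not part of any library.

```python
def transcode(data, source, target):
    """Convert bytes encoded with `source` into bytes encoded with `target`."""
    # Decode to Unicode text first, then encode into the target encoding.
    return data.decode(source).encode(target)


gbk_bytes = "张三".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(utf8_bytes.decode("utf-8"))
```

The conversion only works when `source` names the encoding the bytes were actually written in; Unicode text is the intermediate form in every case.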

Let's go straight to the examples.

Example 1:

    s = u'张三'
    print 'type: %s, value: %s' % (type(s), s)

    s = s.encode('utf-8')  # unicode -> UTF-8 encoded str
    print 'type: %s, value: %s' % (type(s), s)

    s = s.decode('gbk')  # UTF-8 bytes decoded as gbk: the result is garbled
    print 'type: %s, value: %s' % (type(s), s)

Output:

type: <type 'unicode'>, value: 张三
type: <type 'str'>, value: 张三
type: <type 'unicode'>, value: 寮犱笁

Example 2:

    s = '张三'  # the source file is saved as UTF-8, so this str holds UTF-8 bytes
    print 'type: %s, value: %s' % (type(s), s)

    s = s.decode('gbk')  # decoded as gbk into unicode: guaranteed to be garbled
    print 'type: %s, value: %s' % (type(s), s)

    s = s.encode('gbk')  # re-encode with gbk to recover the original UTF-8 bytes
    print 'type: %s, value: %s' % (type(s), s)

    s = s.decode('utf-8')  # decode with the right encoding to get the correct unicode
    print 'type: %s, value: %s' % (type(s), s)

Output:

type: <type 'str'>, value: 张三
type: <type 'unicode'>, value: 寮犱笁
type: <type 'str'>, value: 张三
type: <type 'unicode'>, value: 张三
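The recovery steps in Example 2 can be wrapped in a small helper. This is a sketch assuming Python 3, and it assumes the wrong decoding was lossless (every byte pair mapped to a character), which is not guaranteed for arbitrary input. The function name `repair_mojibake` and its parameters are hypothetical.

```python
def repair_mojibake(garbled, wrong="gbk", right="utf-8"):
    """Undo a decode that used `wrong` when the bytes were really in `right`."""
    # Re-encoding with the wrong codec restores the original bytes,
    # which can then be decoded with the codec they were written in.
    return garbled.encode(wrong).decode(right)


garbled = "张三".encode("utf-8").decode("gbk")  # simulate the mistake
restored = repair_mojibake(garbled)
print(garbled, "->", restored)
```

If the wrong decode raised an error or was done with `errors="replace"`, information has already been lost and this round trip cannot restore the text.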
