Setting JSON and character encoding in Java: setting the encoding of a JSON string

Reposted

梦想启航吧 2023-06-07 19:33:19


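Basic JSON operations

Encoding and decoding

json.dumps(dict) encodes a Python object as JSON-formatted data (here, a dict becomes a JSON object) and returns it as a string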

>>> import json
>>> dicta = {'Kizuner':True,'DD':False}
>>> jsonstr = json.dumps(dicta)
>>> jsonstr
'{"Kizuner": true, "DD": false}'
>>> type(jsonstr)
<class 'str'> # JSON-formatted data is a string, so it supports string operations but cannot be used as a dict

Type conversion table for encoding Python to JSON:

Python	JSON
dict	object
str	string
list, tuple	array
int, float	number
True/False	true/false
None	null
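The table above can be checked with a short script. This is a minimal sketch using a made-up `record` dict (the field names are hypothetical); it also shows the `ensure_ascii` parameter of `json.dumps`, which controls whether non-ASCII characters are written as `\uXXXX` escapes or kept as they are.

```python
import json

# Hypothetical sample record covering each row of the table above.
record = {
    "name": "梦想",        # str   -> string
    "tags": ("a", "b"),    # tuple -> array
    "scores": [1, 2.5],    # list of int/float -> array of numbers
    "active": True,        # True  -> true
    "note": None,          # None  -> null
}

escaped = json.dumps(record)                       # non-ASCII escaped as \uXXXX
readable = json.dumps(record, ensure_ascii=False)  # non-ASCII kept as-is

print(escaped)
print(readable)
```

Both strings decode to the same data with `json.loads`; `ensure_ascii=False` only changes how the text is written, which matters when the JSON is meant to be read by people or stored compactly as UTF-8.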

json.loads(jsonobj) decodes JSON-formatted data into the corresponding Python type (an ordinary JSON object becomes a dict)

>>> dictb = json.loads(jsonstr)
>>> dictb
{'Kizuner': True, 'DD': False}
>>> type(dictb)
<class 'dict'>

Type conversion table for decoding JSON to Python:

JSON	Python
object	dict
array	list
string	str
number (int)	int
number (real)	float
true/false	True/False
null	None
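As a quick check of the decoding table, the sketch below parses a small hand-written JSON string (the keys are invented for illustration) and records the Python type of each value. It also shows that a round trip is not always the identity: a tuple is encoded as an array and comes back as a list.

```python
import json

text = '{"id": 7, "ratio": 0.5, "tags": ["x", "y"], "ok": true, "extra": null}'
data = json.loads(text)

# Each JSON value arrives as the Python type listed in the table.
kinds = {key: type(value).__name__ for key, value in data.items()}
print(kinds)

# Tuples are written as arrays, so they are read back as lists.
round_trip = json.loads(json.dumps({"point": (1, 2)}))
print(round_trip)
```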

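Character encoding

Overview of character encoding

Character encoding is the mapping from characters to sequences of bytes. It underlies the input, storage, exchange, and display of symbols and text, and includes ASCII for English characters, Chinese encodings such as GBK, and encodings for Japanese, Western European languages, and others. To reduce the confusion caused by so many incompatible schemes, Unicode defines a single standard character set. The UTF (Unicode Transformation Format) encodings then define how Unicode code points are stored as bytes; the most common is UTF-8, which is compact for ASCII-heavy text. This article does not go into how each encoding works and only discusses character encoding in Python.

Character encoding in Python

Python has two string-like types, str and bytes. In Python 3 a string literal is a str by default (this matters: much of the material online was written long ago for Python 2 and no longer applies). The difference is that str holds Unicode text, a sequence of code points, while bytes holds raw bytes, typically the result of encoding that text with a specific encoding such as UTF-8 or GBK.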
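The difference between the two types is easiest to see by measuring lengths. In this small sketch (the sample text is arbitrary), the same two-character str becomes a different number of bytes depending on the encoding chosen.

```python
text = "编码"                       # str: two Unicode code points

utf8_bytes = text.encode("utf-8")   # bytes: UTF-8 uses 3 bytes per character here
gbk_bytes = text.encode("gbk")      # bytes: GBK uses 2 bytes per character here

print(len(text), len(utf8_bytes), len(gbk_bytes))

# Different bytes, same text once each is decoded with its own encoding.
print(utf8_bytes.decode("utf-8") == gbk_bytes.decode("gbk"))
```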

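Checking the type and encoding

type(string) shows directly whether a value is str or bytes. For bytes it cannot tell you which encoding was used. (type also works for any other data type.)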

>>> string = 'hello!'
>>> type(string)
<class 'str'> # str by default: Unicode text

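chardet.detect(byte_str), from the third-party chardet package, estimates the encoding of a bytes value. It returns a dict containing the guessed encoding, a confidence score, and the language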

>>> import chardet
>>> str_utf = string.encode('utf-8') # encode the Unicode text as UTF-8 (explained below)
>>> str_utf
b'hello!' # note the b prefix in front of the literal
>>> type(str_utf)
<class 'bytes'>
>>> chardet.detect(str_utf)
{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}

Note: chardet accepts only bytes (or bytearray) input; passing a str raises TypeError

>>> chardet.detect(string)
Traceback (most recent call last):
  File "<pyshell#48>", line 1, in <module>
    chardet.detect(string)
  File "C:\Users\Hello\AppData\Local\Programs\Python\Python38\lib\site-packages\chardet\__init__.py", line 33, in detect
    raise TypeError('Expected object of type bytes or bytearray, got: '
TypeError: Expected object of type bytes or bytearray, got: <class 'str'>

Recommended approach: first use type to confirm that the value is bytes, and if so, use chardet to estimate its encoding.
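When chardet is not installed, a rough standard-library substitute is to try a short list of likely encodings in order. This is not what chardet does (it uses statistical detection); the helper name `guess_decode` and the candidate list are assumptions made for this sketch.

```python
def guess_decode(data, candidates=("utf-8", "gbk")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    # Same precondition as chardet.detect: only bytes-like input makes sense.
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("expected bytes or bytearray, got %r" % type(data))
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
    raise ValueError("none of the candidate encodings fit")

print(guess_decode("你好".encode("gbk")))
print(guess_decode("你好".encode("utf-8")))
```

Order matters: a strict encoding such as UTF-8 should come first, and an encoding that accepts every byte sequence, such as ISO-8859-1, should only ever be placed last because it never fails.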

Encoding and decoding

str.encode(encoding) encodes text into bytes, and bytes.decode(encoding) decodes bytes back into text

>>> str_decode = str_utf.decode('utf-8')
>>> type(str_decode)
<class 'str'>

Caveats

A bytes object cannot be encoded again; decode it first, then encode. Likewise, a str cannot be decoded in Python 3. Either mistake raises AttributeError

>>> str_utf.encode('gbk')
Traceback (most recent call last):
  File "<pyshell#70>", line 1, in <module>
    str_utf.encode('gbk')
AttributeError: 'bytes' object has no attribute 'encode'
>>> str_utf.decode('utf-8').encode('gbk') # decode, then encode: converts between encodings
b'hello!'

>>> string.decode()
Traceback (most recent call last):
  File "<pyshell#72>", line 1, in <module>
    string.decode()
AttributeError: 'str' object has no attribute 'decode'
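The decode-then-encode step is easier to see with non-ASCII text, since 'hello!' produces identical bytes in UTF-8 and GBK. A minimal sketch, where `transcode` is a hypothetical helper name:

```python
def transcode(data, source, target):
    """Decode bytes from `source`, then encode the resulting text as `target`."""
    return data.decode(source).encode(target)

utf8_bytes = "中文".encode("utf-8")
gbk_bytes = transcode(utf8_bytes, "utf-8", "gbk")

print(utf8_bytes)
print(gbk_bytes)
```

The two byte strings differ in both content and length, yet each decodes to the same text when read with its own encoding.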

encode exists only on str and decode only on bytes; calling either on another type, such as a dict, raises AttributeError

>>> a_dict = {}
>>> a_dict.encode('gbk')
Traceback (most recent call last):
  File "<pyshell#57>", line 1, in <module>
    a_dict.encode('gbk')
AttributeError: 'dict' object has no attribute 'encode'
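To get encoded bytes from a dict, serialize it to JSON text first and then encode that text. The sketch below ties the two halves of this article together; the `payload` contents are made up for illustration.

```python
import json

payload = {"city": "北京", "ok": True}

# dict -> JSON text (str) -> UTF-8 bytes, ready to write to a file or socket
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")

# bytes -> JSON text (str) -> dict
restored = json.loads(body.decode("utf-8"))

print(type(body).__name__, restored == payload)
```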

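Encoding in web scraping

In practice you rarely need to decode by hand when scraping. With the requests library, set the response's encoding by assignment, r.encoding = encoding, and r.text is decoded accordingly; there is no need to download the page and decode it yourself. requests falls back to ISO-8859-1 when a text response's headers declare no charset, which is why the value below does not match the page's real encoding.

>>> import requests
>>> r = requests.get('http://www.baidu.com')
>>> r.encoding # the encoding requests will use to decode the page
'ISO-8859-1'
>>> r.encoding = 'utf-8' # decode the page as UTF-8 when r.text is read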
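What goes wrong with the wrong encoding can be reproduced without any network access. The sketch below uses a hand-made byte string in place of a real response body, which is an assumption of this example.

```python
raw = "百度一下".encode("utf-8")   # bytes as a server might send them

wrong = raw.decode("iso-8859-1")   # no error, but the result is mojibake
right = raw.decode("utf-8")        # the intended text

# ISO-8859-1 maps every byte to exactly one character, so the mistake is reversible.
repaired = wrong.encode("iso-8859-1").decode("utf-8")

print(len(wrong), len(right))
print(repaired == right)
```

Decoding as ISO-8859-1 never raises an error, so a garbled page is usually the only symptom of this kind of mismatch.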

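Escape characters

That is, how special characters are written in string literals (their effect shows when the string is output with print)

The principle is simple; what matters is what each escape sequence produces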

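Escape sequence	Output
\'	'
\"	"
\a	Bell: the terminal may beep once
\b	Backspace
\f	Form feed (page break when printing)
\n	Line feed: the cursor moves to the next line
\r	Carriage return: the cursor moves back to the start of the current line
\t	Horizontal tab
\\	\
\ (at end of line)	Line continuation: the backslash and the newline after it are ignored
\v	Vertical tab
\ooo	Character with octal value ooo, e.g. \012 is a line feed
\xhh	Character with hexadecimal value hh, e.g. \x0a is a line feed
\000	Null character; Python does not treat it as a string terminator, so the text after it is kept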
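A few rows of the table can be verified directly. In this sketch (the labels in `samples` are arbitrary), repr shows how Python writes each string back, and len counts the characters that result after the escapes are processed.

```python
samples = {
    "newline": "a\nb",
    "octal newline": "a\012b",
    "hex newline": "a\x0ab",
    "tab": "a\tb",
    "backslash": "a\\b",
    "null": "a\000b",
    "raw (no escapes)": r"a\nb",
}

for label, value in samples.items():
    # repr shows the escape sequence; len counts the resulting characters.
    print(f"{label:18} {value!r:12} len={len(value)}")
```

The three newline spellings produce the same string, the null character counts as one ordinary character, and the raw string keeps the backslash and the n as two separate characters.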

Code example (run in a terminal that honors \r; the text after \r overwrites the start of the line):

>>> print (u"你好吗？\r朋友")
朋友吗？

\r is useful for more than this, for example for updating a line of output in place.
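One common use of \r is a status line that rewrites itself. The following is a minimal sketch; the function names `progress_line` and `run` and the message format are assumptions, and the effect is only visible in a terminal that honors \r.

```python
import sys
import time

def progress_line(done, total):
    """Build one status line that begins by returning to column 0."""
    return "\rprocessed %3d/%d" % (done, total)

def run(total=5, delay=0.0):
    for done in range(1, total + 1):
        sys.stdout.write(progress_line(done, total))
        sys.stdout.flush()        # show the update immediately
        time.sleep(delay)         # stand-in for real work
    sys.stdout.write("\n")        # finish the line when done

run()
```

Each write starts with \r, so the new text overwrites the previous status instead of appending to it. Padding the counter to a fixed width keeps a shorter update from leaving stale characters behind.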

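Note: formatting determines the form of the output, while escaping is how special characters are written. The two do not conflict and can be used together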

>>> name = 'Snake'
>>> 'Hello!%s!\aHow are you going?\n'%name
'Hello!Snake!\x07How are you going?\n'
>>> print('Hello!%s!\aHow are you going?\n'%name) # print writes the characters themselves, while the bare expression above shows the repr
Hello!Snake!How are you going?
