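I. Introduction to requests
requests is a powerful, easy-to-use HTTP library. Install it with the command pip install requests.
This article covers the most commonly used parts of requests; see the official documentation for full details.
II. Using requests
Before we begin, here is a site that is handy for testing: http://www.httpbin.org/
It echoes back information about the request you send, which makes it well suited to practice.
With that, let's get started!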
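1. The get method
This method sends a GET request to the target URL and receives the response.
It returns a Response object, whose commonly used attributes and methods are:
response.url: the final URL of the response
response.status_code: the HTTP status code of the response
response.encoding: the encoding used to decode response.text
response.cookies: the cookies sent back by the server
response.headers: the response headers
response.content: the response body as bytes
response.text: the response body as str, roughly equivalent to response.content.decode(response.encoding)
response.json(): the response body parsed as JSON (usually a dict or list), roughly equivalent to json.loads(response.text)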
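To see why response.encoding matters, here is a minimal standard-library sketch of the relationship between response.content and response.text. It does not call requests; the byte string and variable names are made up for illustration. When a text response declares no charset, requests falls back to ISO-8859-1, which garbles pages that are actually UTF-8.

```python
# Illustrative bytes standing in for response.content (not a real response).
content = "百度一下,你就知道".encode("utf-8")

# Decoding with the wrong encoding yields garbled text instead of an error.
garbled = content.decode("ISO-8859-1")

# Decoding with the page's real encoding recovers the text.
text = content.decode("utf-8")

print(garbled == text)  # False
print(text)
```

With a real Response object, the usual remedy is to assign response.encoding = 'utf-8' (or response.apparent_encoding) before reading response.text.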
In [1]: import requests
In [2]: response = requests.get('http://www.baidu.com/')
In [3]: type(response)
Out[3]: requests.models.Response
In [4]: print(response.url) # final URL of the response
http://www.baidu.com/
In [5]: print(response.status_code) # status code of the response
200
In [6]: print(response.encoding) # encoding used to decode the body
ISO-8859-1
In [7]: print(response.cookies) # cookies sent back by the server
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
In [8]: print(response.headers) # response headers
{'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Date': 'Wed, 11 Mar 2020 13:31:32 GMT', 'Last-Modified': 'Mon, 23 Jan 2017 13:28:12 GMT', 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}
In [9]: type(response.content) # response body as bytes
Out[9]: bytes
In [10]: type(response.text) # response body as str
Out[10]: str
In [11]: type(response.json()) # fails here: the body is HTML, not JSON
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
<ipython-input-11-212c006d41f3> in <module>
----> 1 type(response.json()) # fails here: the body is HTML, not JSON
D:\Anaconda3\lib\site-packages\requests\models.py in json(self, **kwargs)
895 # used.
896 pass
--> 897 return complexjson.loads(self.text, **kwargs)
898
899 @property
D:\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
346 parse_int is None and parse_float is None and
347 parse_constant is None and object_pairs_hook is None and not kw):
--> 348 return _default_decoder.decode(s)
349 if cls is None:
350 cls = JSONDecoder
D:\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
339 if end != len(s):
D:\Anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
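The traceback above comes from calling response.json() on a page whose body is HTML. Since response.json() is roughly json.loads(response.text), a defensive pattern can be sketched with the standard library alone; parse_json_or_none is a hypothetical helper name, not part of requests.

```python
import json


def parse_json_or_none(text):
    """Return the parsed JSON value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


print(parse_json_or_none('{"key1": "value1"}'))  # {'key1': 'value1'}
print(parse_json_or_none('<!DOCTYPE html><html></html>'))  # None
```

With a real response, the same idea is to wrap the response.json() call in try/except ValueError.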
The method accepts the following parameters:
url: required; the URL to request
params: a dict of query-string parameters, typically used with GET requests
In [13]: import requests
In [14]: url = 'http://www.httpbin.org/get'
In [15]: params = {
    ...:     'key1': 'value1',
    ...:     'key2': 'value2'
    ...: }
In [16]: response = requests.get(url=url,params=params)
In [17]: print(response.text)
{
"args": {
"key1": "value1",
"key2": "value2"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "www.httpbin.org",
"User-Agent": "python-requests/2.22.0",
"X-Amzn-Trace-Id": "Root=1-5e68ea33-3ac03d243a9bc10cd0fdbe70"
},
"origin": "223.104.64.154",
"url": "http://www.httpbin.org/get?key1=value1&key2=value2"
}
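The url field in the output above shows that the params dict was encoded into the query string. A rough standard-library sketch of that encoding follows; it approximates what happens for simple string values and is not the library's actual implementation.

```python
from urllib.parse import urlencode

url = 'http://www.httpbin.org/get'
params = {'key1': 'value1', 'key2': 'value2'}

# Encode the dict as key=value pairs joined by '&' and append it to the URL.
full_url = url + '?' + urlencode(params)
print(full_url)  # http://www.httpbin.org/get?key1=value1&key2=value2
```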
data: a dict of form fields, typically used with POST requests
Note: use the post method in this case; simply replace get with post
In [19]: import requests
In [20]: url = 'http://www.httpbin.org/post'
In [21]: data = {
    ...:     'key1': 'value1',
    ...:     'key2': 'value2'
    ...: }
In [22]: response = requests.post(url=url,data=data)
In [23]: print(response.text)
{
"args": {},
"data": "",
"files": {},
"form": {
"key1": "value1",
"key2": "value2"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "23",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "www.httpbin.org",
"User-Agent": "python-requests/2.22.0",
"X-Amzn-Trace-Id": "Root=1-5e68ea91-9e230bbed78f0b32eca49538"
},
"json": null,
"origin": "223.104.64.154",
"url": "http://www.httpbin.org/post"
}
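In the output above, the fields arrive under form and the Content-Type is application/x-www-form-urlencoded. As a hedged illustration using only the standard library (an approximation, not the library's own code), the request body for this dict can be reproduced like this:

```python
from urllib.parse import urlencode

data = {'key1': 'value1', 'key2': 'value2'}

# Form data travels in the request body, not in the URL.
body = urlencode(data).encode('ascii')
print(body)       # b'key1=value1&key2=value2'
print(len(body))  # 23, matching the Content-Length header shown above
```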
headers: a dict of request headers
In [25]: import requests
In [26]: url = 'http://www.httpbin.org/headers'
In [27]: headers = {
    ...:     'USER-AGENT': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
    ...: }
In [28]: response = requests.get(url=url,headers=headers)
In [29]: print(response.text)
{
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "www.httpbin.org",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
"X-Amzn-Trace-Id": "Root=1-5e68eafb-6c1a36988c2da04a5e1fcd38"
}
}
proxies: a dict mapping URL schemes to the proxies to use
In [31]: import requests
In [32]: url = 'http://www.httpbin.org/ip'
In [33]: proxies = {
    ...:     'http': 'http://10.0.8.190:18080',
    ...:     'https': 'http://192.168.10.11:18080'
    ...: }
In [34]: response = requests.get(url=url,proxies=proxies)
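The keys of the proxies dict are URL schemes, so each scheme can have at most one entry. The following sketch shows the basic lookup idea; pick_proxy is a hypothetical helper written for illustration, the proxy addresses are placeholders, and the real library also handles further cases such as per-host keys and environment variables.

```python
from urllib.parse import urlsplit

# Placeholder proxy addresses; one entry per URL scheme.
proxies = {
    'http': 'http://10.0.8.190:18080',
    'https': 'http://192.168.10.11:18080',
}


def pick_proxy(url, proxies):
    """Return the proxy configured for the URL's scheme, or None."""
    return proxies.get(urlsplit(url).scheme)


print(pick_proxy('http://www.httpbin.org/ip', proxies))
print(pick_proxy('https://www.httpbin.org/ip', proxies))
```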
cookies: a dict of cookies to send with the request
In [36]: import requests
In [37]: url = 'http://www.httpbin.org/cookies'
In [38]: cookies = {
    ...:     'name1': 'value1',
    ...:     'name2': 'value2'
    ...: }
In [39]: response = requests.get(url=url,cookies=cookies)
auth: a tuple of the username and password used to log in (HTTP authentication)
In [1]: import requests
In [2]: url = 'http://www.httpbin.org/basic-auth/user/password'
In [3]: auth = ('user','password')
In [4]: response = requests.get(url=url,auth=auth)
In [5]: print(response.text)
{
"authenticated": true,
"user": "user"
}
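A (username, password) tuple is sent using HTTP Basic authentication, which places the base64-encoded credentials in the Authorization header. A minimal standard-library sketch of that header follows; basic_auth_header is a hypothetical helper, not a requests function.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP Basic auth."""
    credentials = f'{username}:{password}'.encode('utf-8')
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')


header = basic_auth_header('user', 'password')
print(header)
```

Basic authentication only encodes the credentials and does not encrypt them, so it is best used over HTTPS.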
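verify: a bool that controls whether the server's TLS certificate is verified; the default is True, meaning the certificate is verified
To skip certificate verification, set it to False
In that case a warning (InsecureRequestWarning) is usually shown, because urllib3 discourages unverified HTTPS requests
If you do not want to see the warning, you can suppress it with the following call
requests.packages.urllib3.disable_warnings()
timeout: the timeout in seconds; if no response arrives within that time, an exception is raised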
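2. The exceptions module
exceptions is the module in requests that defines its exception classes, including these common ones:
- Timeout: the request timed out
- ConnectionError: a network problem, such as a DNS failure or a refused connection
- TooManyRedirects: the request exceeded the configured maximum number of redirects
Note: all exceptions that requests explicitly raises inherit from requests.exceptions.RequestException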
In [1]: import requests
In [2]: try:
   ...:     response = requests.get('http://www.httpbin.org/get', timeout=0.1)
   ...: except requests.exceptions.RequestException as e:
   ...:     if isinstance(e, requests.exceptions.Timeout):
   ...:         print("Time out")
   ...:
Time out
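Because timeouts are often transient, a common follow-up is to retry a few times before giving up. The sketch below is generic and uses only the standard library so that it runs offline; call_with_retries and flaky are hypothetical names, and the built-in TimeoutError stands in for requests.exceptions.Timeout.

```python
import time


def call_with_retries(func, max_attempts=3, delay=0.0, exceptions=(TimeoutError,)):
    """Call func(), making at most max_attempts attempts in total."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)


# Stand-in for a network call: fails twice, then succeeds.
calls = {'count': 0}


def flaky():
    calls['count'] += 1
    if calls['count'] < 3:
        raise TimeoutError('simulated timeout')
    return 'ok'


result = call_with_retries(flaky)
print(result, calls['count'])  # ok 3
```

With requests, one would pass something like lambda: requests.get(url, timeout=3) as func and exceptions=(requests.exceptions.Timeout,).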
[References]
- http://www.python-requests.org/en/master/