I. Introduction to requests

requests is a powerful, easy-to-use HTTP library. You can install it with pip install requests

Below we will cover the most commonly used methods in requests; for full details, see the official documentation

II. Using requests

Before we start, here is a website to use for testing: http://www.httpbin.org/

This site echoes back information about the requests it receives, which makes it perfect for practice

OK, let's get started!

1. The get method

This method sends a request to the target URL and receives the response

It returns a Response object, whose commonly used attributes and methods are listed below:

response.url: the URL of the request

response.status_code: the status code of the response

response.encoding: the encoding of the response

response.cookies: the cookies returned with the response

response.headers: the response headers

response.content: the response body as bytes

response.text: the response body as str, roughly equivalent to

 response.content.decode(response.encoding)

response.json(): the response body parsed as a dict, equivalent to

json.loads(response.text)

In [1]: import requests

In [2]: response = requests.get('http://www.baidu.com/')

In [3]: type(response)
Out[3]: requests.models.Response

In [4]: print(response.url) # the URL of the request
http://www.baidu.com/

In [5]: print(response.status_code) # the status code of the response
200

In [6]: print(response.encoding) # the encoding of the response
ISO-8859-1

In [7]: print(response.cookies) # the cookies returned with the response
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>

In [8]: print(response.headers) # the response headers
{'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Date': 'Wed, 11 Mar 2020 13:31:32 GMT', 'Last-Modified': 'Mon, 23 Jan 2017 13:28:12 GMT', 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}

In [9]: type(response.content) # the response body as bytes
Out[9]: bytes

In [10]: type(response.text) # the response body as str
Out[10]: str

In [11]: type(response.json()) # the response body as a dict
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-11-212c006d41f3> in <module>
----> 1 type(response.json()) # the response body as a dict

D:\Anaconda3\lib\site-packages\requests\models.py in json(self, **kwargs)
    895                     # used.
    896                     pass
--> 897         return complexjson.loads(self.text, **kwargs)
    898
    899     @property

D:\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346             parse_int is None and parse_float is None and
    347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
    349     if cls is None:
    350         cls = JSONDecoder

D:\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
    335
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()
    339         if end != len(s):

D:\Anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
    353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
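
The error above is expected: http://www.baidu.com/ returns an HTML page rather than JSON, so json.loads fails on the very first character. Against an endpoint that actually returns JSON, the call succeeds; a minimal sketch using the test site:

response = requests.get('http://www.httpbin.org/get')
print(type(response.json()))  # <class 'dict'>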

The parameters of this method are described below:

url: required; the URL to request

params: dict; the query parameters, typically used when sending GET requests

In [13]: import requests

In [14]: url = 'http://www.httpbin.org/get'

In [15]: params = {
    ...:     'key1':'value1',
    ...:     'key2':'value2'
    ...: }

In [16]: response = requests.get(url=url,params=params)

In [17]: print(response.text)
{
  "args": {
    "key1": "value1",
    "key2": "value2"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "www.httpbin.org",
    "User-Agent": "python-requests/2.22.0",
    "X-Amzn-Trace-Id": "Root=1-5e68ea33-3ac03d243a9bc10cd0fdbe70"
  },
  "origin": "223.104.64.154",
  "url": "http://www.httpbin.org/get?key1=value1&key2=value2"
}
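
As the output shows, requests URL-encodes the params dict and appends it to the URL as a query string, which you can confirm directly:

print(response.url)  # should print http://www.httpbin.org/get?key1=value1&key2=value2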

data: dict; the form data, typically used when sending POST requests

Note: in this case you should use the post method; simply replace get with post

In [19]: import requests

In [20]: url = 'http://www.httpbin.org/post'

In [21]: data = {
    ...:     'key1':'value1',
    ...:     'key2':'value2'
    ...: }

In [22]: response = requests.post(url=url,data=data)

In [23]: print(response.text)
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "key1": "value1",
    "key2": "value2"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "23",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "www.httpbin.org",
    "User-Agent": "python-requests/2.22.0",
    "X-Amzn-Trace-Id": "Root=1-5e68ea91-9e230bbed78f0b32eca49538"
  },
  "json": null,
  "origin": "223.104.64.154",
  "url": "http://www.httpbin.org/post"
}
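
Note the "json": null field in the output: data= sends the dict as a URL-encoded form. To send a JSON body instead, requests also accepts a json parameter; a minimal sketch:

# sends Content-Type: application/json; httpbin then echoes the body
# under the "json" field instead of "form"
response = requests.post(url=url, json={'key1': 'value1', 'key2': 'value2'})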

headers: dict; the request headers

In [25]: import requests

In [26]: url = 'http://www.httpbin.org/headers'

In [27]: headers = {
    ...:     'USER-AGENT':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
    ...: }

In [28]: response = requests.get(url=url,headers=headers)

In [29]: print(response.text)
{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "www.httpbin.org",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-5e68eafb-6c1a36988c2da04a5e1fcd38"
  }
}

proxies: dict; the proxies to use

In [31]: import requests

In [32]: url = 'http://www.httpbin.org/ip'

In [33]: proxies = {
    ...:     'http':'10.0.8.190:18080',
    ...:     'https':'192.168.10.11:18080'
    ...: }

In [34]: response = requests.get(url=url,proxies=proxies)
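
Note that the keys of the proxies dict are URL schemes: requests routes a request through the 'http' or 'https' proxy depending on the scheme of the target URL, which is why the dict needs one entry per scheme. The proxy addresses above are placeholders; if nothing is listening at them, the request raises requests.exceptions.ConnectionError.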

cookies: dict; the cookies to send

In [36]: import requests

In [37]: url = 'http://www.httpbin.org/cookies'

In [38]: cookies = {
    ...:     'name1':'value1',
    ...:     'name2':'value2'
    ...: }

In [39]: response = requests.get(url=url,cookies=cookies)
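
httpbin's /cookies endpoint simply echoes back the cookies it receives, so printing the body should show both values:

print(response.text)  # expected: {"cookies": {"name1": "value1", "name2": "value2"}}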

auth: tuple; the username and password for HTTP authentication

In [1]: import requests

In [2]: url = 'http://www.httpbin.org/basic-auth/user/password'

In [3]: auth = ('user','password')

In [4]: response = requests.get(url=url,auth=auth)

In [5]: print(response.text)
{
  "authenticated": true,
  "user": "user"
}
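
The tuple form is shorthand for HTTP Basic authentication; it is equivalent to passing an HTTPBasicAuth object explicitly:

from requests.auth import HTTPBasicAuth

response = requests.get(url=url, auth=HTTPBasicAuth('user', 'password'))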

verify: bool; whether to verify the server's certificate when making the request; defaults to True, meaning verification is performed

If you do not want certificate verification, set it to False
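
A minimal sketch of the call (the URL is illustrative; any HTTPS site will do):

response = requests.get('https://www.httpbin.org/get', verify=False)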

When you do this, however, a Warning is usually printed, because Python wants us to use certificate verification

If you do not want to see the Warning, you can suppress it with the following command:

requests.packages.urllib3.disable_warnings()

timeout: the timeout in seconds; if no response is received within that time, an exception is raised (see the sketch below)
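
timeout accepts either a single number (applied to both the connect and the read phase) or a (connect, read) tuple:

response = requests.get('http://www.httpbin.org/get', timeout=5)
response = requests.get('http://www.httpbin.org/get', timeout=(3.05, 27))  # (connect, read)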

2. The exceptions module

exceptions is the module in requests responsible for exception handling. It contains these common exception classes:

  • Timeout: the request timed out

  • ConnectionError: a network problem, such as a DNS failure or a refused connection

  • TooManyRedirects: the request exceeded the configured maximum number of redirects

Note: all explicitly raised exceptions inherit from requests.exceptions.RequestException

In [1]: import requests

In [2]: try:
   ...:         response = requests.get('http://www.httpbin.org/get', timeout=0.1)
   ...: except requests.exceptions.RequestException as e:
   ...:         if isinstance(e,requests.exceptions.Timeout):
   ...:             print("Time out")
   ...:
Time out
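
Since the specific exception classes all inherit from RequestException, an alternative to the isinstance check above is to catch the specific classes first and fall back to the base class; a sketch:

try:
    response = requests.get('http://www.httpbin.org/get', timeout=0.1)
except requests.exceptions.Timeout:
    print("Time out")
except requests.exceptions.ConnectionError:
    print("Connection error")
except requests.exceptions.RequestException as e:
    print("Other request error:", e)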

References

  • http://www.python-requests.org/en/master/
