Scraping web page data with Python 3: an illustrated step-by-step guide

Reposted

Aceryt 2023-05-31 09:12:34


I'm a beginner, and this is my attempt at scraping the Baidu search engine home page.

Open Baidu.


In Google Chrome, right-click the page and open the Inspect tool.


Click "All" in the third row.


The panel on the right changes. Scroll up, find the first file in the list, and click it to view the details.


Once it is open, we can see the data we need.

It shows the URL to request, and the request method is GET.


It also shows that the content-type is a text type.


Scroll to the very bottom to find the user-agent (once you have one that works, you can keep reusing it).
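The values read from the inspector (URL, GET method, user-agent) can be put together into a request and checked before anything is sent. This is only a sketch: the User-Agent string below is a placeholder I made up for illustration, so copy the one from your own browser. `requests.Request(...).prepare()` builds the request without touching the network.

```python
import requests

# Values taken from the browser's inspector.
URL = "https://www.baidu.com/"
# Placeholder User-Agent (an assumption): replace it with the string your browser shows.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
    )
}

# Build the request without sending it, so it can be inspected offline.
prepared = requests.Request("GET", URL, headers=HEADERS).prepare()

print(prepared.method, prepared.url)
print(prepared.headers["User-Agent"])
```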


We now have almost everything we need, so let's write the code:

First, import the requests package.

import requests

Step 1
Specify the URL:

if __name__ == '__main__':
    # Step 1: specify the URL
    url = 'https://www.baidu.com/'

Step 2
Send the request

# Step 2: send the request
response = requests.get(url=url)
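The call above sends no headers and never checks whether the request succeeded. A slightly more defensive variant might look like the sketch below. The helper name `fetch_html` and its `get` parameter are my own inventions for illustration; `get` exists only so the function can be tried without network access. `headers=`, `timeout=` and `raise_for_status()` are regular requests features.

```python
import requests


def fetch_html(url, headers=None, timeout=10, get=requests.get):
    """Fetch a page and return its body as text (hypothetical helper).

    `get` defaults to requests.get; passing a stand-in makes offline testing possible.
    """
    # timeout stops the call from hanging forever if the server never answers
    response = get(url, headers=headers, timeout=timeout)
    # raise requests.HTTPError on 4xx/5xx instead of silently saving an error page
    response.raise_for_status()
    return response.text
```

Usage would be something like `wenben = fetch_html('https://www.baidu.com/', headers={'User-Agent': '...'})`, with the user-agent string copied from the browser.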

Step 3
Get the response data

# Step 3: get the response data; .text returns the response body as a string
wenben = response.text
print(wenben)
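One thing worth knowing about `.text`: requests decodes the raw bytes (`response.content`) using `response.encoding`, and when a text response does not declare a charset in its Content-Type header, requests falls back to ISO-8859-1. If the printed page shows garbled Chinese, that may be the cause, and setting `response.encoding = 'utf-8'` (or `response.apparent_encoding`) before reading `.text` is a common fix. The sketch below does not contact Baidu; it only simulates the effect with plain bytes.

```python
# Bytes as a server might send them: Chinese text encoded as UTF-8.
raw = "百度一下，你就知道".encode("utf-8")

# Decoding with the wrong charset produces mojibake instead of an error.
wrong = raw.decode("iso-8859-1")

# Decoding with the charset the bytes were actually written in recovers the text.
right = raw.decode("utf-8")

print(wrong)
print(right)
```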

Step 4
Persist the data to storage

# Step 4: persist to storage
with open('./baidu.html', 'w', encoding='utf-8') as fp:
    fp.write(wenben)
print("Scraping finished")
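The saving step can also be wrapped in a small function, which makes it easy to reuse for other pages. This is just one possible shape: `save_html` is a name I picked for illustration, and it uses `pathlib` from the standard library instead of `open`.

```python
from pathlib import Path


def save_html(text, path="baidu.html"):
    """Write the page text to disk as UTF-8 and return the file size in bytes."""
    target = Path(path)
    # An explicit encoding keeps the result the same regardless of the OS default.
    target.write_text(text, encoding="utf-8")
    return target.stat().st_size
```

Calling `save_html(wenben, './baidu.html')` would then replace the `with open(...)` block.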

Full source code:

import requests

if __name__ == '__main__':
    # Step 1: specify the URL
    url = 'https://www.baidu.com/'
    # Step 2: send the request
    response = requests.get(url=url)
    # Step 3: get the response data; .text returns the response body as a string
    wenben = response.text
    print(wenben)
    # Step 4: persist to storage
    with open('./baidu.html', 'w', encoding='utf-8') as fp:
        fp.write(wenben)
    print("Scraping finished")

Run the code locally and a file named baidu.html appears in the same directory.

Open it to view the result.
