Scraping Qunar with Python

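Original

mob649e815e6170 2023-08-10 18:35:37 ©Copyright

Article tags: data, HTTP, python. Article category: Python, Backend Development

©Copyright belongs to the author: this is an original work by 51CTO blog author mob649e815e6170. Contact the author for permission to repost; otherwise legal liability will be pursued.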


1. Flowchart

st=>start: Start
op1=>operation: Import the required libraries
op2=>operation: Send an HTTP request to fetch the page
op3=>operation: Parse the page with BeautifulSoup
op4=>operation: Extract the needed data
op5=>operation: Save the data
e=>end: End

st->op1->op2->op3->op4->op5->e
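The flowchart maps one-to-one onto a small script. The sketch below gives each box its own function. The function names are hypothetical, and the .list_item, .title and .price selectors are reused from step 2.4; they are assumptions about the page markup, not something verified against the live site.

```python
import csv

import requests
from bs4 import BeautifulSoup


def fetch_page(url: str) -> str:
    """op2: send a GET request and return the page HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def parse_page(html: str) -> BeautifulSoup:
    """op3: parse the HTML with Python's built-in html.parser."""
    return BeautifulSoup(html, "html.parser")


def extract_items(soup: BeautifulSoup) -> list[dict]:
    """op4: collect title/price pairs, skipping entries missing either field."""
    data = []
    for item in soup.select(".list_item"):
        title = item.select_one(".title")
        price = item.select_one(".price")
        if title is not None and price is not None:
            data.append({"title": title.get_text(strip=True),
                         "price": price.get_text(strip=True)})
    return data


def save_csv(data: list[dict], path: str) -> None:
    """op5: write the rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as file:
        writer = csv.DictWriter(file, fieldnames=["title", "price"])
        writer.writeheader()
        writer.writerows(data)


def main(url: str, path: str = "qunar.csv") -> None:
    """Run the whole pipeline: fetch, parse, extract, save."""
    save_csv(extract_items(parse_page(fetch_page(url))), path)
```

Calling main(url) with the page address from step 2.2 runs every step in order. Keeping fetching separate from parsing also lets you test extract_items on saved HTML without touching the network.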

2. Implementation Steps

2.1 Import the Required Libraries

First, import the following libraries:

import requests
from bs4 import BeautifulSoup

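requests sends HTTP requests and retrieves the page content.
BeautifulSoup parses the page content so the needed data can be extracted. Both are third-party packages and can be installed with: pip install requests beautifulsoup4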

2.2 Send an HTTP Request to Fetch the Page

Next, send an HTTP request to retrieve the page content from Qunar.

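url = "..."  # fill in the address of the Qunar page to scrape
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an exception on 4xx/5xx responses

url is the address of the page to fetch.
requests.get(url, timeout=10) sends a GET request and stores the result in response. The timeout makes the call fail instead of hanging if connecting or reading stalls for more than 10 seconds.
response.raise_for_status() raises requests.HTTPError when the server returns an error status, so later steps do not silently parse an error page.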
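Some sites reject requests that carry the default python-requests User-Agent, and busy servers occasionally return transient errors. A minimal sketch, assuming you want a custom User-Agent and automatic retries. The helper name make_session and the User-Agent string are hypothetical choices, not something the post or Qunar specifies.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(user_agent: str = "Mozilla/5.0 (example scraper)") -> requests.Session:
    """Build a Session that sends a custom User-Agent and retries transient errors."""
    session = requests.Session()
    # Hypothetical User-Agent value; replace it with one that identifies your client.
    session.headers.update({"User-Agent": user_agent})
    retry = Retry(
        total=3,            # at most 3 retries per request
        backoff_factor=0.5,  # exponentially growing pause between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # retry on these status codes
    )
    adapter = HTTPAdapter(max_retries=retry)
    # Apply the retry policy to both plain and TLS connections.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

To use it, create the session once and call session.get(url, timeout=10) in place of requests.get(url). Reusing one Session also keeps the underlying connection open across several page requests.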

2.3 Parse the Page with BeautifulSoup

Parsing the page with BeautifulSoup makes it easy to locate the data you need.

soup = BeautifulSoup(response.text, "html.parser")

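response.text is the body of the HTTP response, decoded to a string.
"html.parser" selects Python's built-in HTML parser, so no extra parser package is needed.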
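response.text is decoded with the charset requests infers from the HTTP headers. When a text response declares no charset, requests falls back to ISO-8859-1, which turns Chinese text into mojibake. One way around this is to hand BeautifulSoup the raw bytes (response.content) so it detects the encoding itself, for example from a meta charset tag. The sketch below uses made-up HTML in place of a real response.

```python
from bs4 import BeautifulSoup

# Hypothetical page, encoded to bytes the way response.content would deliver it.
raw = (
    '<html><head><meta charset="utf-8"></head><body>'
    '<div class="list_item"><span class="title">北京三日游</span></div>'
    "</body></html>"
).encode("utf-8")

# Passing bytes lets BeautifulSoup choose the encoding from the <meta> tag.
soup = BeautifulSoup(raw, "html.parser")
print(soup.original_encoding)  # the encoding BeautifulSoup settled on
print(soup.select_one(".title").get_text())
```

With a live response this becomes BeautifulSoup(response.content, "html.parser"). Alternatively, setting response.encoding = response.apparent_encoding before reading response.text lets requests guess the encoding from the body.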

2.4 Extract the Data

BeautifulSoup's CSS-selector methods, select() and select_one(), pull out the individual fields.

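data = []
items = soup.select(".list_item")
for item in items:
    title_tag = item.select_one(".title")
    price_tag = item.select_one(".price")
    if title_tag is None or price_tag is None:
        continue  # skip entries missing a title or a price
    data.append({
        "title": title_tag.get_text(strip=True),
        "price": price_tag.get_text(strip=True),
    })

soup.select(".list_item") selects every element with the class list_item.
item.select_one(".title") returns the first element with the class title inside that item, or None if there is none. get_text(strip=True) returns its text with surrounding whitespace removed. Entries without a title or price are skipped instead of raising AttributeError.
data.append({...}) stores each extracted record as a dictionary in the data list.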
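The extracted price is still a display string. To sort or average prices you need a number. The sketch below assumes prices look like "¥1,280起" (a currency sign, optional thousands separators, trailing text). That format is an assumption, so adjust the regular expression to what the page actually shows. parse_price is a hypothetical helper name.

```python
import re
from typing import Optional

# Digits with optional thousands separators and an optional decimal part.
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: str) -> Optional[float]:
    """Return the first number in a price string, or None if it contains none."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))
```

Applied to the list from this step, [{**row, "price_value": parse_price(row["price"])} for row in data] keeps the original text and adds a numeric column alongside it.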

2.5 Save the Data

Finally, save the extracted data to a file or a database. The example below writes it to a CSV file.

import csv

with open("qunar.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["title", "price"])  # write the header row
    for item in data:
        writer.writerow([item["title"], item["price"]])

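open("qunar.csv", "w", newline="", encoding="utf-8") opens a file named qunar.csv for writing, overwriting any existing file with that name. newline="" is what the csv module expects; without it, blank lines can appear between rows on Windows.
csv.writer(file) creates a CSV writer object.
writer.writerow(["title", "price"]) writes the header row.
writer.writerow([item["title"], item["price"]]) writes each extracted record as one row.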
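The post mentions a database as an alternative to a file but only shows the CSV route. A minimal sketch using Python's built-in sqlite3 module. The function name save_sqlite, the default file name qunar.db and the table name items are hypothetical.

```python
import sqlite3


def save_sqlite(data: list[dict], db_path: str = "qunar.db") -> int:
    """Insert title/price rows into SQLite and return the total rows in the table."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error (does not close)
            conn.execute(
                "CREATE TABLE IF NOT EXISTS items (title TEXT, price TEXT)"
            )
            # Named placeholders read the values straight from each dict.
            conn.executemany(
                "INSERT INTO items (title, price) VALUES (:title, :price)",
                data,
            )
        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    finally:
        conn.close()
```

Calling save_sqlite(data) after step 2.4 appends the records to qunar.db. Because the table persists between runs, repeated runs accumulate rows, unlike the CSV file, which is overwritten each time.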