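Scraping Xiachufang Data in a Loop with Python

Original post by mob64ca12ebb57f, 2023-11-24 08:38:19

Tags: HTML, network requests. Category: Python, backend development.

© Copyright belongs to the author. This is an original work by 51CTO blog author mob64ca12ebb57f; contact the author for permission to repost, otherwise legal liability will be pursued.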

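Note: This article involves web scraping. Please comply with applicable laws and regulations and respect the website's terms of use.

## Introduction

In the internet age, large amounts of data can be obtained in many ways. Web crawling is a common and effective one: it automates extracting the needed information from web pages.

This article shows how to write a looping crawler in Python to collect recipe data from the Xiachufang (下厨房) website. We will use Python's requests library to send network requests and the BeautifulSoup library to parse page content.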

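## Prerequisites

Before starting, make sure Python and the following libraries are installed:

- requests: for sending HTTP requests
- BeautifulSoup: for parsing HTML content

Both can be installed with a single command:

```shell
pip install requests beautifulsoup4
```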


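## Network Request

First, we need to send a network request to fetch the page content. Recipe information on Xiachufang can be reached by URL, so we only need to iterate over different URLs to obtain different recipe data.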

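```python
import requests

# Placeholder: the original post does not show which address it requested.
# Set this to the recipe listing page you want to fetch.
url = '...'

# Send the request; the timeout keeps a stalled connection from hanging forever
response = requests.get(url, timeout=10)
# Raise an exception on 4xx/5xx responses instead of parsing an error page
response.raise_for_status()
# Get the page content
html = response.text
```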
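
Some sites respond differently to the default client string that requests sends, and repeated requests are cheaper over one reused connection. As a minimal sketch of a slightly more defensive request, the helpers below share one `requests.Session`, send a custom `User-Agent`, and check the status code. The names `make_session` and `fetch_html`, the header value, and the 10-second timeout are illustrative assumptions, not taken from the original post.

```python
import requests

# Hypothetical header value; any descriptive client string works the same way.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; recipe-crawler-demo)"


def make_session(user_agent=DEFAULT_USER_AGENT):
    """Create a Session that sends the same headers on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(session, url, timeout=10):
    """Fetch one page and return its HTML, raising on HTTP errors."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.text
```

A session created once with `make_session()` can then be passed to `fetch_html` for every page, so the headers and the underlying connection are reused.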

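## Parsing the HTML

Once we have the page content, we use BeautifulSoup to parse the HTML and extract the information we need. On Xiachufang, recipe information is usually contained in specific HTML elements, which we can locate by inspecting the page source.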

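```python
from bs4 import BeautifulSoup

# Parse the HTML content (`html` comes from the request in the previous step)
soup = BeautifulSoup(html, 'html.parser')
# Extract the recipe entries
recipes = soup.find_all('div', class_='recipe-item-summary')
# Note: find() returns None when an element is missing, so the chained
# .get_text() calls raise AttributeError if the page layout differs.
for recipe in recipes:
    # Recipe title
    title = recipe.find('p', class_='name').get_text(strip=True)
    # Recipe author
    author = recipe.find('span', class_='author-name').get_text(strip=True)
    # Total number of steps
    steps = recipe.find('span', class_='n').get_text(strip=True)
    # Number of likes
    likes = recipe.find('span', class_='like-num').get_text(strip=True)
    # Print the recipe information
    print(f'Title: {title}')
    print(f'Author: {author}')
    print(f'Steps: {steps}')
    print(f'Likes: {likes}')
    print('---')
```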
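
`find()` returns `None` when no element matches, so the chained calls above fail with `AttributeError` as soon as one field is missing from a recipe. A minimal sketch of a more tolerant extraction follows. The helper names `text_of` and `parse_recipes` and the sample HTML are made up for illustration; the class names are the ones used above, and the real page may differ.

```python
from bs4 import BeautifulSoup


def text_of(parent, tag, css_class, default=""):
    """Return the stripped text of the first matching child, or `default`."""
    node = parent.find(tag, class_=css_class)
    return node.get_text(strip=True) if node is not None else default


def parse_recipes(html):
    """Turn a listing page into a list of dicts, one per recipe."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {
            "title": text_of(item, "p", "name"),
            "author": text_of(item, "span", "author-name"),
            "steps": text_of(item, "span", "n"),
            "likes": text_of(item, "span", "like-num"),
        }
        for item in soup.find_all("div", class_="recipe-item-summary")
    ]


# Made-up sample markup; the second recipe has no like count.
SAMPLE_HTML = """
<div class="recipe-item-summary">
  <p class="name"> Tomato Egg Stir-fry </p>
  <span class="author-name">alice</span>
  <span class="n">5</span>
  <span class="like-num">120</span>
</div>
<div class="recipe-item-summary">
  <p class="name">Mapo Tofu</p>
  <span class="author-name">bob</span>
  <span class="n">8</span>
</div>
"""

for recipe in parse_recipes(SAMPLE_HTML):
    print(recipe)
```

Returning dictionaries instead of printing inside the loop also makes the results easier to store or filter later.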

## Crawling in a Loop

To collect more recipe data, we can loop over different URLs and repeat the request-and-parse process for each one.

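```python
import time

import requests
from bs4 import BeautifulSoup

# Iterate over pages 1 through 10
for page in range(1, 11):
    # NOTE: the URL is cut off in the original post and is left elided here.
    # Fill in the listing URL, with {page} where the page number belongs.
    url = f'...'
    # Send the request
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Get the page content
    html = response.text
    # Parse the HTML content
    soup = BeautifulSoup(html, 'html.parser')
    recipes = soup.find_all('div', class_='recipe-item-summary')
    for recipe in recipes:
        # Extract the recipe information
        title = recipe.find('p', class_='name').get_text(strip=True)
        author = recipe.find('span', class_='author-name').get_text(strip=True)
        steps = recipe.find('span', class_='n').get_text(strip=True)
        likes = recipe.find('span', class_='like-num').get_text(strip=True)
        # Print the recipe information
        print(f'Title: {title}')
        print(f'Author: {author}')
        print(f'Steps: {steps}')
        print(f'Likes: {likes}')
        print('---')
    # Sleep for a while to avoid hitting the site too quickly
    time.sleep(1)
```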
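
An exception on any single page stops the loop above, which discards the pages that were still to come. One possible way to keep going, sketched here, is to wrap the per-page work in a function that records the failure, skips that page, and still pauses between requests. `crawl_pages`, its parameters, and the `example.com` URL template are hypothetical; `fetch` and `parse` stand in for the request and parsing steps, and the demo uses stand-in functions so that it runs without network access.

```python
import time


def crawl_pages(url_template, pages, fetch, parse, delay=1.0):
    """Fetch and parse each page in turn, skipping pages that fail.

    url_template: string with a {page} placeholder (hypothetical format)
    fetch:        callable taking a URL and returning HTML text
    parse:        callable taking HTML text and returning a list of items
    """
    results = []
    failed = []
    for page in pages:
        url = url_template.format(page=page)
        try:
            results.extend(parse(fetch(url)))
        except Exception as exc:  # broad on purpose: one bad page should not end the crawl
            print(f"page {page} failed: {exc}")
            failed.append(page)
        # Pause between requests to avoid hitting the site too quickly
        time.sleep(delay)
    return results, failed


# Stand-in functions for the demo: page 2 simulates a network failure.
def fake_fetch(url):
    if url.endswith("page=2"):
        raise RuntimeError("simulated timeout")
    return url


def fake_parse(html):
    return [f"recipe from {html}"]


items, failed_pages = crawl_pages(
    "https://example.com/recipes?page={page}",
    range(1, 4),
    fetch=fake_fetch,
    parse=fake_parse,
    delay=0,
)
print(items)
print(failed_pages)
```

With real pages, `fetch` could be a small wrapper around `requests.get(url, timeout=10)` that returns `response.text`, and `parse` could be the BeautifulSoup extraction from the parsing section. The returned list of failed pages can be retried later.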