Write a Python Web Crawler in Just a Few Steps, Even as a Beginner

华科云商小徐 2023-03-10 10:34:15 © Copyright

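© Copyright belongs to the author. This is an original work by 华科云商小徐, published on the 51CTO blog. Please contact the author for permission before reposting; unauthorized reposting may be pursued legally.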

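When it comes to writing a crawler, the simplest place to start is batch-scraping product pages. Below is a simple Python script I wrote that shows how to scrape product data and store it in a MySQL database.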

First, install the required third-party libraries: requests, BeautifulSoup, and pymysql.

pip install requests
pip install beautifulsoup4
pip install pymysql

Next, import these libraries and initialize the database connection:

import requests
from bs4 import BeautifulSoup
import pymysql

# Connect to the database (replace the placeholder values with your own settings)
conn = pymysql.connect(host='your_host', user='your_user', password='your_password', database='your_database', charset='utf8')
cursor = conn.cursor()
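
The scraped tuples will eventually need a table to live in. As a minimal sketch, assuming a hypothetical table named goods with the columns goods_id, name, price, and comment_count (none of these names come from the original script), the helpers below create the table and write one row through any DB-API cursor, such as the one returned by conn.cursor(). Values are passed as query parameters instead of being formatted into the SQL string, so the driver handles the escaping.

```python
# Hypothetical schema: the table and column names are assumptions for illustration.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS goods (
    goods_id VARCHAR(32) PRIMARY KEY,
    name VARCHAR(255),
    price VARCHAR(32),
    comment_count VARCHAR(64)
)
"""

# pymysql uses %s as the placeholder for every parameter, whatever its type.
INSERT_SQL = (
    "REPLACE INTO goods (goods_id, name, price, comment_count) "
    "VALUES (%s, %s, %s, %s)"
)


def create_goods_table(cursor):
    """Create the (hypothetical) goods table if it does not exist yet."""
    cursor.execute(CREATE_TABLE_SQL)


def save_goods(cursor, goods_info):
    """Insert or overwrite one (goods_id, name, price, comment) tuple."""
    goods_id, name, price, comment = goods_info
    cursor.execute(INSERT_SQL, (str(goods_id), name, price, comment))
```

With the connection above, this could be used as create_goods_table(cursor), then save_goods(cursor, row), followed by conn.commit(), because pymysql does not autocommit by default.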

Next, define a get_goods_info() function that scrapes the information for a single product:

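def get_goods_info(goods_id):
    # Fetch the product page for the given product ID
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3732.400 QQBrowser/10.5.3819.400'
    }
    url = 'https://item.jd.com/' + str(goods_id) + '.html'
    # A timeout keeps one stalled request from hanging the whole crawl
    response = requests.get(url, headers=headers, timeout=10)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    def text_of(tag, class_name):
        # find() returns None when nothing matches, so guard before reading .text
        node = soup.find(tag, {'class': class_name})
        return node.text.strip() if node else None

    # Product name
    name = text_of('div', 'sku-name')

    # Product price
    price = text_of('strong', 'p-price')

    # Number of product reviews
    comment = text_of('a', 'comment')

    return (goods_id, name, price, comment)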
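
Since the goal is batch scraping, the single-product function would typically be called in a loop. The sketch below shows one way that loop might look. crawl_many, its fetch parameter, and the delay value are hypothetical names introduced here, not part of the original script. It takes the fetch function as an argument (for example get_goods_info), pauses between requests, and skips any product whose fetch raises an exception.

```python
import time


def crawl_many(goods_ids, fetch, delay=1.0):
    """Call fetch(goods_id) for each ID and collect the successful results.

    fetch is any callable that returns a (goods_id, name, price, comment)
    tuple, such as get_goods_info. crawl_many itself is a hypothetical helper.
    Returns a pair: (list of result tuples, list of IDs that failed).
    """
    results = []
    failed = []
    for index, goods_id in enumerate(goods_ids):
        if index > 0:
            time.sleep(delay)  # pause between consecutive requests
        try:
            info = fetch(goods_id)
        except Exception as exc:
            # Record the failure and move on to the next product
            print(f'skipping {goods_id}: {exc}')
            failed.append(goods_id)
            continue
        results.append(info)
    return results, failed
```

Passing the fetch function in as a parameter keeps the loop independent of the network, so it can be tried out with a stub before pointing it at a real site.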