Storing Job Listings Scraped with Scrapy in MySQL


SmallSweets, 2021-04-14 10:54:36


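© Copyright belongs to the author. This is an original work by 51CTO blog author SmallSweets; please contact the author for permission before reprinting, otherwise legal action may be taken.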

I wrote this post as study notes for myself, to record how to connect to and operate MySQL from Python.

To operate MySQL from Python you need to import a MySQL driver module. On Python 3 that module is pymysql (installable with pip install pymysql).

Site scraped: https://www.liepin.com/

[Image: Storing job listings scraped with Scrapy in MySQL]

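Since the goal is only to record how Python operates MySQL, I simply picked a few fields from the listings on the site's home page (company, position, monthly salary, location) as the content to write to the database. Inserting a larger volume of data works the same way.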

Code

Spider

import scrapy

from ..items import ConnectMysqlItem  # adjust this import to match your own project layout


class CmysqlSpider(scrapy.Spider):
    name = 'cmysql'
    allowed_domains = ['liepin.com']
    start_urls = ['https://www.liepin.com/']

    def parse(self, response):
        # Each XPath query returns a list of strings, one entry per listing.
        zhiwei_list = response.xpath("//div[@id='LPAdServer-23310']/ul/li/div/p[@class='job-title']/a/text()").extract()  # job titles
        gongsi_list = response.xpath("//div[@id='LPAdServer-23310']/ul/li/div[@class='job-detail']/p[@class='company-name']/a/text()").extract()  # company names
        didian_list = response.xpath("//div[@id='LPAdServer-23310']/ul/li/div/p[@class='job-salary']/em/span/text()").extract()  # locations
        gongzi_list = response.xpath("//div[@id='LPAdServer-23310']/ul/li/div/p[@class='job-salary']/em/text()").extract()  # monthly salaries

        # zip() stops at the shortest list, so a missing field cannot raise IndexError.
        for zhiwei, gongzi, didian, gongsi in zip(zhiwei_list, gongzi_list, didian_list, gongsi_list):
            # Create a fresh item for every listing so yielded items do not share state.
            item = ConnectMysqlItem()
            item["职位"] = zhiwei
            item["工资"] = gongzi
            item["地点"] = didian
            item["公司"] = gongsi
            yield item
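The spider collects four parallel lists and combines them by position. As a rough sketch of that pairing step on its own, the helper below (`build_rows` is a hypothetical name, not part of the original project, and the sample values are made up) turns the four lists into one dictionary per listing, strips surrounding whitespace, and stops at the shortest list.

```python
def build_rows(zhiwei_list, gongsi_list, didian_list, gongzi_list):
    """Combine four parallel lists into one dict per job listing.

    Hypothetical helper for illustration; the keys match the fields
    declared in ConnectMysqlItem.
    """
    rows = []
    # zip() stops at the shortest list, so unequal lengths never raise IndexError.
    for zhiwei, gongsi, didian, gongzi in zip(
        zhiwei_list, gongsi_list, didian_list, gongzi_list
    ):
        rows.append({
            "职位": zhiwei.strip(),
            "公司": gongsi.strip(),
            "地点": didian.strip(),
            "工资": gongzi.strip(),
        })
    return rows


if __name__ == "__main__":
    # Made-up sample data standing in for the XPath results.
    sample = build_rows(
        ["Python Developer ", "Data Engineer"],
        ["Company A", "Company B"],
        ["Beijing", "Shanghai"],
        ["15-25k"],  # one salary is missing on purpose
    )
    print(sample)  # only listings with all four fields are kept
```

Pairing by position still misaligns the fields if a listing in the middle of the page lacks one of them. Selecting each li node first and extracting the four fields relative to it is likely to be more robust, at the cost of slightly longer XPath code.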

Items

import scrapy


class ConnectMysqlItem(scrapy.Item):
    # One field per column written to the database.
    工资 = scrapy.Field()  # monthly salary
    公司 = scrapy.Field()  # company
    地点 = scrapy.Field()  # location
    职位 = scrapy.Field()  # position

Settings
(The generated settings.py already contains these entries; uncomment them and set the values as shown.)

ROBOTSTXT_OBEY = False

DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
}

ITEM_PIPELINES = {
    'connect_mysql.pipelines.ConnectMysqlPipeline': 300,
}

Pipelines

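import pymysql


class ConnectMysqlPipeline:
    def process_item(self, item, spider):
        # Open the database connection (PyMySQL 1.0+ requires keyword arguments).
        db = pymysql.connect(host="localhost", user="root", password="root", database="connect_mysql")
        # Get a cursor with the cursor() method.
        cursor = db.cursor()
        # SQL INSERT statement; the %s placeholders let the driver escape the values.
        sql = """INSERT INTO work_table(职位, 公司, 地点, 工资)
                 VALUES (%s, %s, %s, %s)"""
        # The values must be in the same order as the column list above.
        values = (item.get("职位"), item.get("公司"), item.get("地点"), item.get("工资"))
        try:
            # Execute the SQL statement.
            cursor.execute(sql, values)
            # Commit so the insert is actually written to the database.
            db.commit()
        except pymysql.MySQLError:
            # Roll back if an error occurs.
            db.rollback()
        finally:
            # Close the database connection.
            db.close()
        # Return the item so any later pipelines still receive it.
        return item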
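The pipeline above opens and closes a connection for every item, which is fine for a handful of rows. For larger crawls, one possible variation is to open the connection once when the spider starts and close it when the spider finishes, using Scrapy's open_spider and close_spider pipeline hooks. The sketch below is an assumption about how that could look, not the author's code: the class name `ReusedConnectionPipeline` and its `connect` parameter are hypothetical, and the credentials are copied from the example above.

```python
class ReusedConnectionPipeline:
    """Sketch of a pipeline that keeps one connection for the whole crawl.

    The class name and the connect parameter are hypothetical; Scrapy itself
    only calls open_spider, process_item and close_spider.
    """

    INSERT_SQL = (
        "INSERT INTO work_table(职位, 公司, 地点, 工资) "
        "VALUES (%s, %s, %s, %s)"
    )

    def __init__(self, connect=None):
        # connect is any zero-argument callable returning a DB-API connection.
        self._connect = connect or self._default_connect
        self.db = None

    @staticmethod
    def _default_connect():
        import pymysql  # imported here so the class loads without the driver

        # Credentials copied from the article's example; adjust for your server.
        return pymysql.connect(
            host="localhost", user="root", password="root", database="connect_mysql"
        )

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.db = self._connect()

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.db.close()

    def process_item(self, item, spider):
        values = (item.get("职位"), item.get("公司"), item.get("地点"), item.get("工资"))
        cursor = self.db.cursor()
        try:
            cursor.execute(self.INSERT_SQL, values)
            self.db.commit()
        except Exception:
            self.db.rollback()
            raise  # re-raise so the failure shows up in the crawl log
        finally:
            cursor.close()
        return item
```

To try it, the ITEM_PIPELINES entry in settings.py would point at this class instead. Passing a different `connect` callable makes it possible to exercise the pipeline without a running MySQL server.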

Both the MySQL connection and the insert commands live in the pipelines part. Here is a summary of the commands used to operate MySQL:

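Connect to the database         db = pymysql.connect(host="database host (localhost for a local server)", user="username", password="password", database="database name")
Get a cursor object             cursor = db.cursor()
SQL statement to execute        sql = """INSERT INTO table_name(column, column)
                                         VALUES (%s, %s)"""
Execute the SQL statement       cursor.execute(sql, (value, value))
Commit to the database          db.commit()
Close the database connection   db.close()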
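The INSERT statement assumes that work_table already exists in the connect_mysql database; the post does not show its definition. A minimal sketch of creating it from Python follows. The column types, the id column, the utf8mb4 charset, and the helper name `ensure_table` are all assumptions chosen for illustration, so adapt them to your own data.

```python
# Assumed schema: the article only names the table and its four columns.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS work_table (
    id INT AUTO_INCREMENT PRIMARY KEY,
    `职位` VARCHAR(255),
    `公司` VARCHAR(255),
    `地点` VARCHAR(255),
    `工资` VARCHAR(64)
) DEFAULT CHARSET = utf8mb4
"""


def ensure_table(db):
    """Create work_table if it does not exist yet.

    db is an open DB-API connection, for example the object returned
    by pymysql.connect().
    """
    cursor = db.cursor()
    try:
        cursor.execute(CREATE_TABLE_SQL)
        db.commit()
    finally:
        cursor.close()
```

Calling ensure_table(db) once, right after connecting and before the first insert, should be enough. Because of IF NOT EXISTS, calling it again on later runs leaves an existing table untouched.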

Result

[Image: Storing job listings scraped with Scrapy in MySQL, 02]