Scraping the Qidian (起点中文网) Top 100 Popularity Ranking with a Python Crawler (Quick Start for Beginners)

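Original

大数据梦想 2022-04-01 10:20:15 Category: Python

Tags: python, crawler data, html. Section: 代码人生

© Copyright belongs to the author. This is an original work by 大数据梦想 on the 51CTO blog. Please contact the author for permission to repost; unauthorized reposting may lead to legal action.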

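In this post I (小菌) share how to use a Python crawler to collect the Top 100 novels in the popularity ranking on Qidian (起点中文网). I hope it gives you a feel for what crawlers can do as you work through it.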

We start by opening https://www.qidian.com/all/, the all-works listing page on Qidian.


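Based on how the URL is built and the number of pages we need (five pages cover the Top 100), we can first write the list of page URLs as a list comprehension: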

['http://a.qidian.com/?page={}'.format(i) for i in range(1, 6)]
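
Before crawling anything, it can help to run the comprehension on its own and check the URLs it produces. The following is a minimal sketch that only builds and prints the list; it makes no network requests, and the variable name `urls` is simply the one the full script uses later.

```python
# Build the five listing-page URLs (pages 1 through 5) from the pattern above
urls = ['http://a.qidian.com/?page={}'.format(i) for i in range(1, 6)]

# Print each URL to confirm the page numbers are filled in as expected
for url in urls:
    print(url)
```

Note that `range(1, 6)` stops before 6, which is why the list ends at page 5.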

The complete code is as follows:

"""
@File    : 获取起点中文网人气排行Top100.py
@Time    : 2019/10/21 22:31
@Author  : 封茗囧菌
@Software: PyCharm

    
"""

# Import the required libraries
import xlwt
import requests
from lxml import etree
import time

# List that accumulates the scraped records
all_info_list = []


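def get_info(url):
    # Download the listing page; the timeout keeps a stalled request from hanging the run
    html = requests.get(url, timeout=10)
    selector = etree.HTML(html.text)

    # Locate the enclosing list and select one <li> element per novel on this page
    infos = selector.xpath('//ul[@class="all-img-list cf"]/li')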

    # Loop over the entries and extract the details of each novel
    for info in infos:
        # Title
        title = info.xpath('div[2]/h4/a/text()')[0]
        # Author
        author = info.xpath('div[2]/p[1]/a[1]/text()')[0]
        # Genre, part 1
        style1 = info.xpath('div[2]/p[1]/a[2]/text()')[0]
        # Genre, part 2
        style2 = info.xpath('div[2]/p[1]/a[3]/text()')[0]
        # Combined genre label
        style = style1 + style2
        # Completion status
        complete = info.xpath('div[2]/p[1]/span/text()')[0]
        # Synopsis
        introduce = info.xpath('div[2]/p[2]/text()')[0].strip()

        info_list = [title, author, style, complete, introduce]
        # Append this novel's record to the shared list
        all_info_list.append(info_list)

    # Pause briefly after each page to go easy on the server
    time.sleep(1)


# Program entry point
if __name__ == '__main__':
    urls = ['http://a.qidian.com/?page={}'.format(i) for i in range(1, 6)]
    for url in urls:
        get_info(url)
        time.sleep(5)
    # Define the header row
    header = ['title', 'author', 'style', 'complete', 'introduce']
    # Create the workbook
    book = xlwt.Workbook(encoding='utf-8')
    # Create the worksheet
    sheet = book.add_sheet('Sheet1')

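    # range() yields a sequence of integers and is typically used in for loops.
    # len() returns the length (number of items) of an object such as a string, list, or tuple.
    for h in range(len(header)):
        # Write the header into row 0
        sheet.write(0, h, header[h])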

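    # Write every record to the sheet; row 0 holds the header, so data starts at row 1.
    # The loop variable is named `row` to avoid shadowing the built-in `list`.
    for i, row in enumerate(all_info_list, start=1):
        for j, data in enumerate(row):
            sheet.write(i, j, data)
            # Print each value to check the result
            print(data)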
    # All data has been written; save the workbook to a local file
    book.save('qidianxiaoshuo.xls')
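
One fragile spot in the script above is that every field is read with `xpath(...)[0]`, which raises `IndexError` whenever an entry has no matching element (for instance, a novel without a second genre link). The following is a minimal sketch of a guard, assuming a hypothetical helper named `first_or_default` that is not part of the original script. It operates on the plain list of strings that `xpath()` returns for `text()` queries, so it can be tried here without any network access.

```python
def first_or_default(results, default=''):
    """Return the first match with surrounding whitespace stripped.

    `results` is the list returned by a call such as
    info.xpath('div[2]/h4/a/text()'). If the list is empty,
    `default` is returned instead of raising IndexError.
    This helper is hypothetical; the original script indexes [0] directly.
    """
    return results[0].strip() if results else default


# Plain lists standing in for xpath() results (illustrative values, not scraped data)
print(first_or_default(['  Example Title  ']))
print(first_or_default([], default='unknown'))
```

In `get_info`, a line such as `title = first_or_default(info.xpath('div[2]/h4/a/text()'))` could then replace the direct `[0]` lookup, so one incomplete entry would not abort the whole crawl.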