Abstract

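With rapid socio-economic development and accelerating urbanization, real estate transactions have become increasingly active, and the second-hand housing market in particular remains at a high level. Many online second-hand housing trading websites have appeared. However, the quality of their listings varies widely and does not match individual users' needs precisely, so listings cannot be targeted accurately. A second-hand housing recommendation system is therefore needed to meet users' needs. The first prerequisite for building such a system is a sufficiently large body of listing data. This graduation project therefore implements a second-hand housing data crawling system that collects listing data and provides data support for the recommendation system.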

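This system takes advantage of multi-threaded, multi-node crawling to implement a Redis-based distributed topic (focused) crawler. It is built on the Scrapy crawler framework. XPath is used to parse downloaded pages and extract their content, Redis coordinates the distributed crawl, MongoDB stores the extracted data, and a visual interface developed with Django presents the crawl results in a user-friendly way. Together these components form a distributed crawler system for second-hand housing data from Lianjia (链家网).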
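The Redis-based distribution described above comes down to two shared structures: a request queue that every crawler node pulls from, and a set of request fingerprints that stops nodes from scheduling the same page twice. The sketch below assumes a design along the lines of the scrapy-redis extension, which this abstract does not name. `FakeRedis` is a hypothetical in-memory stand-in for a real Redis server that mimics only `SADD`, `LPUSH`, and `RPOP`. The key names (`lianjia:requests`, `lianjia:dupefilter`) and the example URLs are illustrative only.

```python
import hashlib
from collections import deque


class FakeRedis:
    """Hypothetical in-memory stand-in for a shared Redis server.

    Only the three commands used below are mimicked. In a real deployment
    every crawler node would talk to the same Redis instance instead.
    """

    def __init__(self):
        self.sets = {}
        self.lists = {}

    def sadd(self, key, member):
        # Like Redis SADD: return 1 if the member is new, 0 if already present.
        members = self.sets.setdefault(key, set())
        if member in members:
            return 0
        members.add(member)
        return 1

    def lpush(self, key, value):
        # Like Redis LPUSH: add to the left end of the list.
        self.lists.setdefault(key, deque()).appendleft(value)

    def rpop(self, key):
        # Like Redis RPOP: remove from the right end, or return None if empty.
        # LPUSH + RPOP together give first-in, first-out order.
        queue = self.lists.get(key)
        return queue.pop() if queue else None


def fingerprint(url):
    # Simplified fingerprint: SHA-1 of the URL only.
    # Scrapy's real request fingerprint also covers the HTTP method and body.
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def schedule(r, url, queue_key="lianjia:requests", seen_key="lianjia:dupefilter"):
    """Enqueue url only if no node has scheduled it before."""
    if r.sadd(seen_key, fingerprint(url)):
        r.lpush(queue_key, url)
        return True
    return False


def next_request(r, queue_key="lianjia:requests"):
    """Take the oldest pending URL from the shared queue, or None."""
    return r.rpop(queue_key)


if __name__ == "__main__":
    r = FakeRedis()
    # Two nodes discover the same listing page; only the first call enqueues it.
    print(schedule(r, "https://example.com/ershoufang/pg1/"))  # True
    print(schedule(r, "https://example.com/ershoufang/pg1/"))  # False
    print(schedule(r, "https://example.com/ershoufang/pg2/"))  # True
    print(next_request(r))  # https://example.com/ershoufang/pg1/
```

With a real shared Redis server, a fingerprint added by one node immediately blocks that URL on every other node, which is what lets several crawler processes split one crawl without duplicated work. scrapy-redis packages this idea as its `scrapy_redis.scheduler.Scheduler` and `scrapy_redis.dupefilter.RFPDupeFilter` classes.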

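Development and verification show that the system can crawl Lianjia second-hand housing listing data in a distributed manner. It can supply data to the housing recommendation system and can also serve as a data source for analysts studying the second-hand housing market.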

Keywords: second-hand housing; distributed crawler; Scrapy; visualization

Redis-Based Distributed Crawler System for Lianjia Second-Hand Housing Listing Data (Graduation Project, Python)

Title: Design and Implementation of a Second-Hand Housing Data Crawling System

Abstract

With rapid socio-economic development and accelerating urbanization, real estate transactions have become increasingly active, and the second-hand housing market in particular remains at a high level. A large number of online second-hand housing trading websites have emerged. However, the quality of the listings they provide is uneven and does not match individual users' needs precisely, so listings cannot be targeted accurately. A second-hand housing recommendation system is therefore needed to meet users' needs, and the first requirement for implementing such a system is enough housing listing data. This project therefore implements a second-hand housing data crawling system that collects listing data and provides data support for the housing recommendation system.

This system takes advantage of multi-threaded, multi-node crawling to build a Redis-based distributed topic crawler. It is developed with the Scrapy crawler framework. XPath is used to parse downloaded web pages, Redis provides the distributed coordination, MongoDB stores the extracted data, and a visual interface developed with Django displays the crawl results. The result is a distributed crawler system for second-hand housing data from Lianjia.

Development and verification show that the system can crawl Lianjia second-hand housing listing data in a distributed manner. It provides data support for the housing recommendation system and can also serve as a data source for analysts studying second-hand housing.

Keywords: Second-hand housing; Distributed crawler; Scrapy; Visualization

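Contents

1 Introduction

1.1 Design Background and Overview

1.2 Current State of Development in China and Abroad

1.3 Design Goals and Content

1.4 Organization of This Thesis

2 Related Technologies

2.1 Impact of the Robots Protocol on This Design

2.2 Web Crawlers

2.3 Scrapy Architecture

3 System Analysis

3.1 Business Requirements Analysis

3.2 Functional Requirements Analysis

3.3 Feasibility Analysis

4 High-Level System Design

4.1 System Logical Layers

4.2 Distributed System Design

4.3 System Function Design

4.4 System Database Design

5 Detailed System Design and Implementation

5.1 Data Crawling Module

5.2 Anti-Crawler Countermeasure Module

5.3 Data Storage Module

5.4 Data Visualization Module

6 System Testing

6.1 Test Environment and Tools

6.2 System Function Tests

7 Design Summary

Acknowledgements

References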