Scraping Job Listings from Lagou

Richie_LL 2022-11-24 00:40:28 Blog category: Python学习 (Python Learning)

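© Copyright belongs to the author. This is an original work by Richie_LL, published on the 51CTO blog. Contact the author for permission to reprint; unauthorized reprinting may lead to legal action.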

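Let's scrape the job listings I care about from Lagou, then analyze them with my admittedly clumsy skills.
Goals:
+ Scrape the positions I am interested in from Lagou
+ Collect the basic information for each position
+ Analyze the scraped results, mostly by intuition (MySQL + Excel)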

0. Scraping results:

1. Analysis

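The target URL is 'https://www.lagou.com/zhaopin/' + language + '/' + pageIndex + '/?filterOption=' + pageIndex. This time the crawl covers multiple categories as well as multiple pages, so I store the categories I want to scrape in a txt file, one per line, and read that file into a list.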

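# Load the positions to scrape
position = []       # first column of position.txt
position1 = []      # second column of position.txt
position_d = {}     # maps first column -> second column

def load_position():
    global position_d
    with open('position.txt', 'r', encoding='utf-8') as f:
        for line in f:
            fields = line.strip('\n').split(' ')
            position.append(fields[0].strip())
            position1.append(fields[1].strip())
    position_d = dict(zip(position, position1))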
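
To make the URL pattern above concrete, here is a minimal sketch that expands a list of keywords into listing URLs. The helper names `build_url` and `build_urls`, the sample keywords, and the assumption that pages are numbered from 1 are mine and are not taken from the original script.

```python
BASE_URL = 'https://www.lagou.com/zhaopin/'

def build_url(language, page_index):
    """Build one listing URL following the pattern described in the text."""
    page = str(page_index)   # the pattern concatenates strings, so convert ints
    return BASE_URL + language + '/' + page + '/?filterOption=' + page

def build_urls(keywords, max_page):
    """Return one URL per (keyword, page) pair; pages assumed to start at 1."""
    return [build_url(k, p) for k in keywords for p in range(1, max_page + 1)]

# Hypothetical keywords, standing in for the first column of position.txt.
urls = build_urls(['python', 'java'], 2)
for url in urls:
    print(url)
```

In the real script, the keyword list would presumably come from the keys of `position_d` after calling `load_position()`.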

import re

# Captures the text between the commented-out <i></i> marker and the closing </div>
re_education = re.compile(r'.*?<!--<i></i>-->(.*?)</div>', re.S)

def load_info(bsObj):
    job = []                    # details of every posting on the current page
    pageCode = bsObj.find_all("li", {"class": "con_list_item default_list"})
    for i in pageCode:
        position = i.h2.get_text()          # job title
        address = i.find('span', {'class': 'add'}).get_text()     # work location
        tmp = i.find("div", {"class": "li_b_l"})
        salary = tmp.find("span", {"class": "money"}).get_text()    # salary
        education = re_education.findall(str(tmp))[0].strip()     # "experience / education"
        company = i.find("div", {"class": "company_name"}).get_text().strip()   # company
        detail = 'http:' + (i.find("div", {"class": "p_top"}).a)['href']    # link to the posting
        experience = education.split('/')[0].strip()                    # required experience
        edu = education.split('/')[1].strip()                           # required education
        #print(position + '\t' + address + '\t' + salary + '\t' + education + '\t' + company + '\t' + detail)
        job.append([position, address, salary, experience, edu, company, detail])
    return job
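
The regex step in `load_info` is the least obvious part, so here is a small sketch that runs it in isolation using only the standard library. The HTML fragment is made up for illustration and only assumes the layout the regex implies (a commented-out `<i></i>` followed by "experience / education" text). The helper `split_requirements` is a hypothetical name, and unlike the original it returns `None` values instead of raising when the text is missing.

```python
import re

# Same pattern as in load_info: capture the text between the commented-out
# <i></i> marker and the closing </div>.
RE_REQUIREMENTS = re.compile(r'.*?<!--<i></i>-->(.*?)</div>', re.S)

# Hypothetical fragment, invented for this example only.
SAMPLE_HTML = '''<div class="li_b_l">
    <span class="money">10k-20k</span>
    <!--<i></i>-->经验3-5年 / 本科
</div>'''

def split_requirements(html):
    """Return (experience, education); missing parts come back as None."""
    found = RE_REQUIREMENTS.findall(html)
    if not found:
        return None, None
    parts = [p.strip() for p in found[0].split('/')]
    if len(parts) < 2:
        return parts[0], None
    return parts[0], parts[1]

experience, edu = split_requirements(SAMPLE_HTML)
print(experience, edu)
```

Returning `None` rather than indexing blindly is a design choice of this sketch: a single posting with an unexpected layout then does not abort the whole page.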

2. Saving the scraped data to Excel

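import os
import xlwt

def load_xls(job, career, cnt):
    file = xlwt.Workbook()
    sheet = file.add_sheet(position_d[career], cell_overwrite_ok=True)
    col = ('Position', 'Location', 'Salary', 'Experience', 'Education', 'Company', 'Detail link')
    for i in range(0, 7):
        sheet.write(0, i, col[i])           # header row
    tt = 1                                  # index of the next row to write
    for i in range(0, int(cnt)):            # job holds one list of postings per page
        data = job[i]
        for j in range(len(data)):
            tmp = data[j]                   # one posting with 7 fields
            for k in range(0, 7):
                sheet.write(tt, k, tmp[k])
            tt += 1
    os.makedirs('jobInfo', exist_ok=True)   # make sure the output folder exists
    file.save('jobInfo/' + position_d[career] + '.xls')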
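
The nested loops in `load_xls` boil down to "one header row, then every posting from every page, in order". The sketch below shows that row layout with the standard library's `csv` module swapped in for `xlwt`, so it runs without third-party packages. The helper names and all sample rows are invented placeholders, not scraped data.

```python
import csv
import io

HEADER = ['position', 'address', 'salary', 'experience', 'edu', 'company', 'detail']

def rows_for_sheet(job, cnt):
    """Flatten the first cnt pages of job into a header row plus data rows."""
    rows = [list(HEADER)]
    for page in job[:int(cnt)]:     # job holds one list of postings per page
        rows.extend(page)
    return rows

def to_csv_text(rows):
    """Serialize rows as CSV text (stand-in for writing cells with xlwt)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator='\n').writerows(rows)
    return buf.getvalue()

# Made-up placeholder data: two pages, three postings in total.
job = [
    [
        ['Python developer', 'Beijing', '10k-20k', '1-3 years', 'Bachelor', 'Company A', 'http://example.com/1'],
        ['Python developer', 'Shanghai', '15k-25k', '3-5 years', 'Bachelor', 'Company B', 'http://example.com/2'],
    ],
    [
        ['Data analyst', 'Beijing', '8k-12k', '1-3 years', 'Master', 'Company C', 'http://example.com/3'],
    ],
]

rows = rows_for_sheet(job, 2)
csv_text = to_csv_text(rows)
print(csv_text)
```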

3. Analyzing the scraped data

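Seeing the scraped data gave me a real sense of accomplishment. Still, simply scraping data no longer satisfies me: after watching experts publish all kinds of analyses, I couldn't help wanting to show off with some data analysis of my own. The process is crude enough that I am almost embarrassed to describe it. I imported all the data into MySQL, ran a few ad hoc queries for the things I wanted to know, exported the results back to Excel, and made charts from them. It was mostly trial and error, but the results looked decent, so I will just show them here.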
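
For readers who want to try the "load into a database, then query" step, here is a rough sketch. It uses the standard library's `sqlite3` as a stand-in for MySQL, the rows are made-up placeholders, and the query is only one example of what could be asked; none of it is the actual analysis behind this post. It also assumes salaries look like "10k-20k".

```python
import re
import sqlite3

def salary_midpoint(salary):
    """Turn a range such as '10k-20k' into its midpoint in thousands, else None."""
    m = re.match(r'\s*(\d+)\s*[kK]\s*-\s*(\d+)\s*[kK]', salary)
    if m is None:
        return None
    low, high = int(m.group(1)), int(m.group(2))
    return (low + high) / 2

# Made-up rows in the same field order that load_info produces.
rows = [
    ('Python developer', 'Beijing', '10k-20k', '1-3 years', 'Bachelor', 'Company A', 'http://example.com/1'),
    ('Python developer', 'Shanghai', '20k-30k', '3-5 years', 'Bachelor', 'Company B', 'http://example.com/2'),
    ('Data analyst', 'Beijing', '8k-12k', '1-3 years', 'Master', 'Company C', 'http://example.com/3'),
]

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE jobs (position TEXT, address TEXT, salary TEXT, '
    'experience TEXT, edu TEXT, company TEXT, detail TEXT, salary_mid REAL)'
)
conn.executemany(
    'INSERT INTO jobs VALUES (?, ?, ?, ?, ?, ?, ?, ?)',
    [r + (salary_midpoint(r[2]),) for r in rows],
)

# Example question: how many postings per education level, and at what average midpoint?
summary = conn.execute(
    'SELECT edu, COUNT(*), AVG(salary_mid) FROM jobs GROUP BY edu ORDER BY edu'
).fetchall()
for edu, n, avg_mid in summary:
    print(edu, n, avg_mid)
conn.close()
```

A result table like `summary` can then be written back to a spreadsheet and charted there, which matches the MySQL-to-Excel round trip described above.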