The timetable query feature of our school's academic affairs system is clunky, the server goes down frequently, and the pages are not usable on phones. So I wrote a Python crawler to scrape all the timetables and host them on my own server. This article is for learning purposes only.

Features

If the program is interrupted and rerun, it will not scrape duplicate timetables.

Timetable data is stored in a MySQL database, which makes it easy to manage.

Notes

Do not copy and paste the code from this article, because Python is extremely strict about indentation.

Please download the source code from my GitHub (the class timetable database file is included):

Web-based timetable query

Environment:

Windows 10 Pro

Python 3.5.1

WampServer 2.5

Installing WampServer

Storing the timetables requires a MySQL database, so download WampServer.

Go to the website and scroll down to the download section shown in the screenshot.

Choose the 32-bit or 64-bit installer according to your system; if you are not sure which one you need, download the 32-bit version.

After installation, run it.

Importing the Database Tables

After WampServer is running, left-click the WampServer icon in the system tray at the bottom right of the screen, then click phpMyAdmin to open the web-based database management page.



In the web interface, click New in the upper left and create a database named python.



Configure it with the settings shown in the screenshot.


Select the SQL file and import it into the database.

Installing Python

Click Download Python 3.5.1 to start the download.

During installation, remember to check Add python.exe to Path.

Installing Python Packages

After Python is installed, the following packages are needed.

Beautiful Soup 4, the HTML parsing package. Open a Command Prompt as administrator and run:

pip install beautifulsoup4

PyMySQL, the MySQL connection package.

Open a Command Prompt as administrator and run:

pip install PyMySQL
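
To confirm that both packages installed correctly, a quick import check works (this just verifies the modules load):

Python
# Both imports succeed only if the packages are installed
import bs4        # installed as beautifulsoup4
import pymysql    # installed as PyMySQL
print("both packages import fine")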

How It Works

Analyzing the Timetable Links of the Academic Affairs System

From the tail of the link, *** .aspx?xsbh=14040801&xq=15-16-2, the question mark shows that the page requests its data via GET. The first parameter is xsbh=14040801 and the second is xq=15-16-2. xq is clearly the pinyin initialism for the semester (xueqi), so xsbh should be the class code. This means we first need to collect all class codes and all semesters, assemble them into timetable links, and then crawl each one: http://202.193.177.7:8081/(S(oaml0r55uwfphs55ssjh43mh))/web_jxrw/cx_kb_bjkb_bj.aspx?xsbh=<classCode>&xq=<semester>
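
As a minimal sketch of that assembly step (the host matches the intranet address used by the scripts below, and the parameter values are just the example from above):

Python
from urllib.parse import urlencode

BASE = 'http://172.16.129.117/web_jxrw/cx_kb_bjkb_bj.aspx'

def timetable_url(class_code, semester):
    # xsbh is the class code, xq is the semester;
    # urlencode builds the GET query string from the pairs in order
    query = urlencode([('xsbh', class_code), ('xq', semester)])
    return BASE + '?' + query

print(timetable_url('14040801', '15-16-2'))
# http://172.16.129.117/web_jxrw/cx_kb_bjkb_bj.aspx?xsbh=14040801&xq=15-16-2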

Scraping All Semesters

Open the link and view the page source: every semester sits in an option tag under the select tag with ID DDxq, as shown in the screenshot.




Because an id attribute value must be unique within a page in front-end development, the Beautiful Soup selector for this spot is soup.find(id="DDxq").find_all("option").

With that, the beautifulsoup package can extract every semester.
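
To see how that selector behaves, here is a self-contained sketch against a trimmed-down stand-in for the page source (the option values are made up):

Python
from bs4 import BeautifulSoup

# A minimal stand-in for the real page source
html = '''
<select id="DDxq">
  <option value="15-16-1">15-16-1</option>
  <option value="15-16-2">15-16-2</option>
</select>
'''

soup = BeautifulSoup(html, 'html.parser')
# id is unique within a page, so find() lands on the select tag directly
for option in soup.find(id="DDxq").find_all("option"):
    print(option.get_text())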

Python Code: Scraping All Semesters

spiderForSemester.py
Python
import urllib.request
from bs4 import BeautifulSoup
import pymysql.cursors

connection = pymysql.connect(
    host='localhost',
    user='root',
    password='',
    db='python',
    charset="utf8",
    cursorclass=pymysql.cursors.DictCursor)

def sqlLink(semester):
    # Insert one semester string into the `semester` table
    try:
        with connection.cursor() as cursor:
            sql = "INSERT INTO `semester` (`semester`) VALUES (%s)"
            cursor.execute(sql, (semester,))
        connection.commit()
    finally:
        # connection.close()
        pass
    return

response = urllib.request.urlopen('http://172.16.129.117/web_jxrw/cx_kb_bjxzall.aspx')
if response.code == 200:
    print("succeed", response.code)
    html = response.read()
    soup = BeautifulSoup(
        html,
        'html.parser',
        from_encoding='gb2312')
    # Every semester sits in an option tag under the select tag with id DDxq
    arr = soup.find(id="DDxq").find_all("option")
    for x in arr:
        ch = x.get_text()
        print(ch)
        sqlLink(str(ch))
else:
    print("failed")

Scraping All Classes and Class Codes

Similarly, in the source of the timetable query page, every class sits under the select tag with ID Cxbj_all1_bjlist1 (see the screenshot), so beautifulsoup can grab them directly.


The value attribute of each option tag is the class code, and its text content is the class name. Again with beautifulsoup, the selector is soup.find(id="Cxbj_all1_bjlist1").find_all("option").
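
The full script below also derives each class's enrollment year from the first two digits of its class code. A sketch of that rule in isolation (the example codes are made up):

Python
def enrollment_year(class_code):
    # The first two digits of the class code encode the enrollment year;
    # "98" is the only pre-2000 cohort in this system, every other prefix means 20xx
    head = class_code[:2]
    return "19" + head if head == "98" else "20" + head

print(enrollment_year("98040801"))  # 1998
print(enrollment_year("14040801"))  # 2014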

Python Code: Scraping All Classes

Python
import urllib.request
from bs4 import BeautifulSoup
import pymysql.cursors

connection = pymysql.connect(
    host='localhost',
    user='root',
    password='',
    db='python',
    charset="utf8",
    cursorclass=pymysql.cursors.DictCursor)

def sqlLink(className, classCode, grade):
    # Insert one class record into the `class` table
    try:
        with connection.cursor() as cursor:
            sql = "INSERT INTO `class` (`className`,`classCode`,`grade`) VALUES (%s,%s,%s)"
            cursor.execute(sql, (className, classCode, grade))
        connection.commit()
    finally:
        # connection.close()
        pass
    return

response = urllib.request.urlopen('http://172.16.129.117/web_jxrw/cx_kb_bjxzall.aspx')
if response.code == 200:
    print("succeed", response.code)
    html = response.read()
    soup = BeautifulSoup(
        html,
        'html.parser',
        from_encoding='gb2312')
    # option value = class code, option text = class name
    arr = soup.find(id="Cxbj_all1_bjlist1").find_all("option")
    for x in arr:
        ch = x.get_text()
        classCode = x['value']
        # The first two digits of the class code encode the enrollment year
        head = classCode[:2]
        if head == "98":
            grade = "19" + head
        else:
            grade = "20" + head
        print(classCode)
        sqlLink(str(ch), classCode, grade)
else:
    print("failed")

Scraping All Timetables

With the class list and the semester list in hand, we can iterate over both to assemble each timetable query link and extract the courses from every page.


By the same reasoning, under the table tag with id GVkb, each day's courses sit in td tags.

The Python selector is soup.find(id="GVkb").find_all("td").
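
Note that find_all("td") returns the table cells as one flat list, so the script below rebuilds rows by taking the cells eight at a time: the section label plus the seven weekday columns. A sketch of that grouping with made-up cell values:

Python
# A flat list of cell texts, in the order find_all("td") yields them
cells = ["1-2", "Math", "none", "English", "none", "none", "none", "none",
         "3-4", "none", "Physics", "none", "none", "none", "none", "none"]

# Slice off one row (section label + seven weekdays) at a time
for i in range(0, len(cells), 8):
    row = cells[i:i + 8]
    print("section", row[0], "->", row[1:])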

Python Code: Scraping All Timetables

This step uses two Python files; name them exactly as noted. The first is spider.py and the second is spider_classTable.py, and both must sit in the same directory when run.

The first file, spider.py

Python
import pymysql.cursors
import spider_classTable

connection = pymysql.connect(
    host='localhost',
    user='root',
    password='',
    db='python',
    charset="utf8",
    cursorclass=pymysql.cursors.DictCursor)

try:
    # Get all semesters
    with connection.cursor() as cursor:
        sql = "SELECT * FROM `semester`"
        cursor.execute(sql)
        resultSemester = cursor.fetchall()
    # Get all class codes
    with connection.cursor() as cursor:
        sql = "SELECT `classCode` FROM `class`"
        cursor.execute(sql)
        resultClassCode = cursor.fetchall()
finally:
    # connection.close()
    pass

i = 0
for x in resultClassCode:
    classCode = x["classCode"]
    for y in resultSemester:
        semester = y["semester"]
        # First two digits: the class's enrollment year and the semester's start year
        gradeClassCode = int(classCode[:2])
        gradeSemester = int(semester[:2])
        try:
            with connection.cursor() as cursor:
                # Has this (class, semester) pair already been scraped?
                sql = "SELECT `classCode`,`semester` FROM `timetable` WHERE classCode=%s AND semester=%s"
                cursor.execute(sql, (classCode, semester))
                resultRepeat = cursor.fetchone()
        finally:
            # connection.close()
            pass
        # Only semesters within the class's four enrollment years are valid
        if (gradeSemester >= gradeClassCode) and (gradeSemester <= gradeClassCode + 3):
            if resultRepeat is None:
                arr = [classCode, semester]
                print(arr)
                spider_classTable.insertTameTable(classCode, semester)
            else:
                print("The timetable already exists")
        i += 1
        print(i, "Processing...")
The second file, spider_classTable.py
Python
import urllib.request
from bs4 import BeautifulSoup
import pymysql.cursors

def openLink(url):
    # Probe a URL: 0 on success, 1 on failure (kept from the original; currently unused)
    try:
        urllib.request.urlopen(url)
        return 0
    except Exception:
        return 1

def insertTameTable(classCode, semester):
    connection = pymysql.connect(
        host='localhost',
        user='root',
        password='',
        db='python',
        charset="utf8",
        cursorclass=pymysql.cursors.DictCursor)
    request = urllib.request.Request('http://172.16.129.117/web_jxrw/cx_kb_bjkb_bj.aspx?xsbh=' + classCode + '&xq=' + semester)
    # Retry until the request goes through; the server drops connections often
    flag = 1
    while flag == 1:
        try:
            print("opening link...")
            response = urllib.request.urlopen(request)
            flag = 0
        except Exception:
            print("failed to open the link, retrying...")
            flag = 1
    if response.code == 200:
        print("succeeded to open the url, code=", response.code)
        html = response.read()
        soup = BeautifulSoup(
            html,
            'html.parser',
            from_encoding='gbk')
        arr = soup.find(id="GVkb").find_all("td")
        note = soup.find(id="TextBox1")
        timeTable = []
        for x in arr:
            ch = x.get_text()
            # \xa0 is the non-breaking space an empty cell renders as
            strings = str(ch).replace("\xa0", "none") + "\n"
            timeTable.append(strings)
            # Every 8 cells form one row: the section label plus seven weekdays
            if len(timeTable) == 8:
                try:
                    with connection.cursor() as cursor:
                        sql = "INSERT INTO `timetable` (`classCode`,`semester`,`section`,`one`,`two`,`three`,`four`,`five`,`six`,`seven`) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
                        cursor.execute(sql, (classCode, semester, timeTable[0], timeTable[1], timeTable[2], timeTable[3], timeTable[4], timeTable[5], timeTable[6], timeTable[7]))
                    connection.commit()
                finally:
                    timeTable = []
                    print("insert succeed!")
        # One note per page, stored in its own table
        try:
            with connection.cursor() as cursor:
                sql = "INSERT INTO `timetablenote` (`classCode`,`semester`,`note`) VALUES (%s,%s,%s)"
                cursor.execute(sql, (classCode, semester, note.get_text()))
            connection.commit()
        finally:
            print("note insert succeed!")
    else:
        print("failed to open the url, code=", response.code)

Running the Timetable Scraper

Then wait. It took me roughly 6.5 hours to scrape every timetable, so I suggest starting the run before going to bed and letting it finish overnight.