Middle schools in Nanjing officially started the new term today, one day before Teacher's Day — it seems the country made sure the kids get a chance to wish their teachers well first.
Today we'll build a small web app: a screen full of scrolling compliments ("kuakua" danmaku) for our beloved gardeners of young minds!
The Flask skeleton
As usual, we use Flask as the basic web framework; roughly five lines of code are enough (note that the app object must be created before the route decorator can be used):
from flask import Flask
from flask import render_template

app = Flask(__name__)

@app.route('/')
def index():
    return render_template("index2.html")

if __name__ == '__main__':
    app.run(debug=True)
Next we write an HTML page with scrolling text. Scrolling text is traditionally done with the marquee tag (non-standard and deprecated, but still widely supported by browsers).
Let's start with a few fixed phrases to see the basic effect:
<div class="content" id="datatext">
<marquee behavior="scroll">开学啦</marquee>
<marquee behavior="alternate">教师节快乐!</marquee>
<marquee direction="up">老师</marquee>
<marquee direction="down">辛苦了</marquee>
<marquee behavior="scroll">幸福不,哦no!</marquee>
</div>
Then we add some CSS, and the basic web page is done:
<style>
marquee {
    font-weight: bolder;
    font-size: 40px;
    color: white;
}
.content {
    margin: 100px auto;
    width: 500px;
    height: 300px;
    background: url("https://imgconvert.csdnimg.cn/aHR0cHM6Ly91cGxvYWQtaW1hZ2VzLmppYW5zaHUuaW8vdXBsb2FkX2ltYWdlcy8yMDE5MDY0MS0zYWE5ZDExOWU3ZTVmODhhLmpwZw?x-oss-process=image/format,png");
    border-radius: 24px;
    position: relative;
}
...
</style>
Run the program and open the page to see the result.

Good — now let's go get the compliment data.
Fetching the data
In an earlier article we analyzed the process of scraping Zhihu topics in detail, so we won't repeat it here — straight to the code.
Code to scrape and save the answers:
import requests
import re
import os
import time


def get_zhihu():
    zhihu_url = "https://www.zhihu.com/api/v4/questions/485491358/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cattachment%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Cis_labeled%2Cpaid_info%2Cpaid_info_content%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_recognized%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cvip_info%2Cbadge%5B%2A%5D.topics%3Bdata%5B%2A%5D.settings.table_of_content.enabled&limit=20&offset=5&platform=desktop&sort_by=default"
    zhihu_header = {
        "User-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36"}
    res = requests.get(zhihu_url, headers=zhihu_header)
    return res.json()


def filter_str(desstr, restr=''):
    # keep only Chinese characters and common punctuation; everything else is removed
    res = re.compile(r"[^\u4e00-\u9fa5,,.。【】()()“”!!??-]")
    return res.sub(restr, desstr)


def change_comma(datastr):
    # replace half-width commas so they don't break the CSV columns
    return datastr.replace(',', ',')


def change_time(timestamp):
    # convert a Unix timestamp to a readable local time string
    time_array = time.localtime(timestamp)
    return time.strftime("%Y-%m-%d %H:%M:%S", time_array)


def save_answers(data):
    # write the CSV header only when the file is first created
    new_file = not os.path.exists(r'teacher_data.csv')
    with open(r"teacher_data.csv", "a+", encoding='utf-8') as f:
        if new_file:
            f.write("用户,回答内容,创建时间,点赞数量,评论数量\n")
        for i in data["data"]:
            user = i["author"]["name"]
            content = change_comma(filter_str(i["content"]))
            created_time = change_time(i["created_time"])
            voteup_count = i["voteup_count"]
            comment_count = i["comment_count"]
            row = '{},{},{},{},{}'.format(user, content, created_time, voteup_count, comment_count)
            f.write(row)
            f.write('\n')


if __name__ == '__main__':
    zhihu_data = get_zhihu()
    save_answers(zhihu_data)
Code for word segmentation:
import jieba
import pandas as pd

font = r'C:\Windows\Fonts\FZSTK.TTF'  # font path for a word cloud (not used in this snippet)

STOPWORDS = {"回复", "@", "我", "她", "你", "他", "了", "的", "吧", "吗", "在", "啊", "不", "也", "还", "是",
             "说", "都", "就", "没", "做", "人", "赵薇", "被", "不是", "现在", "什么", "这", "呢", "知道", "邓", "我们", "他们", "和", "有", "", "",
             "要", "就是", "但是", "而", "为", "自己", "中", "问题", "一个", "没有", "到", "这个", "并", "对", "[", "]", "“", "”", ",", "。"}


def gen_words(file):
    df = pd.read_csv(file, usecols=[1])
    df_copy = df.copy()
    df_copy['comment'] = df_copy['回答内容'].apply(lambda x: str(x).split())  # strip whitespace
    df_list = df_copy.values.tolist()
    comment = jieba.cut(str(df_list), cut_all=False)
    outstr = ""
    for word in comment:
        if word not in STOPWORDS and word != '\t':
            outstr += word
            outstr += " "
    return outstr


if __name__ == '__main__':
    a = gen_words("teacher_data.csv")
    print(a)
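The stopword-filtering loop at the heart of gen_words can be checked in isolation. Here is a stdlib-only sketch — the token list is hand-made, standing in for jieba.cut() output, and the stopword set is trimmed down for the demo:

```python
# trimmed-down stopword set for the demo
STOPWORDS = {"了", "的", "我", "是", "回复"}

def join_tokens(tokens):
    # drop stopwords and tab characters, join the rest with spaces
    out = ""
    for word in tokens:
        if word not in STOPWORDS and word != '\t':
            out += word + " "
    return out

# hand-made tokens, standing in for jieba.cut() output
tokens = ["老师", "辛苦", "了", "我", "爱", "您"]
print(join_tokens(tokens))
```

The stopwords "了" and "我" vanish, and what remains is the space-separated string the front end later splits back into words.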
That completes the Zhihu data scraping.
Finishing the program
Finally, let's wire everything together. First, a view function that serves data to the front end (it needs a few imports beyond the Flask skeleton above):
import json
import random

from flask import Response

import gen_word  # the segmentation module from the previous step


@app.route('/data')
def getdata():
    teacher_data = gen_word.gen_words(r"C:\Python_project\teacher_day\teacher_data.csv")
    data_list = teacher_data.split(" ")
    # pick five random words for the front end
    res = {'data': random.sample(data_list, 5)}
    return Response(json.dumps(res))
It takes the segmented words from the saved Zhihu data, picks five at random, and returns them to the front end.
In the front-end code, we call this endpoint with jQuery's $.ajax:
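The shape of the JSON payload the front end receives can be sketched like this — the word list is made up, and random.seed is only there to make the demo repeatable:

```python
import json
import random

random.seed(42)  # only for a repeatable demo

# made-up segmented words, standing in for gen_words() output
words = ["老师", "辛苦了", "教师节", "快乐", "园丁", "感恩", "敬业"]

# same structure the /data view returns: {"data": [five random words]}
payload = json.dumps({"data": random.sample(words, 5)}, ensure_ascii=False)
print(payload)
```

random.sample draws without replacement, so the five words are always distinct — each refresh of the page shows a different subset of the compliments.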
<script type="text/javascript">
function getdata(){
    $.ajax({
        type: 'GET',
        url: "http://127.0.0.1:5000/data",
        dataType: 'json',
        success: function(data){
            var text = "";
            var flag = 1;
            for (var i = 0; i < data['data'].length; i++) {
                if (flag == 1) {
                    text = text + '<marquee behavior="scroll" direction="left" scrollamount="30"><font color="red" size="15px">' + data['data'][i] + '</font></marquee>';
                }
                flag = flag + 1;
                if (flag == 5) {
                    flag = 1;
                }
            }
            document.getElementById("datatext").innerHTML = text;
        }
    });
}
setInterval(getdata, 5000);
</script>
In short: once data comes back from the backend, the code builds a marquee for each word — scroll speed, color, and so on set according to the word's position — and writes the result into the datatext element, refreshing every 5 seconds.
Let's take a look at the final effect.
That's it for today's share.
Some of the code in this post is adapted from: https://gitee.com/lyc96/hot-search-running-lantern/blob/master/templates/view.html