目录

  • 前言
  • 编写代码
  • 总结

前言

还是上一篇文章的网站,但本次抓取的是股票的现金流量表,数据更多了一些,所以代码还是有点问题,抓取少量股票时存入excel里的数据格式是没问题的,但数据量一多,中间有被反爬了,那么表格后面的数据就会出现格式问题,因为我还比较菜,等以后再回过头来解决这个问题吧。
我们来看看结果。
股票现金流量表的爬取(有局限性)_数据

编写代码

1.分析网页

股票id的获取我就不说了,可以去看上一篇文章股票数据的爬取(selenium的使用)​ 现金流量表是在股票页面的公司概况里,点进去后然后点击财务分析再点击现金流量表。
股票现金流量表的爬取(有局限性)_json_02
——————————————————————————————————
股票现金流量表的爬取(有局限性)_数据_03
然后又是老套路,右键点击检查,进入network,然后勾选xhr,因为一般这种异步加载的数据都在xhr里,如果这里找不到我们再去勾选all一个一个看。
这个网页一刷新出现的这些包里有我们需要的数据。data里有5条数据,刚好对应上面的5个年份,但不要忘记这个现金流量表不止这几个年份,点击棕色小框的箭头。然后发现xhr里又多出来一个,看看它们的链接之间的规律。
股票现金流量表的爬取(有局限性)_数据_04

http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2021-03-31%2C2020-12-31%2C2020-06-30%2C2020-03-31%2C2019-12-31&code=SZ300986
http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2019-06-30%2C2018-12-31%2C2017-12-31%2C2017-06-30%2C2016-12-31&code=SZ300986
http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2016-12-31%2C2016-06-30%2C2015-12-31%2C2015-06-30%2C2014-12-31&code=SZ300986

然后发现就是dates这个参数的值变化了,然后code是股票的板块加id。这样的话dates我们可以不用去修改,弄一个列表来存放就好了。但要注意的是有些股票的年份很少,只到2015年,而有些股票却可以到1999年,所以这个url要找全。
接着我们来看一下这个数据,嗯,又是和上篇文章中的股票行情一样的情况,无法对应到表格里。还是老办法,ctrl+shift+F进行全局搜索,只有一个文件包含这个字段,点击进去一看,我们找到我们需要的对应关系了。其他的字段对应关系也在这附近。
股票现金流量表的爬取(有局限性)_数据_05
股票现金流量表的爬取(有局限性)_ajax_06

2.编写代码

首先是获取到股票的id。

def get_ids_names(n):
'''
:n: 板块序号
:data: 存放id和name的字典的列表,f12:id, f14:name
:num: 股票的数量
:return {num: data}
:urls中pz为一次性获取的数量,超过最大值只返回最大数量
'''
urls = [
'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=2000&po=1&np=1&fid=f3&fs=m:1+t:2,m:1+t:23&fields=f12,f14', #上证A股2000
'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=3000&po=1&np=1&fid=f3&fs=m:0+t:6,m:0+t:80&fields=f12,f14', #深证A股3000
'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=1000&po=1&np=1&fid=f3&fs=m:0+t:80&fields=f12,f14', #创业板1000
'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=300&po=1&np=1&fid=f3&fs=m:1+t:23&fields=f12,f14', #科创版300
]
url = urls[n]
d = response(url,0.5).json()['data']
num = d['total']
data = d['diff']
print(num,len(data))
return {num: data}

获取到股票id后,传入到处理股票id的函数里。

def chuli_ids_names(ids_names, n):
id_name_queue = Queue()
for id_name in ids_names:
id_name_queue.put((id_name,n))
th1 = MyThread(id_name_queue)
th1.start()
th1.join()
return th1.y

这里因为股票数量比较多,我就使用的多线程。

class MyThread(threading.Thread):
def __init__(self, queue):
threading.Thread.__init__(self)
self.queue = queue
self.y = []
def run(self):
while not self.queue.empty(): #当队列不为空
data = self.queue.get()
g_datas = get_gdata(data[0], data[1])
new_gdatas = []
for da in g_datas:
d = list(da.values())[0]
# print(d)
new_gdatas.append({d['REPORT_DATE']: jianxi(d)})
self.y.append({data[0]['f12']+data[0]['f14']: new_gdatas})
# print(self.y)
print(len(self.y))

这里面的get_gdata函数是获取一只股票的现金流量表的函数。

def get_gdata(id_name,n):
urls = [
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2021-03-31%2C2020-12-31%2C2020-09-30%2C2020-06-30%2C2020-03-31&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2020-03-31%2C2019-12-31%2C2019-09-30%2C2018-12-31%2C2017-12-31&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2018-03-31%2C2017-12-31%2C2017-09-30%2C2017-06-30%2C2017-03-31&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2016-12-31%2C2016-09-30%2C2016-06-30%2C2016-03-31%2C2015-12-31&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2015-09-30%2C2015-06-30%2C2015-03-31%2C2014-12-31%2C2014-09-30&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2014-06-30%2C2014-03-31%2C2013-12-31%2C2013-09-30%2C2013-06-30&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2013-03-31%2C2012-12-31%2C2012-09-30%2C2012-06-30%2C2012-03-31&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2011-12-31%2C2011-09-30%2C2011-06-30%2C2011-03-31%2C2010-12-31&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2010-12-31%2C2010-06-30%2C2009-12-31%2C2008-12-31%2C2007-12-31&code=S'

]
id = id_name['f12']
d = c_id(id, n)
# print(d,id)
g_datas = [] # 一只股票的数据
datas = []
data_time = []
t = 2
for u in urls:
url = u + d
res = response(url, 0)
if 'data' in res.json().keys():
for i in range(int(res.json()['count'])):
if res.json()['data'][i]['REPORT_DATE'] not in data_time:
data_time.append(res.json()['data'][i]['REPORT_DATE'])
datas.append(res.json()['data'][i])
else:
if t == 0:
break
t -= 1
print('共有', len(datas), '列')
for data in datas:
g_datas.append({data['REPORT_DATE']: data})
return g_datas

获取到一只股票的现金流量表后将其按{股票id+股票名称:现金流量表}这样的格式添加到self.y里。一个板块所有的股票数据获取完毕后将数据写入excel表里。

def write_b_data(n,b_data):
'''
:b_datas:一个板块所有股票的现金流量表,存放格式为[{id+name: g_data},...,{id+name: g_data}]
:g_data:一只股票的现金流量表,存放格式为[{time:data},...,{time:data}]
:data:一个时间的现金流量,存放格式为{key:value,...,key:value}
'''
name = ['上证A股','深证A股','创业版','科创版']
wb = load_workbook('xjls.xlsx')
ws = wb.create_sheet(name[n], 0)
bt = { # 表格第一行
'A': '现金流量表',
'B': '2021-03-31 00:00:00',
'C': '2020-12-31 00:00:00',
'D': '2020-09-30 00:00:00',
'E': '2020-06-30 00:00:00',
'F': '2020-03-31 00:00:00',
'G': '2019-12-31 00:00:00',
'H': '2019-09-30 00:00:00',
'I': '2019-06-30 00:00:00',
'J': '2019-03-31 00:00:00',
'K': '2018-12-31 00:00:00',
'L': '2018-09-30 00:00:00',
'M': '2018-06-30 00:00:00',
'N': '2018-03-31 00:00:00',
'O': '2017-12-31 00:00:00',
'P': '2017-09-30 00:00:00',
'Q': '2017-06-30 00:00:00',
'R': '2017-03-31 00:00:00',
'S': '2016-12-31 00:00:00',
'T': '2016-09-30 00:00:00',
'U': '2016-06-30 00:00:00',
'V': '2016-03-31 00:00:00',
'W': '2015-12-31 00:00:00',
'X': '2015-09-30 00:00:00',
'Y': '2015-06-30 00:00:00',
'Z': '2015-03-31 00:00:00',
'AA': '2014-12-31 00:00:00',
'AB': '2014-09-30 00:00:00',
'AC': '2014-06-30 00:00:00',
'AD': '2014-03-31 00:00:00',
'AE': '2013-12-31 00:00:00',
'AF': '2013-09-30 00:00:00',
'AG': '2013-06-30 00:00:00',
'AH': '2013-03-31 00:00:00',
'AI': '2012-12-31 00:00:00',
'AJ': '2012-09-30 00:00:00',
'AK': '2012-06-30 00:00:00',
'AL': '2012-03-31 00:00:00',
'AM': '2011-12-31 00:00:00',
'AN': '2011-09-30 00:00:00',
'AO': '2011-06-30 00:00:00',
'AP': '2011-03-31 00:00:00',
'AQ': '2010-12-31 00:00:00',
'AR': '2010-09-30 00:00:00',
'AS': '2010-06-30 00:00:00',
'AT': '2010-03-31 00:00:00',
'AU': '2009-12-31 00:00:00',
'AV': '2009-09-30 00:00:00',
'AW': '2009-06-30 00:00:00',
'AX': '2009-03-31 00:00:00',
'AY': '2008-12-31 00:00:00',
'AZ': '2008-09-30 00:00:00',
'BA': '2008-06-30 00:00:00',
'BB': '2008-03-31 00:00:00',
'BC': '2007-12-31 00:00:00',
'BD': '2007-09-30 00:00:00',
'BE': '2007-06-30 00:00:00',
'BF': '2007-03-31 00:00:00',
'BG': '2006-12-31 00:00:00',
'BH': '2006-09-30 00:00:00',
'BI': '2006-06-30 00:00:00',
'BJ': '2006-03-31 00:00:00',
'BK': '2005-12-31 00:00:00',
'BL': '2005-09-30 00:00:00',
'BM': '2005-06-30 00:00:00',
'BN': '2005-03-31 00:00:00',
'BO': '2004-12-31 00:00:00',
'BP': '2004-09-30 00:00:00',
'BQ': '2004-06-30 00:00:00',
'BR': '2004-03-31 00:00:00',
'BS': '2003-12-31 00:00:00',
'BT': '2003-09-30 00:00:00',
'BU': '2003-06-30 00:00:00',
'BV': '2003-03-31 00:00:00',
'BW': '2002-12-31 00:00:00',
'BX': '2002-09-30 00:00:00',
'BY': '2002-06-30 00:00:00',
'BZ': '2002-03-31 00:00:00',
'CA': '2001-12-31 00:00:00',
'CB': '2001-09-30 00:00:00',
'CC': '2001-06-30 00:00:00',
'CD': '2001-03-31 00:00:00',
'CE': '2000-12-31 00:00:00',
'CF': '2000-09-30 00:00:00',
'CG': '2000-06-30 00:00:00',
'CH': '2000-03-31 00:00:00',
'CI': '1999-12-31 00:00:00',
'CJ': '1999-09-30 00:00:00',
'CK': '1999-06-30 00:00:00',
'CL': '1999-03-31 00:00:00',
'CM': '1998-12-31 00:00:00',
'CN': '1998-09-30 00:00:00',
'CO': '1998-06-30 00:00:00',
'CP': '1998-03-31 00:00:00',
'CQ': '1997-12-31 00:00:00',
'CR': '1997-09-30 00:00:00',
'CS': '1997-06-30 00:00:00',
'CT': '1997-03-31 00:00:00',
'CU': '1996-12-31 00:00:00',
'CV': '1996-09-30 00:00:00',
'CW': '1996-06-30 00:00:00',
'CX': '1996-03-31 00:00:00',
}
for key in list(bt.keys()): #存入表头
ws[key + '1'] = bt[key]
i = 2
g = 0
for g_data in b_data:
ws['A' + str(i)] = list(g_data.keys())[0] #存入股票id名称
i += 1
t = 0
for data in list(g_data.values())[0]:
for key in list(data.keys()):
b = 3 + g * 116
value1 = data[key]
if t == 0:
for k_values in list(value1.keys()): # 存入第一列名称
ws['A' + str(i)] = k_values
i += 1
t = 1
xh = get_v_key(bt, key)
for v in value1.values(): #存入股票对应时间的现金流量表
if v == None:
v = '----'
ws[xh + str(b)] = v
# print(xh + str(b), v)
b += 1
g += 1
print(g)
wb.save('xjls.xlsx')
wb.close()

3.总的代码

import requests
import time
from openpyxl import Workbook
from openpyxl import load_workbook
import threading
from queue import Queue

# 数据量少可以使用,数据量一多就出问题

class MyThread(threading.Thread):
def __init__(self, queue):
threading.Thread.__init__(self)
self.queue = queue
self.y = []
def run(self):
while not self.queue.empty(): #当队列不为空
data = self.queue.get()
g_datas = get_gdata(data[0], data[1])
new_gdatas = []
for da in g_datas:
d = list(da.values())[0]
# print(d)
new_gdatas.append({d['REPORT_DATE']: jianxi(d)})
self.y.append({data[0]['f12']+data[0]['f14']: new_gdatas})
# print(self.y)
print(len(self.y))

def get_gdata(id_name,n):
urls = [
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2021-03-31%2C2020-12-31%2C2020-09-30%2C2020-06-30%2C2020-03-31&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2020-03-31%2C2019-12-31%2C2019-09-30%2C2018-12-31%2C2017-12-31&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2018-03-31%2C2017-12-31%2C2017-09-30%2C2017-06-30%2C2017-03-31&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2016-12-31%2C2016-09-30%2C2016-06-30%2C2016-03-31%2C2015-12-31&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2015-09-30%2C2015-06-30%2C2015-03-31%2C2014-12-31%2C2014-09-30&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2014-06-30%2C2014-03-31%2C2013-12-31%2C2013-09-30%2C2013-06-30&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2013-03-31%2C2012-12-31%2C2012-09-30%2C2012-06-30%2C2012-03-31&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2011-12-31%2C2011-09-30%2C2011-06-30%2C2011-03-31%2C2010-12-31&code=S',
'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2010-12-31%2C2010-06-30%2C2009-12-31%2C2008-12-31%2C2007-12-31&code=S'

]
id = id_name['f12']
d = c_id(id, n)
# print(d,id)
g_datas = [] # 一只股票的数据
datas = []
data_time = []
t = 2
for u in urls:
url = u + d
res = response(url, 0)
if 'data' in res.json().keys():
for i in range(int(res.json()['count'])):
if res.json()['data'][i]['REPORT_DATE'] not in data_time:
data_time.append(res.json()['data'][i]['REPORT_DATE'])
datas.append(res.json()['data'][i])
else:
if t == 0:
break
t -= 1
print('共有', len(datas), '列')
for data in datas:
g_datas.append({data['REPORT_DATE']: data})
return g_datas

def response(url,sleep_time):
times = 3
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36 Edg/89.0.774.76',
}
try:
res = requests.get(url,headers=headers)
if res.status_code >=200 and res.status_code<300:
time.sleep(sleep_time)
return res
else:
return None
except Exception as e:
print(e)
if times>0:
print('机会次数:'+str(times))
response(url,sleep_time)
else:
print('无法爬取')

def c_id(id, n):
if n == 0 or n == 3:
return ''.join(['H',id])
elif n == 1 or n == 2:
return ''.join(['Z',id])
else:
print('板块错误')

def get_ids_names(n):
'''
n: 板块序号
data: 存放id和name的字典的列表,f12:id, f14:name
num: 股票的数量
return {num: data}
urls中pz为一次性获取的数量,超过最大值只返回最大数量
'''
urls = [
'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=2000&po=1&np=1&fid=f3&fs=m:1+t:2,m:1+t:23&fields=f12,f14', #上证A股2000
'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=3000&po=1&np=1&fid=f3&fs=m:0+t:6,m:0+t:80&fields=f12,f14', #深证A股3000
'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=1000&po=1&np=1&fid=f3&fs=m:0+t:80&fields=f12,f14', #创业板1000
'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=300&po=1&np=1&fid=f3&fs=m:1+t:23&fields=f12,f14', #科创版300
]
url = urls[n]
d = response(url,0.5).json()['data']
num = d['total']
data = d['diff']
print(num,len(data))
return {num: data}

def jianxi(d):
ys = {
'经营活动产生的现金流量': '',
'\t销售商品、提供劳务收到的现金': 'SALES_SERVICES',
'\t客户存款和同业存放款项净增加额': 'DEPOSIT_INTERBANK_ADD',
'\t向中央银行借款净增加额': 'LOAN_PBC_ADD',
'\t向其他金融机构拆入资金净增加额': 'OFI_BF_ADD',
'\t收到原保险合同保费取得的现金': 'RECEIVE_ORIGIC_PREMIUM',
'\t收到再保险业务现金净额': 'RECEIVE_REINSURE_NET',
'\t保户储金及投资款净增加额': 'INSURED_INVEST_ADD',
'\t处置交易性金融资产净增加额': 'DISPOSAL_TFA_ADD',
'\t收取利息、手续费及佣金的现金': 'RECEIVE_INTEREST_COMMISSION',
'\t拆入资金净增加额': 'BORROW_FUND_ADD',
'\t发放贷款及垫款的净减少额': 'LOAN_ADVANCE_REDUCE',
'\t回购业务资金净增加额': 'REPO_BUSINESS_ADD',
'\t收到的税收返还': 'RECEIVE_TAX_REFUND',
'\t收到其他与经营活动有关的现金': 'RECEIVE_OTHER_OPERATE',
'\t经营活动现金流入的其他项目': 'OPERATE_INFLOW_OTHER',
'\t经营活动现金流入的平衡项目': 'OPERATE_INFLOW_BALANCE',
'经营活动现金流入小计': 'TOTAL_OPERATE_INFLOW',
'\t购买商品、接受劳务支付的现金': 'BUY_SERVICES',
'\t客户贷款及垫款净增加额': 'LOAN_ADVANCE_ADD',
'\t存放中央银行和同业款项净增加额': 'PBC_INTERBANK_ADD',
'\t支付原保险合同赔付等款项的现金': 'PAY_ORIGIC_COMPENSATE',
'\t支付利息、手续费及佣金的现金': 'PAY_INTEREST_COMMISSION',
'\t支付保单红利的现金': 'PAY_POLICY_BONUS',
'\t支付给职工以及为职工支付的现金': 'PAY_STAFF_CASH',
'\t支付的各项税费': 'PAY_ALL_TAX',
'\t支付其他与经营活动有关的现金': 'PAY_OTHER_OPERATE',
'\t经营活动现金流出的其他项目': 'OPERATE_OUTFLOW_OTHER',
'\t经营活动现金流出的平衡项目': 'OPERATE_OUTFLOW_BALANCE',
'经营活动现金流出小计': 'TOTAL_OPERATE_OUTFLOW',
'\t经营活动产生的现金流量净额其他项目': 'OPERATE_NETCASH_OTHER',
'\t经营活动产生的现金流量净额平衡项目': 'OPERATE_NETCASH_BALANCE',
'经营活动产生的现金流量净额': 'OPERATE_NETCASH_OTHER',
'投资活动产生的现金流量': '',
'\t收回投资收到的现金': 'WITHDRAW_INVEST',
'\t取得投资收益收到的现金': 'RECEIVE_INVEST_INCOME_QOQ',
'\t处置固定资产、无形资产和其他长期资产收回的现金净额': 'DISPOSAL_LONG_ASSET',
'\t处置子公司及其他营业单位收到的现金': 'DISPOSAL_SUBSIDIARY_OTHER',
'\t减少质押和定期存款所收到的现金': 'REDUCE_PLEDGE_TIMEDEPOSITS',
'\t收到的其他与投资活动有关的现金': 'RECEIVE_OTHER_INVEST',
'\t投资活动现金流入的其他项目': 'INVEST_INFLOW_OTHER',
'\t投资活动现金流入的平衡项目': 'INVEST_INFLOW_BALANCE',
'投资活动现金流入小计': 'TOTAL_INVEST_INFLOW',
'\t购建固定资产、无形资产和其他长期资产支付的现金': 'CONSTRUCT_LONG_ASSET',
'\t投资支付的现金': 'INVEST_PAY_CASH',
'\t质押贷款净增加额': 'PLEDGE_LOAN_ADD',
'\t取得子公司及其他营业单位支付的现金净额': 'OBTAIN_SUBSIDIARY_OTHER',
'\t增加质押和定期存款所支付的现金': 'ADD_PLEDGE_TIMEDEPOSITS',
'\t支付其他与投资活动有关的现金': 'PAY_OTHER_INVEST',
'\t投资活动现金流出的其他项目': 'INVEST_OUTFLOW_OTHER',
'\t投资活动现金流出的平衡项目': 'INVEST_OUTFLOW_BALANCE',
'投资活动现金流出小计': 'TOTAL_INVEST_OUTFLOW',
'\t投资活动产生的现金流量净额其他项目': 'INVEST_NETCASH_OTHER',
'\t投资活动产生的现金流量净额平衡项目': 'INVEST_NETCASH_BALANCE',
'投资活动产生的现金流量净额': 'INVEST_NETCASH_OTHER',
'筹资活动产生的现金流量': '',
'\t吸收投资收到的现金': 'ACCEPT_INVEST_CASH',
'\t其中:子公司吸收少数股东投资收到的现金': 'SUBSIDIARY_ACCEPT_INVEST',
'\t取得借款收到的现金': 'RECEIVE_LOAN_CASH',
'\t发行债券收到的现金': 'ISSUE_BOND',
'\t收到的其他与筹资活动有关的现金': 'RECEIVE_OTHER_FINANCE',
'\t筹资活动现金流入的其他项目': 'FINANCE_INFLOW_OTHER',
'\t筹资活动现金流入的平衡项目': 'FINANCE_INFLOW_BALANCE',
'筹资活动现金流入小计': 'TOTAL_FINANCE_INFLOW_QOQ',
'\t偿还债务所支付的现金': 'PAY_DEBT_CASH_QOQ',
'\t分配股利、利润或偿付利息支付的现金': 'ASSIGN_DIVIDEND_PORFIT_QOQ',
'\t其中:子公司支付给少数股东的股利、利润': 'SUBSIDIARY_PAY_DIVIDEND',
'\t购买子公司少数股权而支付的现金': 'BUY_SUBSIDIARY_EQUITY',
'\t支付的其他与筹资活动有关的现金': 'PAY_OTHER_FINANCE',
'\t其中:子公司减资支付给少数股东的现金': 'SUBSIDIARY_REDUCE_CASH',
'\t筹资活动现金流出的其他项目': 'FINANCE_OUTFLOW_OTHER',
'\t筹资活动现金流出的平衡项目': 'FINANCE_OUTFLOW_BALANCE',
'筹资活动现金流出小计': 'TOTAL_FINANCE_OUTFLOW',
'\t筹资活动产生的现金流量净额其他项目': 'FINANCE_NETCASH_OTHER',
'\t筹资活动产生的现金流量净额平衡项目': 'FINANCE_NETCASH_BALANCE',
'筹资活动产生的现金流量净额': 'FINANCE_NETCASH_OTHER',
'汇率变动对现金及现金等价物的影响': 'RATE_CHANGE_EFFECT',
'\t现金及现金等价物净增加额其他项目': 'CCE_ADD_OTHER',
'\t现金及现金等价物净增加额平衡项目': 'CCE_ADD_BALANCE',
'现金及现金等价物净增加额': 'CCE_ADD',
'\t加:期初现金及现金等价物余额': 'BEGIN_CCE',
'\t期末现金及现金等价物余额其他项目': 'END_CCE_OTHER',
'\t期末现金及现金等价物余额平衡项目': 'END_CCE_BALANCE',
'期末现金及现金等价物余额': 'END_CCE',
'补充资料': '',
'\t净利润': 'NETPROFIT',
'\t少数股东损益': 'MINORITY_INTEREST',
'\t资产减值准备': 'ASSET_IMPAIRMENT',
'\t固定资产和投资性房地产折旧': 'FA_IR_DEPR',
'\t其中:固定资产折旧、油气资产折耗、生产性生物资产折旧': 'OILGAS_BIOLOGY_DEPR',
'\t投资性房地产折旧': 'IR_DEPR',
'\t无形资产摊销': 'IA_AMORTIZE',
'\t长期待摊费用摊销': 'LPE_AMORTIZE',
'\t递延收益摊销': 'DEFER_INCOME_AMORTIZE',
'\t待摊费用的减少': 'PREPAID_EXPENSE_REDUCE',
'\t预提费用的增加': 'ACCRUED_EXPENSE_ADD',
'\t处置固定资产、无形资产和其他长期资产的损失': 'DISPOSAL_LONGASSET_LOSS',
'\t固定资产报废损失': 'FA_SCRAP_LOSS_QOQ',
'\t公允价值变动损失': 'FAIRVALUE_CHANGE_LOSS',
'\t财务费用': 'FINANCE_EXPENSE',
'\t投资损失': 'INVEST_LOSS',
'\t递延所得税': 'DEFER_TAX',
'\t其中:递延所得税资产减少': 'DT_ASSET_REDUCE',
'\t递延所得税负债增加': 'DT_LIAB_ADD',
'\t预计负债的增加': 'PREDICT_LIAB_ADD',
'\t存货的减少': 'INVENTORY_REDUCE',
'\t经营性应收项目的减少': 'OPERATE_RECE_REDUCE',
'\t经营性应付项目的增加': 'OPERATE_PAYABLE_ADD',
'\t其他': 'OTHER',
'\t经营活动产生的现金流量净额其他项目_': 'OPERATE_NETCASH_OTHERNOTE',
'\t经营活动产生的现金流量净额平衡项目_': 'OPERATE_NETCASH_BALANCENOTE',
'经营活动产生的现金流量净额_': 'NETCASH_OPERATENOTE',
'\t债务转为资本': 'DEBT_TRANSFER_CAPITAL',
'\t一年内到期的可转换公司债券': 'CONVERT_BOND_1YEAR',
'\t融资租入固定资产': 'FINLEASE_OBTAIN_FA',
'\t不涉及现金收支的投资和筹资活动金额其他项目': 'UNINVOLVE_INVESTFIN_OTHER',
'\t现金的期末余额': 'END_CASH',
'\t减:现金的期初余额': 'BEGIN_CASH',
'\t加:现金等价物的期末余额': 'END_CASH_EQUIVALENTS_QOQ',
'\t减:现金等价物的期初余额': 'BEGIN_CASH_EQUIVALENTS',
'\t现金及现金等价物净增加额其他项目_': 'CCE_ADD_OTHERNOTE',
'\t现金及现金等价物净增加额平衡项目_': 'CCE_ADD_BALANCENOTE',
'现金及现金等价物的净增加额': 'CCE_ADDNOTE',
'\t审计意见(境内)': 'OPINION_TYPE',
'\t审计意见(境外)': 'OSOPINION_TYPE'
}
new_data = {}
# print(d)
try:
for key, value in ys.items():
# print(key, value)
if value != '' and value in d.keys():
new_data[key] = d[value]
return new_data
except BaseException as b:
print(b)

def chuli_ids_names(ids_names, n, num):
id_name_queue = Queue()
for id_name in ids_names:
id_name_queue.put((id_name,n))
th1 = MyThread(id_name_queue)
th1.start()
th1.join()
return th1.y
# b_data = get_bdata(n, num, ids_names)
# write_b_data(i, b_data)

def get_v_key(dic,v):
'''
根据值取字典的键
'''
if v in dic.values():
return list(dic.keys())[list(dic.values()).index(v)]

def write_b_data(n,b_data):
'''
b_datas:一个板块所有股票的现金流量表,存放格式为[{id+name: g_data},...,{id+name: g_data}]
g_data:一只股票的现金流量表,存放格式为[{time:data},...,{time:data}]
data:一个时间的现金流量,存放格式为{key:value,...,key:value}
'''
name = ['上证A股','深证A股','创业版','科创版']
wb = load_workbook('xjls.xlsx')
ws = wb.create_sheet(name[n], 0)
bt = { # 表格第一行
'A': '现金流量表',
'B': '2021-03-31 00:00:00',
'C': '2020-12-31 00:00:00',
'D': '2020-09-30 00:00:00',
'E': '2020-06-30 00:00:00',
'F': '2020-03-31 00:00:00',
'G': '2019-12-31 00:00:00',
'H': '2019-09-30 00:00:00',
'I': '2019-06-30 00:00:00',
'J': '2019-03-31 00:00:00',
'K': '2018-12-31 00:00:00',
'L': '2018-09-30 00:00:00',
'M': '2018-06-30 00:00:00',
'N': '2018-03-31 00:00:00',
'O': '2017-12-31 00:00:00',
'P': '2017-09-30 00:00:00',
'Q': '2017-06-30 00:00:00',
'R': '2017-03-31 00:00:00',
'S': '2016-12-31 00:00:00',
'T': '2016-09-30 00:00:00',
'U': '2016-06-30 00:00:00',
'V': '2016-03-31 00:00:00',
'W': '2015-12-31 00:00:00',
'X': '2015-09-30 00:00:00',
'Y': '2015-06-30 00:00:00',
'Z': '2015-03-31 00:00:00',
'AA': '2014-12-31 00:00:00',
'AB': '2014-09-30 00:00:00',
'AC': '2014-06-30 00:00:00',
'AD': '2014-03-31 00:00:00',
'AE': '2013-12-31 00:00:00',
'AF': '2013-09-30 00:00:00',
'AG': '2013-06-30 00:00:00',
'AH': '2013-03-31 00:00:00',
'AI': '2012-12-31 00:00:00',
'AJ': '2012-09-30 00:00:00',
'AK': '2012-06-30 00:00:00',
'AL': '2012-03-31 00:00:00',
'AM': '2011-12-31 00:00:00',
'AN': '2011-09-30 00:00:00',
'AO': '2011-06-30 00:00:00',
'AP': '2011-03-31 00:00:00',
'AQ': '2010-12-31 00:00:00',
'AR': '2010-09-30 00:00:00',
'AS': '2010-06-30 00:00:00',
'AT': '2010-03-31 00:00:00',
'AU': '2009-12-31 00:00:00',
'AV': '2009-09-30 00:00:00',
'AW': '2009-06-30 00:00:00',
'AX': '2009-03-31 00:00:00',
'AY': '2008-12-31 00:00:00',
'AZ': '2008-09-30 00:00:00',
'BA': '2008-06-30 00:00:00',
'BB': '2008-03-31 00:00:00',
'BC': '2007-12-31 00:00:00',
'BD': '2007-09-30 00:00:00',
'BE': '2007-06-30 00:00:00',
'BF': '2007-03-31 00:00:00',
'BG': '2006-12-31 00:00:00',
'BH': '2006-09-30 00:00:00',
'BI': '2006-06-30 00:00:00',
'BJ': '2006-03-31 00:00:00',
'BK': '2005-12-31 00:00:00',
'BL': '2005-09-30 00:00:00',
'BM': '2005-06-30 00:00:00',
'BN': '2005-03-31 00:00:00',
'BO': '2004-12-31 00:00:00',
'BP': '2004-09-30 00:00:00',
'BQ': '2004-06-30 00:00:00',
'BR': '2004-03-31 00:00:00',
'BS': '2003-12-31 00:00:00',
'BT': '2003-09-30 00:00:00',
'BU': '2003-06-30 00:00:00',
'BV': '2003-03-31 00:00:00',
'BW': '2002-12-31 00:00:00',
'BX': '2002-09-30 00:00:00',
'BY': '2002-06-30 00:00:00',
'BZ': '2002-03-31 00:00:00',
'CA': '2001-12-31 00:00:00',
'CB': '2001-09-30 00:00:00',
'CC': '2001-06-30 00:00:00',
'CD': '2001-03-31 00:00:00',
'CE': '2000-12-31 00:00:00',
'CF': '2000-09-30 00:00:00',
'CG': '2000-06-30 00:00:00',
'CH': '2000-03-31 00:00:00',
'CI': '1999-12-31 00:00:00',
'CJ': '1999-09-30 00:00:00',
'CK': '1999-06-30 00:00:00',
'CL': '1999-03-31 00:00:00',
'CM': '1998-12-31 00:00:00',
'CN': '1998-09-30 00:00:00',
'CO': '1998-06-30 00:00:00',
'CP': '1998-03-31 00:00:00',
'CQ': '1997-12-31 00:00:00',
'CR': '1997-09-30 00:00:00',
'CS': '1997-06-30 00:00:00',
'CT': '1997-03-31 00:00:00',
'CU': '1996-12-31 00:00:00',
'CV': '1996-09-30 00:00:00',
'CW': '1996-06-30 00:00:00',
'CX': '1996-03-31 00:00:00',
}
for key in list(bt.keys()): #存入表头
ws[key + '1'] = bt[key]
i = 2
g = 0
for g_data in b_data:
ws['A' + str(i)] = list(g_data.keys())[0] #存入股票id名称
i += 1
t = 0
for data in list(g_data.values())[0]:
for key in list(data.keys()):
b = 3 + g * 116
value1 = data[key]
if t == 0:
for k_values in list(value1.keys()): # 存入第一列名称
ws['A' + str(i)] = k_values
i += 1
t = 1
xh = get_v_key(bt, key)
for v in value1.values(): #存入股票对应时间的现金流量表
if v == None:
v = '----'
ws[xh + str(b)] = v
# print(xh + str(b), v)
b += 1
g += 1
print(g)
wb.save('xjls.xlsx')
wb.close()

def main():
'''
上证A股 H+id
深证A股 Z+id
创业板 Z+id
科创版 H+id
'''
# wb = Workbook()
# wb.save('xjls.xlsx')
a = 0
num_ids_names = get_ids_names(a)
ids_names = list(num_ids_names.values())[0]
num = list(num_ids_names.keys())[0]
print(a, num, ids_names)
b_data = chuli_ids_names(ids_names, a, num)
write_b_data(a, b_data)
# for i in range(3):
# num_ids_names = get_ids_names(i)
# ids_names = list(num_ids_names.values())[0]
# num = list(num_ids_names.keys())[0]
# print(i, num, ids_names)
# b_data = chuli_ids_names(ids_names, i, num)
# write_b_data(i,b_data)

if __name__ == '__main__':
main()

总结

因为数据量太大的缘故,中途容易数据获取失败,然后导致数据写入格式错误。经测验,爬取科创版这个股票数量较少的板块数据是无误的。但像上证A股这种一个板块上千只股票就无法正确爬取,代码仍需要进一步优化。