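I've been busy and haven't written anything for a long time. Nothing much was going on today, and a small requirement happened to come up.
Requirement: extract part of the backend data from the big data platform into the data warehouse (split by web module: 8 modules, a few dozen tables in total).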
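1. Previously I just listed the python datax.py XXX.json commands in a txt file and ran them one after another. That turned out to be painfully slow: a full run took ten to twenty minutes. So I switched to a multithreaded Python script. For convenience I put each module's JSON job files in that module's own folder. There aren't many tables, so I didn't bother with a thread pool.
Just launch all 8 threads at once.
The code is below (not the original code), start.py: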
import os
import threading

# Job files for each module, one folder per module.
credit = os.listdir(r'E:\datax\job\credit')
manage = os.listdir(r'E:\datax\job\manage')
zcgl = os.listdir(r'E:\datax\job\zcgl')
huresources = os.listdir(r'E:\datax\job\huresources')
retail = os.listdir(r'E:\datax\job\retail')
industclient = os.listdir(r'E:\datax\job\industclient')
institution = os.listdir(r'E:\datax\job\institution')
investment = os.listdir(r'E:\datax\job\investment')


# Each function runs the jobs of one module sequentially.
def credit1():
    for i in credit:
        os.system(r"python E:\datax\bin\datax.py E:\datax\job\credit\\" + i)


def manage1():
    for i in manage:
        os.system(r"python E:\datax\bin\datax.py E:\datax\job\manage\\" + i)


def zcgl1():
    for i in zcgl:
        os.system(r"python E:\datax\bin\datax.py E:\datax\job\zcgl\\" + i)


def huresources1():
    for i in huresources:
        os.system(r"python E:\datax\bin\datax.py E:\datax\job\huresources\\" + i)


def retail1():
    for i in retail:
        os.system(r"python E:\datax\bin\datax.py E:\datax\job\retail\\" + i)


def industclient1():
    for i in industclient:
        os.system(r"python E:\datax\bin\datax.py E:\datax\job\industclient\\" + i)


def institution1():
    for i in institution:
        os.system(r"python E:\datax\bin\datax.py E:\datax\job\institution\\" + i)


def investment1():
    for i in investment:
        os.system(r"python E:\datax\bin\datax.py E:\datax\job\investment\\" + i)


def main():
    # One thread per module.
    t1 = threading.Thread(target=credit1)
    t2 = threading.Thread(target=manage1)
    t3 = threading.Thread(target=zcgl1)
    t4 = threading.Thread(target=huresources1)
    t5 = threading.Thread(target=retail1)
    t6 = threading.Thread(target=industclient1)
    t7 = threading.Thread(target=institution1)
    t8 = threading.Thread(target=investment1)
    # Start all eight threads.
    t1.start()
    t2.start()
    t3.start()
    t4.start()
    t5.start()
    t6.start()
    t7.start()
    t8.start()


if __name__ == '__main__':
    main()
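For anyone who does want the thread pool I skipped, here is a rough sketch of how the same idea might look with one loop instead of eight copied functions. This is not what I actually ran. The helper names (list_jobs, run_job, run_module, run_all) are made up for illustration, filtering to .json files is an assumption (the script above runs every file in the folder), and subprocess.run is swapped in for os.system so the exit codes can be collected.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Paths and module names are the ones used in start.py above.
DATAX_PY = r"E:\datax\bin\datax.py"
JOB_ROOT = r"E:\datax\job"
MODULES = ["credit", "manage", "zcgl", "huresources",
           "retail", "industclient", "institution", "investment"]


def list_jobs(job_root, module):
    """Return the sorted full paths of the .json files in one module folder."""
    folder = os.path.join(job_root, module)
    # Assumption: only .json files are DataX jobs; everything else is skipped.
    return [os.path.join(folder, name)
            for name in sorted(os.listdir(folder))
            if name.endswith(".json")]


def run_job(job_path):
    """Run a single DataX job and return its exit code."""
    return subprocess.run(["python", DATAX_PY, job_path]).returncode


def run_module(job_root, module, runner=run_job):
    """Run one module's jobs in order; return (job_path, exit_code) pairs."""
    return [(job, runner(job)) for job in list_jobs(job_root, module)]


def run_all(job_root, modules, runner=run_job, max_workers=8):
    """Run the modules in parallel and return the paths of the jobs that failed."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_module, job_root, m, runner) for m in modules]
        results = [pair for f in futures for pair in f.result()]
    return [job for job, code in results if code != 0]
```

Calling run_all(JOB_ROOT, MODULES) would then do roughly what start.py does, and the returned list shows which jobs exited with a non-zero code. The runner parameter is only there so the scheduling logic can be tried out without DataX installed.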
After that I wrote a bat file.
Its content: python start.py
2. Windows has a built-in Task Scheduler; use it to run the bat file on a schedule.
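I set the task up by hand in the Task Scheduler window, but it can also be registered from the command line with schtasks. A small sketch that only builds the command follows; the task name, bat path and start time in the usage note are placeholders, not my real settings, and build_schtasks_command is a made-up helper name.

```python
def build_schtasks_command(task_name, bat_path, start_time):
    """Build a schtasks argument list for a daily task; nothing is executed here."""
    return ["schtasks", "/Create",
            "/TN", task_name,    # task name shown in Task Scheduler
            "/TR", bat_path,     # program to run, here the bat file
            "/SC", "DAILY",      # run once a day
            "/ST", start_time,   # start time in 24-hour HH:mm format
            "/F"]                # overwrite an existing task with the same name
```

On Windows the resulting list can be passed to subprocess.run, for example with build_schtasks_command("datax_sync", r"E:\datax\start.bat", "02:00"). Creating the task may need an elevated prompt depending on the account.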
over