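Requirement:
The logs produced at work are large volumes of data in a fixed format. A few fixed lines need to be extracted from each log and collected into one summary for statistical analysis of the process parameters.
The part of the log that is needed is shown in the figure:
Solution:
The getline function in Python's built-in linecache module is simple and handy: it returns the content of a given line of a file directly.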
- #!/usr/bin/python
- # -*- coding: UTF-8 -*-
- # data_filter: collect lines 360-363 of every .txt log under the current directory
- import os, linecache
- info = os.getcwd()
- fout = open('data_filter.txt_', 'w')  # the trailing "_" keeps the output from matching *.txt
- def writeintofile(path):
-     need = ''
-     for lineno in range(360, 364):  # lines 360 through 363 of each log (the end of range is exclusive)
-         need += linecache.getline(path, lineno)  # fetch that line; '' if the file is shorter
-     fout.write(need + path + '\n')  # the extracted lines, followed by the path of the log
- for root, dirs, files in os.walk(info):
-     if len(dirs) == 0:  # only directories that have no subdirectories
-         for fl in files:
-             path = os.path.join(root, fl)
-             if path.endswith('.txt'):  # every .txt file is a log we need
-                 writeintofile(path)
- fout.close()
- input('Finished....Write BY Tom \nEnter Exit')
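To make the behaviour of getline concrete, here is a minimal sketch, assuming Python 3 and a small throwaway file in place of a real log (the five-line file and the variable names are made up for illustration). It shows that line numbers start at 1, that each returned line keeps its trailing newline, and that a line number past the end of the file gives an empty string rather than an error. Since the end of range is exclusive, reading lines 360 through 363 takes range(360, 364).

```python
import linecache
import os
import tempfile

# Hypothetical sample file with 5 numbered lines, standing in for a real log.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(''.join('line %d\n' % n for n in range(1, 6)))
    sample_path = tmp.name

first = linecache.getline(sample_path, 1)  # line numbers start at 1
middle = [linecache.getline(sample_path, n) for n in range(2, 5)]  # lines 2, 3 and 4
missing = linecache.getline(sample_path, 99)  # past the end of the file

linecache.clearcache()  # linecache keeps whole files in memory; release them
os.remove(sample_path)
```

When many large logs are processed in one run, calling linecache.clearcache() now and then is probably worthwhile, because the module caches every file it has read.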
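os.walk yields files in directory-listing order (in practice this was file-name order for these logs), but the content needs to be extracted in the order the logs were generated. So os.path.getmtime() is introduced: a dictionary is built with each file's modification time as the key and the file name as the value, the keys are sorted, and the values are output in that order. I made the following changes; I am not sure whether there is a better way: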
- #!/usr/bin/python
- # -*- coding: UTF-8 -*-
- # data_filter: same extraction, with the logs processed in order of modification time
- import os, linecache
- info = os.getcwd()
- fout = open('data_filter.txt_', 'w')
- d = {}  # modification time -> full path (files with identical times overwrite each other)
- for root, dirs, files in os.walk(info):
-     for fl in files:
-         path = os.path.join(root, fl)  # getmtime needs the full path, not the bare file name
-         d[os.path.getmtime(path)] = path
- def writeintofile(path):
-     need = ''
-     for lineno in range(360, 364):  # lines 360 through 363
-         need += linecache.getline(path, lineno)
-     fout.write(need + path + '\n')
- for file_time in sorted(d):  # sort by time, oldest first
-     # print(d[file_time])  # for test
-     if d[file_time].endswith('.txt'):
-         writeintofile(d[file_time])
- fout.close()
- input('Finished....Write BY Tom \nEnter Exit')
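On the question of a better way: one possible alternative, sketched below under the assumption of Python 3, is to skip the dictionary and sort the list of paths directly with os.path.getmtime as the sort key. A dictionary keyed by time keeps only one file when two logs share a modification time, while a sorted list keeps them all. The helper name logs_by_mtime, the demo file names and the timestamps are made up for illustration.

```python
import os
import shutil
import tempfile

def logs_by_mtime(top, suffix='.txt'):
    """Return paths of files under `top` ending in `suffix`, oldest modification first."""
    paths = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith(suffix):
                paths.append(os.path.join(root, name))
    # Sorting the paths themselves keeps files that share a modification time.
    return sorted(paths, key=os.path.getmtime)

# Usage with three hypothetical logs; a.txt and c.txt share the same timestamp.
demo_dir = tempfile.mkdtemp()
for name, mtime in [('b.txt', 1600000100), ('a.txt', 1600000000), ('c.txt', 1600000000)]:
    path = os.path.join(demo_dir, name)
    with open(path, 'w') as f:
        f.write('demo\n')
    os.utime(path, (mtime, mtime))  # set access and modification time explicitly

ordered = [os.path.basename(p) for p in logs_by_mtime(demo_dir)]
shutil.rmtree(demo_dir)  # remove the demo directory again
```

The relative order of files with equal timestamps is whatever order os.walk happened to return them in, so if that matters a tie-breaker such as the file name would need to be added to the sort key.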