Case 1:
Input data:
2013-03-27 15:30:23 [com.xxxxx.custinfo.sms.20130300000]-[INFO]
receive:13336789456,msg:您好,这是一个测试案例数据1,smsid:23587221
2013-03-27 15:30:29 [com.xxxxx.custinfo.sms.20130300001]-[INFO]
receive:13336789456,msg:您好,这是一个测试案例数据2,smsid:23587256
2013-03-27 15:31:23 [com.xxxxx.custinfo.sms.20130300001]-[INFO]
receive:18918089456,msg:您好,这是一个测试案例数据2,smsid:23587256
Expected output:
2013-03-27 15:30:23 ,13336789456,您好,这是一个测试案例数据1
2013-03-27 15:30:29 ,13336789456,您好,这是一个测试案例数据2
Solutions:
1. sed -n -re '/custinfo.sms/N;s/([^\[]*) \[.*receive:([0-9]*),msg:(.*),smsid:[0-9]+$/\1,\2,\3/p' file
2. sed 'N;s/\n/ /' file | awk -F '[ :|,]+' '{print $1,$2":"$3":"$4","$7","$9","$10}'
3. awk '{getline a;sub(/^[^0-9]+/,"",a);sub(/msg:/,"",a);sub(/,smsid.*$/,"",a);if(a~/13336789456/)print $1,$2","a}' file
4. sed 'N;s/\[[^:]*:/,/;s/msg://;s/,smsid.*//' file
5. awk -F'[:,]' '/receive/{print a[1]","$2","$4","$5}{split($0,a,"[")}' file
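The one-liners above all pair each header line with the receive line that follows it. A longer, commented awk sketch of the same idea is given here for readability; the function name sms_summary and its optional second argument (a receiver number to filter on) are assumptions made for illustration, not part of the original answers.

```shell
#!/bin/sh
# Hypothetical helper: sms_summary FILE [RECEIVER]
# Prints "timestamp,receiver,message" for each two-line log record.
# If RECEIVER is given, only records for that number are printed.
sms_summary() {
    awk -v want="${2:-}" '
        # Header line: starts with a date, so remember "date time".
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { ts = $1 " " $2; next }
        # Payload line: receive:<number>,msg:<text>,smsid:<id>
        /^receive:/ {
            line = $0
            sub(/^receive:/, "", line)        # drop the leading key
            sub(/,smsid:[0-9]+$/, "", line)   # drop the trailing id
            num = line
            sub(/,msg:.*/, "", num)           # keep only the number
            msg = line
            sub(/^[^,]*,msg:/, "", msg)       # keep only the message text
            if (want == "" || num == want)
                print ts "," num "," msg
        }
    ' "$1"
}

# Usage: sms_summary file 13336789456
```

Unlike solutions 1, 2, 4 and 5, which print every record, passing a receiver number restricts the output to that receiver, as solution 3 does with its hard-coded number.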
Case 2:
Input data:
{'items': [{'count': 182, 'rate': 0.00050000000000000001, 'abnormalCountOfRecord': '240', 'normalCountOfRecord': '358202', 'description_of_error_code': u'NO_AUDIO_DATA, \u6ca1\u6709\u68c0\u6d4b\u5230\u97f3\u9891\u6570\u636e\uff0c\u9ea6\u514b\u98ce\u88ab\u7981\u6b62', 'totalCountOfRecord': '358442', 'error_code': '40031', 'totalCountOfUser': '6631'}, {'count': 358202, 'rate': 0.×××9999999999997, 'abnormalCountOfRecord': '240', 'normalCountOfRecord': '358202', 'description_of_error_code': u'\u6210\u529f', 'totalCountOfRecord': '358442', 'error_code': 'suc', 'totalCountOfUser': '6631'}, {'count': 58, 'rate': 0.00020000000000000001, 'abnormalCountOfRecord': '240', 'normalCountOfRecord': '358202', 'description_of_error_code': u'\u5fc5\u9700\u7684\u5185\u6838\u53c2\u6570\u9519\u8bef', 'totalCountOfRecord': '358442', 'error_code': '40086', 'totalCountOfUser': '6631'}]}
From the text above, extract only the count and error_code fields, in the following format:
'count': 182, 'error_code': '40031'
'count': 358202, 'error_code': 'suc'
'count': 58, 'error_code': '40086'
Solutions:
1. grep -o ".count[^,]\+,\|'error_code[^,]\+" file|sed 'N;s/\n//'
2. awk -F '[{,]' 'BEGIN{RS="totalCountOfUser"}{for(i=1;i<=NF;i++)if($i~/count/)print $i","$(NF-1)}' file
3. grep -P -e '.count.{1,2}\W\d+|.error\S[a-z]{1,4}.{2}\W.\d{5}|.error\S[a-z]{1,4}.{2}\W.[a-z]{3}' -o file|sed 'N;s/\n/ /'
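A shorter variant is sketched below. It assumes that every item holds exactly one 'count' key followed later by one 'error_code' key, as in the sample; the function name count_and_code is hypothetical. It relies on grep -o, which GNU and BSD grep both provide.

```shell
#!/bin/sh
# Hypothetical helper: count_and_code FILE
# Prints one "'count': N, 'error_code': 'X'" line per item.
count_and_code() {
    # 1. Emit every 'count' and 'error_code' key/value pair on its own line.
    #    The leading quote keeps 'description_of_error_code' from matching.
    # 2. paste joins each pair of consecutive lines with a comma.
    # 3. sed adds the space after that comma.
    grep -oE "'(count|error_code)': ('[^']*'|[0-9]+)" "$1" |
        paste -d, - - |
        sed 's/,/, /'
}

# Usage: count_and_code file
```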
Case 3:
Task: extract the lines between <site> and </site>, namely
faongentgne
g
aqeotgentgn4e
Input data:
<site>
faongentgne
g
aqeotgentgn4e
</site>
gaegoengne
faeognentge
gaegneg
gandeogneg
gaoengentg4egengen
gaoengeng
Solutions:
1. sed '/<site>/,/<\/site>/!d;/<site>/d;/<\/site>/d' file
2. sed -n '/<site>/{n;h;:1;n;/<\/site>/!{H;b1};x;p}' file
3. awk '/<site>/{a=1;next};a==1&&$0!~/\/site/;/<\/site>/{a=0;next}' file
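The flag technique used in solution 3 can be generalised to any pair of marker lines. The sketch below is one way to write it, assuming each marker sits alone on its own line; the function name between_tags and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: between_tags FILE OPEN CLOSE
# Prints the lines strictly between each OPEN line and the next CLOSE line.
between_tags() {
    awk -v open_tag="$2" -v close_tag="$3" '
        $0 == close_tag { inside = 0 }   # leave the block before printing the closing marker
        inside                           # print every line while inside a block
        $0 == open_tag  { inside = 1 }   # enter the block after the opening marker
    ' "$1"
}

# Usage: between_tags file "<site>" "</site>"
```

The order of the three rules is what keeps both marker lines out of the output: the closing marker clears the flag before the print rule runs, and the opening marker sets it only afterwards.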
Case 4:
Input data:
#ifconfig |grep 'inet addr'|grep -v '127.0.0.1'|awk '{print $2}'|tr -d '[a-z,A-Z:]'
192.168.1.8
192.168.1.57
192.168.1.9
192.168.1.211
Print the extracted addresses on a single line, separated by commas:
192.168.1.8,192.168.1.57,192.168.1.9,192.168.1.211
Solutions:
1. ifconfig | grep -oP '(?<=addr:)(\d+\.)+\d+(?=\s+Bcast)' | paste -sd, -
2. ifconfig | sed -nr ':1 N;/\n$/!b1;/eth/{s/.*r:(\S+)\s+B.*/\1/;H};${x;s/\n//;s/\n/,/gp}'
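Once the addresses are available one per line, the joining step can be separated from the extraction step. For a single-character separator, paste -sd, - is enough; the awk sketch below also accepts a longer separator. The function name join_lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: join_lines [SEPARATOR]
# Reads lines on stdin and prints them as one line joined by SEPARATOR
# (a comma by default). Blank lines are skipped.
join_lines() {
    awk -v sep="${1:-,}" '
        NF  { out = (n++ ? out sep : "") $0 }   # prepend the separator from the second item on
        END { if (n) print out }
    '
}

# Usage:
#   ifconfig | grep 'inet addr' | grep -v '127.0.0.1' | awk '{print $2}' | tr -d '[a-z,A-Z:]' | join_lines
```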
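Case 5:
Save the lines that differ between files a and b to c.
Solution:
join -v2 a b >c
Note that join expects both files to be sorted on the join field (the first field by default), and -v2 prints only the lines of b that have no match in a.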
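If "different" is meant as whole lines that appear in only one of the two files, in either direction, comm is an alternative to join. The sketch below is written under that assumption; the function name diff_lines is hypothetical, and the output comes out in sorted order rather than in the original file order.

```shell
#!/bin/sh
# Hypothetical helper: diff_lines A B
# Prints the lines that appear in exactly one of the two files
# (whole-line comparison, duplicates collapsed).
diff_lines() {
    sa=$(mktemp) && sb=$(mktemp) || return 1
    sort -u "$1" > "$sa"    # comm requires sorted input
    sort -u "$2" > "$sb"
    # -3 suppresses the column of lines common to both files.
    # comm prefixes lines unique to B with a tab; awk strips that tab.
    comm -3 "$sa" "$sb" | awk '{ sub(/^\t/, ""); print }'
    rm -f "$sa" "$sb"
}

# Usage: diff_lines a b > c
```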