Case 1:
Input data:
2013-03-27 15:30:23 [com.xxxxx.custinfo.sms.20130300000]-[INFO]
      receive:13336789456,msg:您好,这是一个测试案例数据1,smsid:23587221
2013-03-27 15:30:29 [com.xxxxx.custinfo.sms.20130300001]-[INFO]
      receive:13336789456,msg:您好,这是一个测试案例数据2,smsid:23587256
2013-03-27 15:31:23 [com.xxxxx.custinfo.sms.20130300001]-[INFO]
      receive:18918089456,msg:您好,这是一个测试案例数据2,smsid:23587256


Expected output:
2013-03-27 15:30:23 ,13336789456,您好,这是一个测试案例数据1
2013-03-27 15:30:29 ,13336789456,您好,这是一个测试案例数据2

Solutions:
1. sed -n -re '/custinfo.sms/N;s/([^\[]*) \[.*receive:([0-9]*),msg:(.*),smsid:[0-9]+$/\1,\2,\3/p' file

2. sed -e 'N;s/\n//' file | awk -F '[ :|,]+' '{print $1,$2":"$3":"$4","$7","$9","$10}'

3. awk '{getline a;sub(/^[^0-9]+/,"",a);sub(/msg:/,"",a);sub(/,smsid.*$/,"",a);if(a~/13336789456/)print $1,$2","a}' file

4. sed 'N;s/\[[^:]*:/,/;s/msg://;s/,smsid.*//' file

5. awk -F'[:,]' '/receive/{print a[1]","$2","$4","$5}{split($0,a,"[")}' file
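
Most of the one-liners above print every record, while the expected output lists only receiver 13336789456 (solution 3 filters on it). A more readable sketch of the same idea follows. It assumes that each timestamp line is followed by one "receive:" line, and the variable name `want` is made up for this example.

```shell
# Sample log written to a temp file (same records as above).
log=$(mktemp)
cat > "$log" <<'EOF'
2013-03-27 15:30:23 [com.xxxxx.custinfo.sms.20130300000]-[INFO]
      receive:13336789456,msg:您好,这是一个测试案例数据1,smsid:23587221
2013-03-27 15:30:29 [com.xxxxx.custinfo.sms.20130300001]-[INFO]
      receive:13336789456,msg:您好,这是一个测试案例数据2,smsid:23587256
2013-03-27 15:31:23 [com.xxxxx.custinfo.sms.20130300001]-[INFO]
      receive:18918089456,msg:您好,这是一个测试案例数据2,smsid:23587256
EOF

result=$(awk -v want=13336789456 '
  # Header line: remember "date time" (the first two fields).
  /^[0-9][0-9][0-9][0-9]-/ { stamp = $1 " " $2; next }
  # Detail line: strip the labels, then print only the wanted receiver.
  /receive:/ {
    line = $0
    sub(/^ *receive:/, "", line)     # drop leading blanks and "receive:"
    sub(/,msg:/, ",", line)          # drop the "msg:" label
    sub(/,smsid:[0-9]+$/, "", line)  # drop the trailing smsid
    split(line, part, ",")
    if (part[1] == want) print stamp " ," line
  }
' "$log")
printf '%s\n' "$result"
rm -f "$log"
```

Keeping the timestamp in a variable instead of joining lines with N or getline means a stray extra line in the log does not shift every following pair.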



Case 2:

Input data:

{'items': [{'count': 182, 'rate': 0.00050000000000000001, 'abnormalCountOfRecord': '240', 'normalCountOfRecord': '358202', 'description_of_error_code': u'NO_AUDIO_DATA, \u6ca1\u6709\u68c0\u6d4b\u5230\u97f3\u9891\u6570\u636e\uff0c\u9ea6\u514b\u98ce\u88ab\u7981\u6b62', 'totalCountOfRecord': '358442', 'error_code': '40031', 'totalCountOfUser': '6631'}, {'count': 358202, 'rate': 0.×××9999999999997, 'abnormalCountOfRecord': '240', 'normalCountOfRecord': '358202', 'description_of_error_code': u'\u6210\u529f', 'totalCountOfRecord': '358442', 'error_code': 'suc', 'totalCountOfUser': '6631'}, {'count': 58, 'rate': 0.00020000000000000001, 'abnormalCountOfRecord': '240', 'normalCountOfRecord': '358202', 'description_of_error_code': u'\u5fc5\u9700\u7684\u5185\u6838\u53c2\u6570\u9519\u8bef', 'totalCountOfRecord': '358442', 'error_code': '40086', 'totalCountOfUser': '6631'}]}

From the text above, extract only the following fields, in this format:
'count': 182, 'error_code': '40031'
'count': 358202, 'error_code': 'suc'
'count': 58, 'error_code': '40086'


Solutions:

1. grep -o ".count[^,]\+,\|'error_code[^,]\+" file|sed 'N;s/\n//'


2. awk -F '[{,]' 'BEGIN{RS="totalCountOfUser"}{for(i=1;i<=NF;i++)if($i~/count/)print $i","$(NF-1)}' file


3. grep -P -e '.count.{1,2}\W\d+|.error\S[a-z]{1,4}.{2}\W.\d{5}|.error\S[a-z]{1,4}.{2}\W.[a-z]{3}' -o file|sed 'N;s/\n/ /'
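
The solutions above depend on GNU extensions (`grep -P`, a multi-character RS in awk) or on field positions. A sketch that needs only basic regular expressions is given below: it cuts the text into one {...} item per line, then keeps the two wanted keys. The sample is a trimmed-down stand-in for the data above, not the full record, and the approach assumes no "}" occurs inside an item.

```shell
# Trimmed-down sample with the same key layout as the input above.
data="{'items': [{'count': 182, 'abnormalCountOfRecord': '240', 'error_code': '40031', 'totalCountOfUser': '6631'}, {'count': 358202, 'abnormalCountOfRecord': '240', 'error_code': 'suc', 'totalCountOfUser': '6631'}, {'count': 58, 'abnormalCountOfRecord': '240', 'error_code': '40086', 'totalCountOfUser': '6631'}]}"

# Step 1: grep -o emits each {'count' ... } item on its own line.
# Step 2: sed captures the 'count' pair and the 'error_code' pair and
#         replaces the whole line with "pair1, pair2".
result=$(printf '%s\n' "$data" |
  grep -o "{'count'[^}]*}" |
  sed "s/^{\('count': [0-9]*\),.*\('error_code': '[^']*'\).*/\1, \2/")
printf '%s\n' "$result"
```

Because the pattern requires a quote directly before error_code, the key description_of_error_code in the full data would not be picked up by mistake.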



Case 3:

Task: extract the lines between <site> and </site>, that is:
faongentgne
g
aqeotgentgn4e

Input data:
<site>
faongentgne
g
aqeotgentgn4e
</site>
gaegoengne
faeognentge
gaegneg
gandeogneg
gaoengentg4egengen
gaoengeng


Solutions:

1. sed -n '/<site>/,/<\/site>/{/site>/!p;}' file


2. sed -n '/<site>/{n;h;:1;n;/<\/site>/!{H;b1};x;p}' file


3. awk '/<site>/{a=1;next};a==1&&$0!~/\/site/;/<\/site>/{a=0;next}' file
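
The awk approach boils down to a single on/off flag. A minimal sketch with the rules ordered so that neither tag line is printed is shown below; the flag name `inside` is made up for this example, and the input is a shortened copy of the data above.

```shell
input=$(mktemp)
cat > "$input" <<'EOF'
<site>
faongentgne
g
aqeotgentgn4e
</site>
gaegoengne
faeognentge
EOF

result=$(awk '
  /<\/site>/ { inside = 0 }  # closing tag: clear the flag before the print rule runs
  inside                     # print the line while the flag is set
  /<site>/   { inside = 1 }  # opening tag: set the flag after the print rule ran
' "$input")
printf '%s\n' "$result"
rm -f "$input"
```

Since the flag is reset at every closing tag, the same program also handles a file with several <site> blocks.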



Case 4:

Input data:

#ifconfig |grep 'inet addr'|grep -v '127.0.0.1'|awk '{print $2}'|tr -d '[a-z,A-Z:]'
192.168.1.8
192.168.1.57
192.168.1.9
192.168.1.211

Print the extracted addresses on a single line, separated by commas:
192.168.1.8,192.168.1.57,192.168.1.9,192.168.1.211


Solutions:

1. ifconfig | grep -oP '(?<=addr:)(\d+\.)+\d+(?=  B)' | paste -sd, -


2. ifconfig | sed -nr ':1 N;/\n$/!b1;/eth/{s/.*r:(\S+)\s+B.*/\1/;H};${x;s/\n//;s/\n/,/gp}'
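
The joining step can be tried on its own, without ifconfig. The sketch below feeds the four addresses from the listing above through two common ways of turning lines into one comma-separated line; the variable names are made up for this example.

```shell
ips='192.168.1.8
192.168.1.57
192.168.1.9
192.168.1.211'

# paste -s joins all input lines into one; -d, sets the delimiter.
result=$(printf '%s\n' "$ips" | paste -sd, -)

# awk alternative: print a separator before every line except the first.
result_awk=$(printf '%s\n' "$ips" |
  awk '{ printf "%s%s", sep, $0; sep = "," } END { print "" }')

printf '%s\n' "$result" "$result_awk"
```

Note that xargs would also put everything on one line, but separated by spaces rather than commas.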



Case 5:

Save the content that differs between files a and b to file c.

Solution:

join -v2 a b >c
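
Note that `join -v2 a b` prints only the lines of b whose join field (the first field by default) has no match in a, and it expects both files to be sorted on that field. If "differs" is meant as whole lines that occur in only one of the two files, a sketch with comm could look like the following; the file contents are made-up samples.

```shell
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/a"
printf '%s\n' banana cherry date  > "$dir/b"

# comm needs sorted input, so sort both files first.
sort "$dir/a" > "$dir/a.sorted"
sort "$dir/b" > "$dir/b.sorted"

# comm -3 hides lines common to both files; lines found only in b are
# indented with a tab, which tr removes (this would also strip tabs
# inside the lines themselves).
comm -3 "$dir/a.sorted" "$dir/b.sorted" | tr -d '\t' > "$dir/c"
result=$(cat "$dir/c")

# Lines found only in b, the whole-line counterpart of join -v2.
only_b=$(comm -13 "$dir/a.sorted" "$dir/b.sorted")

printf '%s\n' "$result"
rm -rf "$dir"
```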