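"Linux Shell Scripting Cookbook" reading notes, Chapter 4: Texting and Driving

stonebox 2013-12-18 12:48:59 Category: Linux

© Copyright belongs to the author: this is an original work by 51CTO blog author stonebox. Contact the author for permission before reposting.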

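1. Regular expressions

A regular expression is built from literal characters, wildcards, modifiers (quantifiers), and anchors.

POSIX character classes (used inside a bracket expression, for example [[:digit:]]):

Expression	Character class	ASCII equivalent
[:alnum:]	Alphanumeric characters	A-Za-z0-9
[:alpha:]	Alphabetic characters	A-Za-z
[:blank:]	Space or tab	space, \t
[:digit:]	Digits	0-9
[:lower:]	Lowercase letters	a-z
[:punct:]	Punctuation: printable characters other than space and alphanumerics	
[:space:]	Whitespace characters	space, \t, \n, \v, \f, \r
[:upper:]	Uppercase letters	A-Z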

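Regular expression syntax summary:

Character	Function	Syntax	Explanation
.	Wildcard	Basic	Matches any single character
[abc],[a-z]	Inclusion set	Basic	Matches any one character in the set
[^abc],[^a-z]	Exclusion set	Basic	Matches any one character not in the set
?	Modifier	Extended	Zero or one of the preceding item
*	Modifier	Basic	Zero or more of the preceding item
+	Modifier	Extended	One or more of the preceding item
{m,n}	Modifier	Extended	Between m and n occurrences of the preceding item
{n}	Modifier	Extended	Exactly n occurrences of the preceding item
^	Anchor	Basic	Start of a line
$	Anchor	Basic	End of a line
\<	Anchor	Basic (GNU extension)	Start of a word
\>	Anchor	Basic (GNU extension)	End of a word
(...)	Grouping	Extended	Lets a modifier apply to a group of characters; written \(...\) in basic syntax
(...|...)	Grouping	Extended	Alternation between patterns
\	Escape	Extended (basic)	Removes (or enables) the special meaning of the following character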

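Metacharacters (Perl-style shorthands; GNU grep and sed accept \b, \B, \w, \W, \s and \S, while \d and \D need a Perl-compatible engine such as grep -P)

Regular expression	Description	Example
\b	Word boundary	\bcool\b matches cool, not cooler
\B	Non-word boundary	cool\B matches cooler, not cool
\d	A single digit	b\db matches b2b, not bcb
\D	A single non-digit	b\Db matches bcb, not b2b
\w	A single word character (letter, digit, or _)	\w matches 1 or a, not &
\W	A single non-word character	\W matches &, not 1 or a
\s	A single whitespace character	x\sx matches x x, not xx
\S	A single non-whitespace character	x\Sx matches xkx, not x x
\n	Newline	\n matches a line break
\r	Carriage return	\r matches a carriage return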

[root@stone ~]# egrep "( ?[a-zA-Z]+ ?)" num2

banana

cherry

grape

orange

[root@stone ~]# ifconfig | egrep "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"

inet addr:172.16.3.54 Bcast:172.16.3.255 Mask:255.255.255.0

inet addr:192.168.0.100 Bcast:192.168.0.255 Mask:255.255.255.0

inet addr:127.0.0.1 Mask:255.0.0.0

inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0

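2. Searching text: grep and egrep

By default grep accepts only basic regular expressions; egrep accepts extended regular expressions.

grep -E is equivalent to egrep (recent GNU grep releases mark egrep as obsolescent in favor of grep -E).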
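
The difference between the two syntaxes is easiest to see on a quantifier. The following is a minimal sketch; the sample string and the variable names are made up for illustration and are not from the original notes. In basic syntax an unescaped + is a literal character, while grep -E treats it as "one or more".

```shell
# Hypothetical sample input, chosen so both interpretations find a match.
sample='aaa a+b'

# Basic syntax: "a+" means a literal "a" followed by a literal "+".
bre_match=$(printf '%s\n' "$sample" | grep -o 'a+')

# Extended syntax: "a+" means one or more "a"; keep only the first match.
ere_match=$(printf '%s\n' "$sample" | grep -E -o 'a+' | head -n 1)

echo "BRE: $bre_match"
echo "ERE: $ere_match"
```

The same pattern text therefore selects different substrings depending on whether -E is given, which is worth checking whenever a pattern is moved between grep and egrep.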

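Common grep command-line options

Option	Effect
-c	Print only the number of lines that match the pattern
-h	Suppress the file name prefix
-e PATTERN	Use PATTERN as the search pattern (useful for giving several patterns)
-i	Ignore case when matching
-l	Print only the names of files that contain a match
-L	Print only the names of files that contain no match
-n	Prefix each matching line with its line number
-q	"Quiet": write nothing to standard output; the exit status is 0 as soon as a match is found
-r	Search all files under a directory recursively
-w	Match only whole words
-A N	Also print the N lines after each matching line
-B N	Also print the N lines before each matching line
-C N	Also print the N lines before and after each matching line
-o	Print only the matched text
-f FILE	Read the patterns from FILE
--exclude GLOB	Skip files whose names match GLOB (wildcards allowed); --exclude-dir does the same for directories
--exclude-from FILE	Skip files whose names match the globs listed in FILE
-Z	Print a NUL byte after each file name; usually combined with -l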
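
Two of these options are mostly useful inside scripts rather than at the prompt. The sketch below, which uses a temporary file and variable names invented for the example, shows -q driving an if test through the exit status, and -c counting matching lines rather than individual matches.

```shell
# Work on a throwaway file so the example is self-contained.
tmp=$(mktemp)
printf 'apple\nbanana\ncherry\n' > "$tmp"

# -q prints nothing; only the exit status says whether a match was found.
if grep -q 'banana' "$tmp"; then
    found=yes
else
    found=no
fi

# -c counts matching lines: "banana" contains three "a" but counts once.
count=$(grep -c 'a' "$tmp")

echo "found=$found count=$count"
rm -f "$tmp"
```

To count individual matches instead of lines, the notes further down combine -o with wc -l.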

[root@stone ~]# ifconfig | egrep -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"

172.16.3.54

172.16.3.255

255.255.255.0

192.168.0.100

192.168.0.255

255.255.255.0

127.0.0.1

255.0.0.0

192.168.122.1

192.168.122.255

255.255.255.0

#print only the matched text

[root@stone ~]# ifconfig | egrep -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | wc -l

#count the number of matches

[root@stone ~]# ifconfig | egrep "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" -c

#count the number of matching lines

[root@stone ~]# ifconfig | egrep "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" -n

2: inet addr:172.16.3.54 Bcast:172.16.3.255 Mask:255.255.255.0

11: inet addr:192.168.0.100 Bcast:192.168.0.255 Mask:255.255.255.0

15: inet addr:127.0.0.1 Mask:255.0.0.0

41: inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0

#show the line number of each matching line

[root@stone ~]# grep "grape" . -r -n

./num.patch:8:+grape

./num1.orig:3:grape

./num2:3:grape

#recursively search the current directory for files containing "grape" and print the line numbers

[root@stone ~]# echo hello world | grep -i "HELLO"

hello world

#-i ignores case

[root@stone ~]# echo -e "apple\norange" | grep -e "apple" -e "orange"

apple

orange

#-e can be repeated to match several patterns

[root@stone ~]# echo -e "apple\norange" > pattenfiel

[root@stone ~]# cat num2

banana

cherry

grape

orange

[root@stone ~]# grep -f num2 pattenfiel

orange

#-f names a file that holds the patterns (here num2 is the pattern file and pattenfiel is searched)

[root@stone ~]# grep "orange" num* -lZ

num1num1.orignum2num.patch

[root@stone ~]# grep "orange" num* -lZ | od -a

0000000 n u m 1 nul n u m 1 . o r i g nul n

0000020 u m 2 nul n u m . p a t c h nul

0000036

[root@stone ~]# grep "orange" num* -lZ | xargs -0 ls

num1 num1.orig num2 num.patch

#-lZ prints a NUL byte after each file name; combined with xargs -0 this handles file names that contain spaces

[root@stone ~]# seq 5 | grep 3 -A 2

#-A prints the matching line and the two lines after it

[root@stone ~]# seq 5 | grep 3 -B 2

#-B prints the matching line and the two lines before it

[root@stone ~]# seq 5 | grep 3 -C 2

#-C prints the matching line and the two lines before and after it
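
The effect of the three context options can be checked side by side. This is a small sketch with a context of one line instead of two; the variable names are illustrative, and paste -s is used only to join the output onto one line for display.

```shell
# One line of context after, before, and around the line matching "3".
after=$(seq 5 | grep -A 1 3 | paste -s -d ' ' -)
before=$(seq 5 | grep -B 1 3 | paste -s -d ' ' -)
around=$(seq 5 | grep -C 1 3 | paste -s -d ' ' -)

echo "after:  $after"
echo "before: $before"
echo "around: $around"
```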

3. Cutting a file by column: cut

[root@stone ~]# cat student_data.txt

No Name Mark Percent

1 sarath 45 90

2 alex 49 98

3 anu 45 90

[root@stone ~]# cut -f1 student_data.txt

#extract the first column

[root@stone ~]# cut -f2,4 student_data.txt

Name Percent

sarath 90

alex 98

anu 90

#extract columns 2 and 4

[root@stone ~]# cut -f3 --complement student_data.txt

No Name Percent

1 sarath 90

2 alex 98

3 anu 90

#extract every column except the third

[root@stone ~]# cat student_data.txt

student mark

No Name Mark Percent

1 sarath 45 90

2 alex 49 98

3 anu 45 90

end

[root@stone ~]# cut -s -f1 student_data.txt

#-s skips lines that do not contain the delimiter (the default delimiter is tab)

[root@stone ~]# cat student_data.txt

student mark

No,Name,Mark,Percent

1,sarath,45,90

2,alex,49,98

3,anu,45,90

end

[root@stone ~]# cut -s -f2 -d "," student_data.txt

Name

sarath

alex

anu

#-d sets the delimiter

[root@stone ~]# echo 123456789 > rang_fields.txt

[root@stone ~]# cut -c1-5 rang_fields.txt

12345

[root@stone ~]# cut -c-5 rang_fields.txt

12345

[root@stone ~]# cut -c2-5 rang_fields.txt

2345

[root@stone ~]# cut -c5- rang_fields.txt

56789

#-c selects ranges of characters

[root@stone ~]# cut -c1-3,7-9 --output-delimiter="," rang_fields.txt

123789

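#--output-delimiter sets the output delimiter; it had no visible effect in this run, although recent GNU cut versions are expected to print 123,789 between the two ranges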
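
The option is easier to observe with fields than with characters. A minimal sketch, assuming GNU cut (the --output-delimiter option is a GNU extension) and a record invented for the example:

```shell
# Hypothetical comma-separated record used only for illustration.
record='1,sarath,45,90'

# Select fields 2 and 4 and join them with ":" instead of the input delimiter.
fields=$(printf '%s\n' "$record" | cut -d ',' -f 2,4 --output-delimiter=':')

echo "$fields"
```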

4. Counting word frequency

[root@stone ~]# egrep -o "\b[[:alpha:]]+\b" student_data.txt

student

mark

No

Name

Mark

Percent

sarath

alex

anu

end

[root@stone ~]# egrep -o "\b[[:alpha:]]+\b" student_data.txt | sort | uniq -c | sort -rn

1 student

1 sarath

1 Percent

1 No

1 Name

1 Mark

1 mark

1 end

1 anu

1 alex
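
In the output above Mark and mark are counted separately. If case should not matter, the text can be lowercased before counting. The following sketch uses a made-up input string and is only one possible variation of the pipeline:

```shell
# Hypothetical input with mixed-case duplicates.
text='Mark mark Name mark'

# Lowercase everything, split into words, count, and keep the most frequent.
top=$(printf '%s\n' "$text" \
    | tr '[:upper:]' '[:lower:]' \
    | grep -E -o '[[:alpha:]]+' \
    | sort | uniq -c | sort -rn \
    | head -n 1 \
    | awk '{print $1, $2}')

echo "$top"
```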

5. Getting started with sed

[root@stone ~]# cat student_data.txt

student mark

No,Name,Mark,Percent

1,sarath,45,90

2,alex,49,98

3,anu,45,90

end

[root@stone ~]# sed 's/,/\t/' student_data.txt

student mark

No Name,Mark,Percent

1 sarath,45,90

2 alex,49,98

3 anu,45,90

end

#"s///" replaces only the first match on each line

[root@stone ~]# sed 's/,/\t/g' student_data.txt

student mark

No Name Mark Percent

1 sarath 45 90

2 alex 49 98

3 anu 45 90

end

#"s///g" replaces every match

[root@stone ~]# sed 's/,/\t/3g' student_data.txt

student mark

No,Name,Mark Percent

1,sarath,45 90

2,alex,49 98

3,anu,45 90

end

#"s///Ng" replaces every match from the Nth occurrence onward (GNU sed)

[root@stone ~]# sed -i 's/,/\t/g' student_data.txt

[root@stone ~]# cat student_data.txt

student mark

No Name Mark Percent

1 sarath 45 90

2 alex 49 98

3 anu 45 90

end

#-i applies the substitution to the original file in place

[root@stone ~]# cat student_data.txt

student mark

No Name Mark Percent

1 sarath 45 90

2 alex 49 98

3 anu 45 90

end

[root@stone ~]# sed '/^$/d' student_data.txt

student mark

No Name Mark Percent

1 sarath 45 90

2 alex 49 98

3 anu 45 90

end

#'/^$/d' deletes empty lines

[root@stone ~]# sed 's/\w\+/[&]/g' student_data.txt

[student] [mark]

[No] [Name] [Mark] [Percent]

[1] [sarath] [45] [90]

[2] [alex] [49] [98]

[3] [anu] [45] [90]

[end]

#& stands for the matched text, similar to {} in find and xargs; \w\+ matches each word

[root@stone ~]# echo this is digit 7 in a number | sed 's/digit \([0-9]\)/\1/'

this is 7 in a number

[root@stone ~]# echo first FIRST | sed 's/\([a-z]\+\) \([A-Z]\+\)/\2 \1/'

FIRST first

[root@stone ~]# echo first FIRST | sed 's/\(\w\+\) \(\w\+\)/\2 \1/'

FIRST first

#\(pattern\) captures a substring; the first captured group is referenced as \1, the second as \2
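
The same swap can be written with extended syntax, where the grouping parentheses and + need no backslashes. This is a sketch assuming a sed that supports -E (GNU and BSD sed both do); the variable name is illustrative.

```shell
# With -E, ( ) group and + quantifies without backslashes;
# \1 and \2 still refer to the first and second captured group.
swapped=$(echo "first FIRST" | sed -E 's/([a-z]+) ([A-Z]+)/\2 \1/')

echo "$swapped"
```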

[root@stone ~]# sed 's/\(\w\+\)/[\1]/g' student_data.txt

[student] [mark]

[No] [Name] [Mark] [Percent]

[1] [sarath] [45] [90]

[2] [alex] [49] [98]

[3] [anu] [45] [90]

[end]

[root@stone ~]# text=hello

[root@stone ~]# echo hello world | sed "s/$text/HELLO/"

HELLO world

#when the expression contains a shell variable, use double quotes so that the shell expands it
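
When the variable holds a path, the slashes in its value collide with the / delimiter of the s command. One common way around this is to pick another delimiter. A minimal sketch, with directory names invented for the example:

```shell
# Hypothetical values; any strings containing "/" show the same issue.
old_dir=/usr/local
new_dir=/opt

# Double quotes let the shell expand the variables; using "|" as the
# delimiter avoids having to escape the slashes inside the paths.
result=$(echo "prefix=/usr/local/bin" | sed "s|$old_dir|$new_dir|")

echo "$result"
```

The delimiter still must not occur in the variable values, so this is a convenience rather than a general quoting solution.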

6. Getting started with awk

How it works

[root@stone ~]# echo -e "line1\nline2" | awk 'BEGIN{print "start"}{print}END{print "End"}'

start

line1

line2

End

#print with no arguments prints the current line

[root@stone ~]# echo | awk '{var1="v1";var2="v2";var3="v3";print var1,var2,var3}'

v1 v2 v3

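#arguments to print separated by commas are printed with a space between them

#string values assigned to awk variables must be enclosed in double quotes

#awk variables are referenced without a leading $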

[root@stone ~]# echo | awk '{var1="v1";var2="v2";var3="v3";print var1"-"var2"-"var3}'

v1-v2-v3

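#in a print statement adjacent values are concatenated; a literal separator such as "-" must be written as a double-quoted string

[root@stone ~]# echo | awk '{var1="v1";var2="v2";var3="v3";print var1-var2-var3}'

0

#without the quotes - is subtraction; the non-numeric strings evaluate to 0, so the result is 0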

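Special variables

NR: number of records; during execution it equals the current line number

NF: number of fields; during execution it equals the number of fields in the current line

$0: the text of the current line

$1: the text of the first field

$2: the text of the second field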

[root@stone ~]# cat rang_fields.txt

line1 f1 f2

line2 f3 f4

line3 f5 f6

[root@stone ~]# cat rang_fields.txt | awk '{print "line_no:"NR,"field_no:"NF,"$0="$0,"$1="$1,"$2="$2,"$3="$3}'

line_no:1 field_no:3 $0=line1 f1 f2 $1=line1 $2=f1 $3=f2

line_no:2 field_no:3 $0=line2 f3 f4 $1=line2 $2=f3 $3=f4

line_no:3 field_no:3 $0=line3 f5 f6 $1=line3 $2=f5 $3=f6

[root@stone ~]# cat rang_fields.txt | awk '{print $NF,$(NF-1)}'

f2 f1

f4 f3

f6 f5

#print $NF prints the last field of a line

#print $(NF-1) prints the second-to-last field of a line

[root@stone ~]# awk '{print $3,$2}' rang_fields.txt

f2 f1

f4 f3

f6 f5

[root@stone ~]# awk 'END{print NR}' rang_fields.txt

#counts the lines in the file; this prints the line number of the last line

[root@stone ~]# awk '{print NR}' rang_fields.txt

[root@stone ~]# seq 5 | awk 'BEGIN{sum=0;print "summation:"}{print $1"+";sum+=$1}END{print "==";print sum}'

summation:
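
The accumulate-then-report pattern used here (update a variable in the main block, print it in END) extends naturally to other statistics. A small sketch, with an average added purely for illustration:

```shell
# sum is updated once per input line; END runs after the last line,
# when NR holds the total number of lines read.
stats=$(seq 5 | awk '{ sum += $1 } END { print "sum=" sum, "avg=" sum / NR }')

echo "$stats"
```

Uninitialized awk variables start out as 0 (or the empty string), so the explicit sum=0 in BEGIN is optional.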

Passing external variables

[root@stone ~]# var=10000

[root@stone ~]# echo | awk -v var1=$var '{print var1}'

10000

#-v passes an external value into awk

[root@stone ~]# var1=10;var2=20

[root@stone ~]# echo | awk '{print v1,v2}' v1=$var1 v2=$var2

10 20

[root@stone ~]# awk '{print v1,v2}' v1=$var1 v2=$var2 rang_fields.txt

10 20

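#variables are given as space-separated name=value pairs after the program text (the BEGIN, {} and END blocks); they are not yet set when BEGIN runs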
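
The two ways of passing values differ in when the assignment happens. The sketch below, using a variable x invented for the example, prints the value from a BEGIN block with each method.

```shell
# -v assigns before the program starts, so BEGIN already sees the value.
with_v=$(awk -v x=10 'BEGIN { print "x=" x }')

# A trailing name=value operand is processed only when awk reaches it
# while walking its input arguments, which is after BEGIN has run.
trailing=$(echo | awk 'BEGIN { print "x=" x }' x=10)

echo "with -v:  $with_v"
echo "trailing: $trailing"
```

For values that BEGIN needs, -v is therefore the safer choice.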

Reading a line with getline

[root@stone ~]# seq 5 | awk 'BEGIN{getline;print "aaa",$0}{print $0}'

aaa 1

[root@stone ~]# seq 5 | awk 'BEGIN{print "aaa",$0}{print $0}'

aaa

Filtering lines

[root@stone ~]# seq 5 | awk 'NR<=3{print $0}'

#lines whose line number is less than or equal to 3

[root@stone ~]# seq 5 | awk 'NR==3{print $0}'

#the line whose line number is 3

[root@stone ~]# seq 5 | awk '/3/{print $0}'

#lines that contain 3

[root@stone ~]# seq 5 | awk '!/3/{print $0}'

#lines that do not contain 3

Setting the field delimiter

[root@stone ~]# head -5 /etc/passwd | awk -F ":" '{print $NF}'

/bin/bash

/sbin/nologin

[root@stone ~]# head -5 /etc/passwd | awk 'BEGIN{FS=":"}{print $NF}'

/bin/bash

/sbin/nologin

Reading command output

[root@stone ~]# echo | awk '{"grep root /etc/passwd"|getline cmdout;print cmdout}'

root:x:0:0:root:/root:/bin/bash

#syntax for reading a command's output into the variable output: "command" | getline output

Using loops in awk

[root@stone ~]# cat rang_fields.txt

line1 f1 f2

line2 f3 f4

line3 f5 f6

[root@stone ~]# awk '{for(i=1;i<=NF;i++) {print $i}}' rang_fields.txt

line1

f1

f2

line2

f3

f4

line3

f5

f6

awk built-in string functions
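
The notes do not list the functions themselves. As a hedged sketch, the following exercises a few that POSIX awk provides (length, index, substr, toupper, gsub) on an input string chosen for the example:

```shell
out=$(echo "hello world" | awk '{
    n = length($0)              # number of characters in the record
    pos = index($0, "world")    # 1-based position of the substring
    part = substr($0, 1, 5)     # 5 characters starting at position 1
    up = toupper(part)          # uppercase copy
    copy = $0
    gsub(/o/, "0", copy)        # replace every "o" in copy
    print n, pos, part, up, copy
}')

echo "$out"
```

Positions in awk strings start at 1, unlike the zero-based offsets of bash substring expansion.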

7. Iterating over lines, words, and characters

Iterating over each line

Using redirection:

[root@stone ~]# while read line; do echo $line; done < rang_fields.txt

line1 f1 f2

line2 f3 f4

line3 f5 f6

Using a subshell:

[root@stone ~]# cat rang_fields.txt | ( while read line;do echo $line;done )

line1 f1 f2

line2 f3 f4

line3 f5 f6
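
Both loops above use a plain read, which trims leading and trailing whitespace and interprets backslashes. When lines must be preserved exactly, a common idiom is IFS= read -r. The sketch below compares the two on a temporary file whose content is invented for the example.

```shell
# A line with leading spaces and a backslash, written to a throwaway file.
tmp=$(mktemp)
printf '  indented\\line\n' > "$tmp"

# Plain read: leading whitespace is stripped and the backslash is consumed.
while read line; do plain=$line; done < "$tmp"

# IFS= keeps the surrounding whitespace, -r keeps backslashes literal.
while IFS= read -r line; do exact=$line; done < "$tmp"

printf 'plain=[%s]\nexact=[%s]\n' "$plain" "$exact"
rm -f "$tmp"
```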

Iterating over each word

[root@stone ~]# while read line;

> do

> for word in $line;

> do

> echo $word;

> done

> done < rang_fields.txt

line1

f1

f2

line2

f3

f4

line3

f5

f6

Iterating over each character

[root@stone ~]# word=abcdefg

[root@stone ~]# for((i=0;i<${#word};i++)); do echo ${word:i:1}; done

8. Merging files by column: paste

[root@stone ~]# cat num1

apple

banana

mango

orange

peach

pear

strawberry

watermelon

[root@stone ~]# cat num2

banana

cherry

grape

orange

[root@stone ~]# paste num1 num2

apple banana

banana cherry

mango grape

orange orange

peach

pear

strawberry

watermelon

[root@stone ~]# paste num1 num2 -d ":"

apple:banana

banana:cherry

mango:grape

orange:orange

peach:

pear:

strawberry:

watermelon:

#-d sets the delimiter; the default is tab
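
paste can also merge the lines of a single input into one row with -s, which is a quick way to build a delimited list. A minimal sketch with sample data made up for the example:

```shell
# -s joins all lines of one input serially; -d picks the joining character.
# The final "-" tells paste to read from standard input.
joined=$(printf 'apple\nbanana\nmango\n' | paste -s -d ',' -)

echo "$joined"
```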

9. Printing the nth column of a file

[root@stone ~]# cat rang_fields.txt

line1 f1 f2

line2 f3 f4

line3 f5 f6

[root@stone ~]# awk '{print $3}' rang_fields.txt

[root@stone ~]# cut -d " " -f 3 rang_fields.txt

10. Printing selected lines of a file

[root@stone ~]# cat num1

apple

banana

mango

orange

peach

pear

strawberry

watermelon

[root@stone ~]# awk 'NR==3,NR==5' num1

mango

orange

peach

[root@stone ~]# seq 9 | awk 'NR==4,NR==6'

[root@stone ~]# awk '/ba.*a/,/^p/' num1

banana

mango

orange

peach

[root@stone ~]# awk '/ba.*a/,/h$/' num1

banana

mango

orange

peach

11. Checking for palindromes

[root@stone ~]# echo -e "aa\nbc"|sed -n '/\(.\)\1/p'

[root@stone ~]# cat bin/match_palindrome.sh

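#!/bin/bash

#print the lines of a file that are palindromes of the given length

if [ $# -ne 2 ];then
    echo "usage: $0 filename string_length"
    exit 1
fi

filename=$1

#capture the first character
basepattern='/^\(.\)'

#half of the character count
count=$(($2/2))

#add one capture group for each remaining character of the first half
for((i=1;i<$count;i++))
do
    basepattern=$basepattern'\(.\)';
done

#odd length: the middle character can be anything
if [ $(($2%2)) -ne 0 ];then
    basepattern=$basepattern'.';
fi

#append the back-references in reverse order
for((count;count>0;count--))
do
    basepattern=$basepattern'\'"$count";
done

#anchor at the end of the line so that only strings of exactly this length match
basepattern=$basepattern'$/p'

sed -n "$basepattern" "$filename"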

[root@stone ~]# match_palindrome.sh num2 5

stats

napan

[root@stone ~]# cat num2

banana

cherry

grape

orange

peep

noon

stats

napan

oppo

[root@stone ~]# match_palindrome.sh num2 4

peep

noon

oppo

[root@stone ~]# match_palindrome.sh num2 5

stats

napan

The rev command

[root@stone ~]# string="malayalam"

[root@stone ~]# if [ "$string" == "$(echo $string | rev )" ];then echo "palindrome" ;else echo "not palindrome" ;fi

palindrome

[root@stone ~]# sentence='i love my country'

[root@stone ~]# echo $sentence | rev | tr ' ' '\n' | tac | tr '\n' ' ' | rev

country my love i

12. Printing in reverse order

[root@stone ~]# seq 5 | tac

[root@stone ~]# seq 5 | awk '{lifo[NR]=$0;lno=NR}END{for(;lno>0;lno--){print lifo[lno];}}'

13. Parsing email addresses and URLs

[root@stone ~]# man echo | egrep -o '[A-Za-z0-9.]+@[A-Za-z0-9.]+\.[a-zA-Z]{2,4}'

coreutils@gnu.org

[root@stone ~]# man echo | egrep -o 'http://[A-Za-z0-9.]+\.[a-zA-Z]{2,4}'

http://www.gnu.org

14. Removing sentences that contain a given word

[root@stone ~]# cat regular_express.txt

"Open Source" is a good mechanism to develop programs.

apple is my favorite food.

# I am VBird

[root@stone ~]# cat regular_express.txt | sed 's/[^.]*food[^.]*\.//g'

"Open Source" is a good mechanism to develop programs.

# I am VBird

15. Replacing variable contents and extracting substrings

[root@stone ~]# var="this is a line of text"

[root@stone ~]# echo ${var/line/word}

this is a word of text

#replace line with word

[root@stone ~]# seq=1234567890

[root@stone ~]# echo ${seq:4}

567890

#everything from offset 4 (offsets are zero-based), that is, from the 5th character onward

[root@stone ~]# echo ${seq:4:2}

#2 characters starting at offset 4

[root@stone ~]# echo ${seq:(-1)}

#the last character

[root@stone ~]# echo ${seq:(-3):2}

#2 characters starting at the third character from the end
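
The expansions in this section can be collected into one script to see each result. This sketch requires bash (substring expansion and ${var/old/new} are not part of POSIX sh); the variable names are chosen for the example.

```shell
var="this is a line of text"
digits=1234567890

replaced=${var/line/word}    # first occurrence of "line" becomes "word"
tail=${digits:4}             # from zero-based offset 4 to the end
mid=${digits:4:2}            # 2 characters starting at offset 4
last=${digits:(-1)}          # the last character
near_end=${digits:(-3):2}    # 2 characters starting 3 from the end

echo "$replaced"
echo "$tail $mid $last $near_end"
```

The parentheses around the negative offset matter: without them (or a space), ${digits:-1} would be parsed as the "use a default value" expansion.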
