Shell Scripting: Regular Expressions (Part 1)

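I. Introduction

This article covers the main concepts of regular expressions in the shell and the grep command, one of the three classic text-processing tools (grep, sed, and awk).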

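II. What Is a Regular Expression

A regular expression (regex) is a single string that describes and matches a set of strings conforming to some syntactic rule. With a handful of special symbols it lets you quickly find, delete, or replace specific strings. It is a text pattern made up of ordinary characters and metacharacters.

Ordinary characters: upper- and lowercase letters, digits, punctuation, and other symbols without special meaning.

Metacharacters: characters with special meaning, which can specify how the character preceding the metacharacter is allowed to appear in the target text.

These definitions are abstract and rather dry; the examples that follow should make them easier to grasp.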

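Regular expressions come in two flavors:

  • Basic regular expressions (BRE): supported by grep and sed
  • Extended regular expressions (ERE): supported by egrep (grep -E) and awk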
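
To make the difference between the two flavors concrete, here is a minimal sketch that matches "one or more digits" both ways. The sample string is made up for illustration, and the -o option (print only the matched text) is assumed to be available, as it is in GNU and BSD grep.

```shell
#!/bin/sh
# Hypothetical sample input, not taken from the article's test file.
sample='order 66 shipped'

# BRE: portable syntax has no "+", so "one or more digits" is spelled [0-9][0-9]*
bre_match=$(printf '%s\n' "$sample" | grep -o '[0-9][0-9]*')

# ERE (grep -E, the modern spelling of egrep): "+" means one or more
ere_match=$(printf '%s\n' "$sample" | grep -E -o '[0-9]+')

echo "BRE: $bre_match"
echo "ERE: $ere_match"
```

Both commands print the same digits; only the spelling of the pattern differs.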

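III. Why Regular Expressions Matter

Regular expressions are an essential, everyday skill for system administrators: they help locate important information quickly and resolve the problems it points to.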

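IV. Regular Expression Examples

The examples below walk through how grep is used with regular expressions and what its command format looks like, and in doing so introduce the meaning of the metacharacters in basic regular expressions.

grep command options

  1. -n: show line numbers
  2. -i: ignore case
  3. -v: invert the match (print lines that do not match)
[root@lokott opt]# cat test.txt              # contents of the test file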
he was short and fat.
He was wearing a blue polo shirt with black pants. 
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
 google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words


#woood #
#woooooood # 
AxyzxyzxyzxyzC
I bet this place is really spooky late at night! 
Misfortunes never come alone/single.
I shouldn't have lett so tast.
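
To follow along, the sample file can be recreated with a short script. This is a sketch that assumes the current directory is writable; it writes each line with printf so that the leading space on line 5 and the trailing spaces on lines 2, 13, and 15 stay visible inside the quotes.

```shell
#!/bin/sh
# Recreate the sample file in the current directory (assumed writable).
# Each argument to printf becomes one line of test.txt.
printf '%s\n' \
  'he was short and fat.' \
  'He was wearing a blue polo shirt with black pants. ' \
  'The home of Football on BBC Sport online.' \
  'the tongue is boneless but it breaks bones.12!' \
  ' google is the best tools for search keyword.' \
  'The year ahead will test our political establishment to the limit.' \
  'PI=3.141592653589793238462643383249901429' \
  'a wood cross!' \
  'Actions speak louder than words' \
  '' \
  '' \
  '#woood #' \
  '#woooooood # ' \
  'AxyzxyzxyzxyzC' \
  'I bet this place is really spooky late at night! ' \
  'Misfortunes never come alone/single.' \
  "I shouldn't have lett so tast." \
  > test.txt

# Sanity check: the numbered output in the article implies 17 lines.
line_count=$(wc -l < test.txt | tr -d ' ')
echo "lines: $line_count"
```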

1) Searching for specific characters

[root@lokott opt]# grep -n 'the' test.txt      # show line numbers; lines containing "the"
4:the tongue is boneless but it breaks bones.12!
5: google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.

 
[root@lokott opt]# grep -ni 'the' test.txt              # show line numbers; case-insensitive search for "the"
3:The home of Football on BBC Sport online.
4:the tongue is boneless but it breaks bones.12!
5: google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.

[root@lokott opt]# grep -nv 'the' test.txt     # show line numbers; lines that do NOT contain "the"
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants. 
3:The home of Football on BBC Sport online.
7:PI=3.141592653589793238462643383249901429
8:a wood cross!
9:Actions speak louder than words
10:
11:
12:#woood #
13:#woooooood # 
14:AxyzxyzxyzxyzC
15:I bet this place is really spooky late at night! 
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.

2) Matching a set of characters with brackets [ ]

[root@lokott opt]# grep -n 'sh[io]rt' test.txt       # lines containing "shirt" or "short"
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants. 
[root@lokott opt]# grep -n 'oo' test.txt              # lines containing two consecutive o's
3:The home of Football on BBC Sport online.
5: google is the best tools for search keyword.
8:a wood cross!
12:#woood #
13:#woooooood # 
15:I bet this place is really spooky late at night! 

[root@lokott opt]# grep -n '[^w]oo' test.txt             # lines where "oo" is preceded by a character other than w
3:The home of Football on BBC Sport online.
5: google is the best tools for search keyword.
12:#woood #            // the match is the last two o's preceded by the first o, so all three o's are highlighted in red
13:#woooooood #         // the matches cover the first through the sixth o
15:I bet this place is really spooky late at night! 
[root@lokott opt]# grep -n '[^a-z]oo' test.txt   # lines where "oo" is preceded by a non-lowercase character; the match here is "Foo"
3:The home of Football on BBC Sport online. 
[root@lokott opt]# grep -n '[0-9]' test.txt       # lines containing a digit
4:the tongue is boneless but it breaks bones.12!
7:PI=3.141592653589793238462643383249901429
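
The annotations on lines 12 and 13 above can be checked without relying on color output. The sketch below assumes a grep that supports -o (GNU and BSD grep do; POSIX does not require it), which prints each non-overlapping match on its own line.

```shell
#!/bin/sh
# Each match of '[^w]oo' is three characters: one non-w character plus "oo".
# In "#woooooood #" the seven o's yield two non-overlapping matches,
# covering the first through the sixth o.
matches=$(printf '%s\n' '#woooooood #' | grep -o '[^w]oo')
echo "$matches"

# "wood" never matches: the only character in front of "oo" is w.
# grep -c prints 0 here (and exits non-zero, which is expected).
wood_hits=$(printf '%s\n' 'a wood cross!' | grep -c '[^w]oo')
echo "matching lines for 'a wood cross!': $wood_hits"
```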

3) Matching the start of a line ^ and the end of a line $

Line-start examples:

[root@lokott opt]# grep -n '^the' test.txt                   # lines that start with "the"
4:the tongue is boneless but it breaks bones.12!
[root@lokott opt]# grep -n '^[a-z]' test.txt                 # lines that start with a lowercase letter
1:he was short and fat.
4:the tongue is boneless but it breaks bones.12!
8:a wood cross!
[root@lokott opt]# grep -n '^[a-zA-Z]' test.txt               # lines that start with a letter
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants. 
3:The home of Football on BBC Sport online.
4:the tongue is boneless but it breaks bones.12!
6:The year ahead will test our political establishment to the limit.
7:PI=3.141592653589793238462643383249901429
8:a wood cross!
9:Actions speak louder than words
14:AxyzxyzxyzxyzC
15:I bet this place is really spooky late at night! 
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.

[root@lokott opt]# grep -n '^[^a-zA-Z]' test.txt      # lines that start with a non-letter character
5: google is the best tools for search keyword.
12:#woood #
13:#woooooood # 

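Note: the meaning of ^ depends on where it appears. Outside brackets it anchors the match to the start of the line, so the line must begin with whatever follows it; as the first character inside brackets it negates the set.

In short: outside the brackets, ^ means "starts with"; inside the brackets, ^ means "anything but".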
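
The two roles of ^ can be compared side by side. This is a small sketch on three made-up lines; LC_ALL=C is set as a precaution, since in some locales a range such as [a-z] may not be limited to lowercase ASCII letters.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL

# Made-up input: one lowercase start, one uppercase start, one digit start.
input=$(printf '%s\n' 'apple' 'Banana' '42 cherries')

# ^ outside brackets is an anchor: lines that START with a lowercase letter.
anchored=$(printf '%s\n' "$input" | grep '^[a-z]')

# ^ as the first character inside brackets negates the set:
# lines that START with anything other than a lowercase letter.
negated=$(printf '%s\n' "$input" | grep '^[^a-z]')

echo "anchored:"; echo "$anchored"
echo "negated:";  echo "$negated"
```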

Line-end ($) examples:

[root@lokott opt]# grep -n '\.$' test.txt           # lines that end with a period "."
1:he was short and fat. 
3:The home of Football on BBC Sport online.
5: google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.
[root@lokott opt]# grep -n '^$' test.txt             # empty lines
10:
11:

Note: to match a literal period, escape it with a backslash (\.), because an unescaped "." is itself a metacharacter.
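
The effect of the escape is easy to see by counting matching lines with and without it. A minimal sketch on three made-up lines:

```shell
#!/bin/sh
# Made-up input: only the first line ends with a literal period.
input=$(printf '%s\n' 'done.' 'done!' 'done')

# Unescaped: "." matches any character, so every non-empty line matches '.$'.
unescaped=$(printf '%s\n' "$input" | grep -c '.$')

# Escaped: '\.' matches only a literal period.
escaped=$(printf '%s\n' "$input" | grep -c '\.$')

echo "unescaped: $unescaped"
echo "escaped:   $escaped"
```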

4) Matching any single character "." and repetition "*"

[root@lokott opt]# grep -n 'w..d' test.txt                # lines where w and d are separated by any two characters
5: google is the best tools for search keyword.
8:a wood cross!
9:Actions speak louder than words
[root@lokott opt]# grep -n 'wo*d' test.txt                 # lines where w and d are separated by zero or more o's
8:a wood cross!
12:#woood #
13:#woooooood # 
[root@lokott opt]# grep -n 'ooo*d' test.txt               # two o's, then zero or more further o's, then d
8:a wood cross!
12:#woood #
13:#woooooood #
[root@lokott opt]# grep -n 'w.*d' test.txt               # w followed by any characters (possibly none), then d
1:he was short and fat.
5: google is the best tools for search keyword.
8:a wood cross!
9:Actions speak louder than words
12:#woood #
13:#woooooood # 
[root@lokott opt]# grep -n '[0-9][0-9]*' test.txt      # lines containing one or more digits
4:the tongue is boneless but it breaks bones.12!
7:PI=3.141592653589793238462643383249901429
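
Because * also allows zero occurrences, a pattern such as 'o*' on its own matches every line, including empty ones. The sketch below illustrates this on four made-up lines; sample is a hypothetical helper defined only for this example.

```shell
#!/bin/sh
# Hypothetical helper that prints four lines; the last one is empty.
sample() { printf '%s\n' 'wd' 'wood' 'word' ''; }

# 'o*' can match the empty string, so all four lines count as matches.
star_only=$(sample | grep -c 'o*')

# 'wo*d' allows zero o's between w and d: "wd" and "wood" match, "word" does not.
zero_or_more=$(sample | grep -c 'wo*d')

# 'woo*d' requires at least one o: only "wood" matches.
one_or_more=$(sample | grep -c 'woo*d')

echo "o*: $star_only, wo*d: $zero_or_more, woo*d: $one_or_more"
```

This is why "one or more" is written by repeating the item, as in 'ooo*d' and '[0-9][0-9]*' above.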

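5) Matching a repetition range with "{}"

{} limits how many times the preceding character may repeat, for example to find runs of 3 to 5 consecutive o's. In basic regular expressions a bare { or } is an ordinary character, so the interval has to be written with backslashes as \{ and \}; extended regular expressions (grep -E) take { } without the backslashes.

[root@lokott opt]# grep -n 'o\{2\}' test.txt     # lines containing two consecutive o's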
3:The home of Football on BBC Sport online.
5: google is the best tools for search keyword.
8:a wood cross!
12:#woood #
13:#woooooood # 
15:I bet this place is really spooky late at night! 
[root@lokott opt]# grep -n 'wo\{2,5\}d' test.txt # lines containing w, then 2 to 5 o's, then d
8:a wood cross!
12:#woood #
[root@lokott opt]# grep -n 'wo\{2,\}d' test.txt   # lines containing w, then 2 or more o's, then d
8:a wood cross!
12:#woood #
13:#woooooood # 
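
For comparison, the same interval can be written in both flavors. This is a sketch on made-up input, with sample as a hypothetical helper; it assumes grep -E for the extended syntax.

```shell
#!/bin/sh
# Hypothetical helper: words with 1, 2, 3, and 7 o's between w and d.
sample() { printf '%s\n' 'wod' 'wood' 'woood' 'woooooood'; }

# BRE: the interval braces are escaped.
bre=$(sample | grep 'wo\{2,5\}d')

# ERE: the same interval without backslashes.
ere=$(sample | grep -E 'wo{2,5}d')

echo "BRE:"; echo "$bre"
echo "ERE:"; echo "$ere"
```

Both forms select the words with two and three o's and reject the ones with one and seven.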

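V. Metacharacter Summary

Common metacharacters in basic regular expressions

^: outside brackets, anchors the match to the start of a line; as the first character inside [ ], negates the set

$: matches the end of a line

.: matches any single character except a newline

\: escapes the next character, either turning a metacharacter into a literal (\.) or giving special meaning to an ordinary character (\{ \}); also introduces back-references (\1)

*: matches the preceding character zero or more times

[]: character set; matches any one of the enclosed characters

[^]: negated character set; matches any one character that is not enclosed

[n1-n2]: character range; matches any one character in the given range

\{n\}: n is a non-negative integer; matches exactly n times

\{n,\}: n is a non-negative integer; matches at least n times

\{n1,n2\}: n1 and n2 are non-negative integers with n1 <= n2; matches between n1 and n2 times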
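
As a quick self-check of the summary, the sketch below runs one small case per metacharacter. bre_matches is a hypothetical helper written for this example, and every pattern and test string is made up.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL

# Hypothetical helper: prints "yes" if the BRE in $1 matches the string in $2.
bre_matches() {
  if printf '%s\n' "$2" | grep -q -e "$1"; then echo yes; else echo no; fi
}

r1=$(bre_matches '^the'     'the end')   # ^ anchors at the start of the line
r2=$(bre_matches 'end$'     'the end')   # $ anchors at the end of the line
r3=$(bre_matches 'w.d'      'wed')       # . stands for any one character
r4=$(bre_matches 'a\.b'     'axb')       # \. is a literal period, so no match
r5=$(bre_matches 'wo*d'     'wd')        # * allows zero o's
r6=$(bre_matches 'sh[io]rt' 'shirt')     # [] picks one character from the set
r7=$(bre_matches '[^w]oo'   'woo')       # [^] rejects w before "oo"
r8=$(bre_matches 'o\{3\}'   'wood')      # \{3\} needs three o's, "wood" has two

echo "$r1 $r2 $r3 $r4 $r5 $r6 $r7 $r8"
```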

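VI. Summary

This article used the grep command to introduce the concepts and usage of basic regular expressions, showed how to apply them in practice, and explained and summarized the common metacharacters.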