Linux Regular Expressions Explained

Rickyyan 2018-05-16 08:23:54 Category: linux

Copyright belongs to the author. This is an original work by 51CTO blog author Rickyyan; contact the author for permission before reposting.

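Linux regular expressions

1. Components

Ordinary characters: plain characters with no special meaning; they match themselves.
Special characters: characters that carry a special meaning inside a regular expression.

Common metacharacters (special characters) in regular expressions

2. Metacharacters available in both POSIX BRE (basic) and ERE (extended)

\ (backslash): switches the special meaning of the following character on or off, as in \(...\). It is the escape character: \. matches a literal dot, while in BRE \( \) and \{ \} turn grouping and intervals on. Characters such as ( ) and { } also have a special meaning in the shell, so quote the pattern.

The difference between .*, * and . is shown with the following test file: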

[root@localhost ~]# cat -n test.txt 
     1  gd
     2  god
     3
     4  good
     5  goood
     6  goad
     7
     8  gboad

2.1 . (dot): matches any single character (it must consume one character, so it never matches an empty line)

[root@localhost ~]# grep -n "." test.txt       
1:gd
2:god
4:good
5:goood
6:goad
8:gboad
[root@localhost ~]# grep -n "go.d" test.txt 
4:good
6:goad
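
As a small sketch of the point above, the following uses made-up sample data (the variable name sample is only for illustration) to show that a dot always consumes exactly one character:

```shell
# Hypothetical sample data; the empty third line mirrors test.txt.
sample=$(printf 'gd\ngod\n\ngood')

# "." needs one character to consume, so the empty line is not counted.
nonempty=$(printf '%s\n' "$sample" | grep -c '.')

# "g.d" is exactly g + one character + d, so only "god" fits.
three=$(printf '%s\n' "$sample" | grep 'g.d')

echo "non-empty lines: $nonempty"
echo "g.d matches: $three"
```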

2.2 * (asterisk): matches the preceding character any number of times. For example, o* matches no o at all, a single o, or several o's.

[root@localhost ~]# grep -n "*" test.txt 
[root@localhost ~]# grep -n "o*" test.txt 
1:gd
2:god
3:
4:good
5:goood
6:goad
7:
8:gboad
[root@localhost ~]# echo "gbad" >>test.txt 
[root@localhost ~]# echo "pbad" >>test.txt 
[root@localhost ~]# echo "kgbad" >>test.txt 
[root@localhost ~]# echo "poad" >>test.txt   
[root@localhost ~]# grep -n "go*" test.txt    # o is optional, but the g before it must match
1:gd
2:god
4:good
5:goood
6:goad
8:gboad
9:gbad
11:kgbad
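
The reason "o*" selects every line, even the empty ones, is that zero repetitions count as a match. A minimal sketch with made-up sample data (variable names are illustrative) contrasts it with patterns that require at least one character:

```shell
# Hypothetical sample data, including one empty line.
sample=$(printf 'gd\ngod\n\ngood')

# o* can match zero o's, so every line (even the empty one) is selected.
any=$(printf '%s\n' "$sample" | grep -c 'o*')

# oo* is "one o, then any number more": at least one o is required.
at_least_one=$(printf '%s\n' "$sample" | grep -c 'oo*')

# go*d is g, any number of o's, then d: gd, god and good all fit.
g_os_d=$(printf '%s\n' "$sample" | grep -c 'go*d')

echo "o*: $any  oo*: $at_least_one  go*d: $g_os_d"
```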

2.3 .* : matches any sequence of characters (that is, everything), including the empty string

[root@localhost ~]# grep -n ".*" test.txt 
1:gd
2:god
3:
4:good
5:goood
6:goad
7:
8:gboad
9:gbad
10:pbad
11:kgbad
12:poad
[root@localhost ~]# grep -n "go.*" test.txt 
2:god
4:good
5:goood
6:goad
[root@localhost ~]# grep -n "po.*" test.txt  
12:poad
[root@localhost ~]# echo "pgoad" >>test.txt    
[root@localhost ~]# grep -n "go.*" test.txt    # go followed by any characters, possibly none
2:god
4:good
5:goood
6:goad
13:pgoad
[root@localhost ~]# 
[root@localhost ~]# grep -n "o.*" test.txt  
2:god
4:good
5:goood
6:goad
8:gboad
12:poad
13:pgoad
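
Two consequences of ".*" are worth seeing in isolation: for selecting lines, a trailing ".*" changes nothing, and in a substitution it is greedy. The sketch below assumes made-up sample data and illustrative variable names:

```shell
# Hypothetical sample data.
sample=$(printf 'god\ngboad\npgoad')

# For line selection, "go.*" and "go" pick the same lines,
# because .* is allowed to match nothing.
with_tail=$(printf '%s\n' "$sample" | grep 'go.*')
without_tail=$(printf '%s\n' "$sample" | grep 'go')

# In a substitution the difference shows: .* is greedy and runs
# as far as it can, here up to the LAST hyphen.
greedy=$(echo 'a-b-c' | sed 's/.*-//')

echo "$with_tail"
echo "after greedy delete: $greedy"
```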

2.4 ^ : anchors the regular expression that follows it to the start of the line (lines beginning with ...)

[root@localhost tmp]# grep "^root" /etc/passwd
root:x:0:0:root:/root:/bin/bash
[root@localhost tmp]#

2.5 $ : anchors the regular expression that precedes it to the end of the line (lines ending with ...)

[root@localhost tmp]# grep "bash$" /etc/passwd | head -1
root:x:0:0:root:/root:/bin/bash
[root@localhost tmp]#
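
A common use of the two anchors together is stripping comment lines and blank lines from a config file. The sketch below uses invented config text (conf, active and loose are illustrative names) and shows why the # needs its own ^ anchor:

```shell
# Hypothetical config text: a comment, a setting, a blank line,
# and a setting whose value happens to contain a '#'.
conf=$(printf '# comment\nport=22\n\nname=a#b')

# Anchored: drop lines that START with '#' and lines that are empty.
active=$(printf '%s\n' "$conf" | grep -Ev '^#|^$')

# Unanchored '#': also drops any line that merely contains a '#'.
loose=$(printf '%s\n' "$conf" | grep -Ev '#|^$')

echo "$active"
```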

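^$ matches an empty line. "^#|^$" (an ERE, so use grep -E) matches comment lines that begin with # as well as empty lines.

2.6 [] : matches any one of the characters inside the brackets

For example, [sS] matches s or S. A hyphen (-) specifies a range of characters, so [0-9] matches any single digit from 0 to 9. If ^ is the first character inside the brackets, as in [^0-9], the expression matches any one character that is not in the list.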
[root@localhost tmp]# cat hosts 
192.168.200.1
192.168.200.3
a.b.123.5
23.c.56.1
1456.1.2.4
12.4.5.6.8
[root@localhost tmp]# grep -E '([0-9]{1,3}\.){3}[0-9]{1,3}' hosts   
192.168.200.1
192.168.200.3
1456.1.2.4
12.4.5.6.8
[root@localhost tmp]# grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$' hosts 
192.168.200.1
192.168.200.3
[root@localhost tmp]#
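
The bracket forms described above can be tried on a few invented lines. This is only a sketch with illustrative variable names; the last command also shows that the anchored IP pattern checks the shape of an address, not the numeric range of each octet:

```shell
# Hypothetical sample data.
sample=$(printf 'Sam\nsam\nram\n42')

# [sS] accepts either case of the first letter.
names=$(printf '%s\n' "$sample" | grep '^[sS]am$')

# [0-9] is a range; anchored, it selects lines made only of digits.
digits_only=$(printf '%s\n' "$sample" | grep '^[0-9][0-9]*$')

# [^0-9] is a negated list: lines containing at least one non-digit.
has_nondigit=$(printf '%s\n' "$sample" | grep -c '[^0-9]')

# Shape check only: 999.999.999.999 passes although it is not a valid IP.
shape_ok=$(echo '999.999.999.999' | grep -Ec '^([0-9]{1,3}\.){3}[0-9]{1,3}$')

echo "$names"
```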

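2.7 ? : matches the preceding character zero or one time. Strictly speaking this is an ERE operator, which is why grep -E is used below (GNU grep also accepts \? in BRE).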

[root@localhost ~]# grep -E "go?d" test.txt   
gd
god
[root@localhost ~]# 
[root@localhost tmp]# cat test
do
does
doxy
[root@localhost tmp]# grep -E "do(es)?" test 
do
does
doxy
[root@localhost tmp]#
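
In the transcript above, doxy is selected because the optional group (es)? may match nothing, so the plain "do" at the start of the line is enough. A minimal sketch, using invented data and names, of how anchoring changes that:

```shell
# Hypothetical sample data.
words=$(printf 'do\ndoes\ndoxy')
urls=$(printf 'http://a\nhttps://b\nhttpss://c')

# Anchored at both ends, only "do" and "does" remain.
exact=$(printf '%s\n' "$words" | grep -E '^do(es)?$')

# s? allows zero or one s, so "httpss" is rejected.
scheme_ok=$(printf '%s\n' "$urls" | grep -E '^https?://')

echo "$exact"
echo "$scheme_ok"
```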

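3. Characters found only in POSIX BRE (basic regular expressions)

\{n,m\} : interval expression. It matches a repetition of the single character immediately before it; for example, https\{0,1\} repeats the s zero or one time. \{n\} matches exactly n times, \{n,m\} matches n to m times, and \{n,\} matches at least n times. \{,m\} (at most m times) is accepted by GNU grep but is not part of POSIX. [\ is the escape character.]

4. Characters found only in POSIX ERE (extended regular expressions)

4.1 {n,m} : same function as \{n,m\} in BRE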

[root@localhost tmp]# grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$' hosts 
192.168.200.1
192.168.200.3
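
To make the BRE and ERE spellings of an interval concrete, here is a short sketch on invented data (variable names are illustrative). Both commands are meant to select the same lines:

```shell
# Hypothetical sample data: g, one to four o's, d.
sample=$(printf 'god\ngood\ngoood\ngooood')

# BRE: braces must be escaped to act as an interval.
bre=$(printf '%s\n' "$sample" | grep 'go\{2,3\}d')

# ERE: the same interval without backslashes.
ere=$(printf '%s\n' "$sample" | grep -E 'go{2,3}d')

# {n} is an exact count, {n,} is "at least n".
exactly_two=$(printf '%s\n' "$sample" | grep -E '^go{2}d$')
three_or_more=$(printf '%s\n' "$sample" | grep -Ec 'go{3,}d')

echo "$bre"
```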

4.2 + : matches the preceding regular expression one or more times

[root@localhost ~]# egrep "go+d" test.txt 
god
good
goood
[root@localhost ~]#

4.3 | : alternation; matches any one of several strings (an "or" relationship)

[root@localhost ~]# grep -E "3306|1521" /etc/services
mysql           3306/tcp                        # MySQL
mysql           3306/udp                        # MySQL
ncube-lm        1521/tcp                # nCube License Manager
ncube-lm        1521/udp                # nCube License Manager
[root@localhost ~]#

4.4 ( ) : grouping and back-references

Grouping:
[root@localhost ~]# echo "glad" >> test.txt 
[root@localhost ~]# egrep "(la|oo)" test.txt 
good
goood
glad
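
Grouping also controls how far an alternation reaches, which matters once anchors are involved. The following sketch, on invented data with illustrative names, compares the ungrouped and grouped forms:

```shell
# Hypothetical sample data.
sample=$(printf 'apple\ncab\nb\na')

# Without parentheses, | splits the whole pattern:
# "starts with a" OR "ends with b". All four lines qualify.
loose=$(printf '%s\n' "$sample" | grep -Ec '^a|b$')

# With parentheses, the anchors apply to both alternatives:
# the whole line must be exactly "a" or exactly "b".
strict=$(printf '%s\n' "$sample" | grep -E '^(a|b)$')

echo "ungrouped count: $loose"
echo "$strict"
```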

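Back-references with ( ): when part of the pattern is enclosed in parentheses, the text matched by the first group can be referred to later as \1, the second group as \2, and so on.
[root@localhost tmp]# ifconfig |sed -rn 's#.*addr:(.*)(B.*)$#\1#gp'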
192.168.4.27
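
Since ifconfig output differs between systems, the sketch below works on a fixed, invented line in the old "addr:" format instead. It uses sed -E (accepted by GNU and BSD sed) in place of -r, and the variable names are illustrative:

```shell
# Hypothetical line in the old ifconfig output format.
line='inet addr:192.168.4.27  Bcast:192.168.4.255  Mask:255.255.255.0'

# Capture only digits and dots after "addr:", then print group 1.
ip=$(printf '%s\n' "$line" | sed -En 's#.*addr:([0-9.]+) .*#\1#p')

# Two groups can be emitted in a different order with \2 and \1.
swapped=$(echo 'key=value' | sed -E 's/(.*)=(.*)/\2=\1/')

# In BRE a back-reference also works inside the pattern itself:
# \(o\)\1 means "an o followed by the same text again".
doubled=$(printf 'good\ngod\n' | grep '\(o\)\1')

echo "$ip"
echo "$swapped"
```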

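5. Other regular expression metacharacters

The escapes in this section are extensions rather than POSIX metacharacters; the examples assume GNU grep.

5.1 \b : matches a word boundary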

[root@localhost tmp]# cat test        
do
does
doxy
agdoeg
[root@localhost tmp]# grep "do\b" test
do
[root@localhost tmp]# grep "\bdo" test        
do
does
doxy
[root@localhost tmp]# grep "\bdoes" test          
does
[root@localhost tmp]# grep "\bdo\b" test  
do
[root@localhost tmp]#

5.2 \B : matches a non-word boundary, the opposite of \b

[root@localhost tmp]# grep "do\B" test    
does
doxy
agdoeg
[root@localhost tmp]# grep "do\b" test 
do
[root@localhost tmp]#
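
The two boundary escapes can be combined on either side of a string. This sketch assumes GNU grep (where \b and \B are available) and uses invented data and variable names:

```shell
# Hypothetical sample data.
sample=$(printf 'do\ndoes\nredo\nundoing')

# \b on both sides: "do" as a whole word only.
whole=$(printf '%s\n' "$sample" | grep '\bdo\b')

# \b after: "do" must end a word ("do", "redo").
ends_word=$(printf '%s\n' "$sample" | grep 'do\b')

# \B on both sides: "do" strictly inside a word ("undoing").
inside=$(printf '%s\n' "$sample" | grep '\Bdo\B')

echo "$whole"
echo "$ends_word"
echo "$inside"
```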

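5.3 \d : matches a digit, equivalent to [0-9] (Perl-style; with GNU grep it requires grep -P)

5.4 \D : matches a non-digit, equivalent to [^0-9] (also requires grep -P)

5.5 \w : matches a letter, digit, or underscore, equivalent to [A-Za-z0-9_]

There are many more metacharacters; they are not all listed here.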
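
Where the Perl-style escapes are not available, the POSIX character classes give a portable spelling of the same idea. A minimal sketch on invented data, with illustrative variable names:

```shell
# Hypothetical sample data.
sample=$(printf 'abc\n123\na_1\n---')

# [[:digit:]] is the portable counterpart of \d.
with_digit=$(printf '%s\n' "$sample" | grep -c '[[:digit:]]')

# [[:alnum:]_] is the portable counterpart of \w.
word_only=$(printf '%s\n' "$sample" | grep '^[[:alnum:]_]*$')

# Lines with no digit at all: invert the digit match with -v.
no_digit=$(printf '%s\n' "$sample" | grep -v '[[:digit:]]')

echo "$word_only"
```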

Case study: trimming the services started at boot
[root@localhost ~]# chkconfig --list| egrep -v "crond|network|rsyslog|sshd|sysstat" | awk '{print "chkconfig",$1,"off"}'|bash
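
Because the one-liner pipes generated commands straight into bash, it can help to look at them first. The sketch below is a dry run over an invented service listing (the listing text and variable names are assumptions, not real chkconfig output), and it only prints the commands:

```shell
# Hypothetical stand-in for the output of "chkconfig --list".
listing=$(printf 'crond 0:off\nnetwork 0:off\npostfix 0:off\nsshd 0:off\ncups 0:off')

# Same filter and awk step as the case study, but without "| bash":
# the commands are only printed, not executed.
plan=$(printf '%s\n' "$listing" \
  | grep -Ev 'crond|network|rsyslog|sshd|sysstat' \
  | awk '{print "chkconfig", $1, "off"}')

echo "$plan"
```

Once the printed commands look right, the same pipeline can be run against the real listing with | bash appended.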

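Summary of the three regular-expression tools (grep, sed, awk):
grep: works line by line; a simple command for filtering lines of text. It cannot select a region within a line.
sed: works line by line; a simple command for adding, deleting, changing and finding text. Its s///g substitution lets it select a simple region (string positioning), although some selections become complicated.
awk: works record by record (a record is normally a line). Because it has the concept of fields, it selects regions well. It is a programming language with very strong text-processing abilities (built-in variables, statement blocks, arrays, functions).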
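
As a rough illustration of that division of labor, the sketch below runs each tool over the same two invented passwd-style lines (the data and variable names are assumptions for the example):

```shell
# Hypothetical passwd-style records.
data=$(printf 'root:x:0:0:root:/root:/bin/bash\nnobody:x:99:99:Nobody:/:/sbin/nologin')

# grep: select whole lines.
picked=$(printf '%s\n' "$data" | grep 'bash$')

# sed: edit a matched string inside the selected line.
edited=$(printf '%s\n' "$picked" | sed 's#/bin/bash$#/bin/sh#')

# awk: split each record into fields and print chosen ones.
fields=$(printf '%s\n' "$data" | awk -F: '{print $1, $7}')

echo "$picked"
echo "$edited"
echo "$fields"
```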