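Differences between zero-width assertions

The four assertions are listed below, each with its meaning, its syntax, and one or more examples.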

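Positive lookahead (zero-width positive lookahead assertion)
Meaning: matches the position immediately before exp, that is, a position followed by exp.
Syntax: (?=exp)
Example: searching "I'm singing while you're dancing." with \b\w+(?=ing\b) matches "sing" and "danc".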
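The example above can be checked with Python's re module. This is only a minimal sketch; the variable names are illustrative and do not come from the original post.

```python
import re

sentence = "I'm singing while you're dancing."

# \b\w+ grabs a word; (?=ing\b) requires that "ing" plus a word boundary
# follows, but the text inspected by the lookahead is not part of the match.
stems = re.findall(r"\b\w+(?=ing\b)", sentence)
print(stems)  # ['sing', 'danc']
```

The trailing "ing" is left out of each result because the lookahead only tests the position.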

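Negative lookahead (zero-width negative lookahead assertion)
Meaning: matches a position that is not followed by exp.
Syntax: (?!exp)
Examples:
\d{3}(?!\d) matches three digits that are not followed by another digit.
\b((?!abc)\w)+\b matches a word that does not contain the consecutive string abc.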

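Both negative lookahead examples can be tried out in Python. The sample strings below are made up for illustration; they are assumptions, not data from the original post.

```python
import re

# Three digits that are not followed by another digit.
last_three = re.findall(r"\d{3}(?!\d)", "123 45678 x999y")
print(last_three)  # ['123', '678', '999']

# At every character the lookahead checks that "abc" does not begin there,
# so a word containing "abc" anywhere is skipped. finditer is used because
# findall would return only the last character captured by the group.
pattern = re.compile(r"\b((?!abc)\w)+\b")
words = [m.group(0) for m in pattern.finditer("labcoat hello cab")]
print(words)  # ['hello', 'cab']
```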
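Positive lookbehind (zero-width positive lookbehind assertion)
Meaning: matches the position immediately after exp, that is, a position preceded by exp.
Syntax: (?<=exp)
Examples:
Searching "reading a book" with (?<=\bre)\w+\b yields "ading".
Searching "1234567890" with ((?<=\d)\d{3})+\b yields "234567890".
(?<=<(\w+)>).*(?=<\/\1>) matches the content inside a simple HTML tag that has no attributes. This pattern needs an engine that accepts variable-length lookbehind, such as .NET; Python's re module rejects it because \w+ has no fixed width.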
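A short Python sketch of the lookbehind examples follows. The first two patterns have a fixed-width lookbehind and work with Python's re module; the HTML pattern is compiled inside a try block only to show that re refuses a lookbehind of variable width. Variable names are illustrative.

```python
import re

# The lookbehind checks for "re" at a word start but does not return it.
after_re = re.search(r"(?<=\bre)\w+\b", "reading a book").group(0)
print(after_re)  # ading

# Each group of three digits must be preceded by a digit.
grouped = re.search(r"((?<=\d)\d{3})+\b", "1234567890").group(0)
print(grouped)  # 234567890

# (\w+) inside the lookbehind can match any number of characters,
# which Python's re module does not allow.
try:
    re.compile(r"(?<=<(\w+)>).*(?=<\/\1>)")
    html_pattern_ok = True
except re.error as exc:
    html_pattern_ok = False
    print("rejected:", exc)
```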

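Negative lookbehind (zero-width negative lookbehind assertion)
Meaning: matches a position that is not preceded by exp.
Syntax: (?<!exp)
Example: (?<![a-z])\d{7} matches seven digits that are not preceded by a lowercase letter.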

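The negative lookbehind example can be sketched in Python as follows. The input string is an assumption chosen to show one rejected and two accepted cases.

```python
import re

text = "a1234567 B7654321 0000000"

# "1234567" is skipped because a lowercase letter precedes it; the other two
# runs are preceded by an uppercase letter and by a space.
seven_digits = re.findall(r"(?<![a-z])\d{7}", text)
print(seven_digits)  # ['7654321', '0000000']
```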
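All four assertions match only a position; they do not consume any characters.
A < in the syntax means the assertion (the lookaround) is placed in front of the expression to be matched and inspects the text before the current position; without <, it is placed after the expression and inspects the text that follows.
! means negation: exp must not be present.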

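To make "does not consume any characters" concrete, the sketch below compares a consuming pattern with its lookaround counterparts. The sample text and names are illustrative assumptions.

```python
import re

text = "price: 100USD"

consuming = re.search(r"\d+USD", text)
lookahead = re.search(r"\d+(?=USD)", text)
print(consuming.group(0), consuming.span())  # 100USD (7, 13)
print(lookahead.group(0), lookahead.span())  # 100 (7, 10)

# The assertion on its own matches an empty string at a position.
position_only = re.search(r"(?=USD)", text)
print(position_only.span())  # (10, 10)

# With <, the assertion looks at what comes before the match.
lookbehind = re.search(r"(?<=price: )\d+", text)
print(lookbehind.group(0))  # 100
```

The match found with the lookahead ends at index 10, so "USD" is still available to whatever pattern comes next.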
Positive lookahead example

import re

address = re.compile(
    r'''
    # The name: one or more words; "." and "," allow initials and titles.
    ((?P<name>([\w.,]+\s+)*[\w.,]+)\s+)

    # Lookahead: the rest of the string must be wrapped in angle brackets
    # on both sides or on neither side.
    (?=(<.*>$)|([^<].*[^>]$))

    <?                  # optional opening angle bracket
    (?P<email>
      [\w\d.+-]+        # username
      @
      ([\w\d.]+\.)+     # domain name prefix
      (com|org|edu)     # limit the allowed top-level domains
    )
    >?                  # optional closing angle bracket
    ''',
    re.VERBOSE)

candidates = [
    'First Last <first.last@example.com>',
    'No Brackets first.last@example.com',
    'Open Bracket <first.last@example.com',
    'Close Bracket first.last@example.com>',
]

for candidate in candidates:
    print('Candidate:', candidate)
    match = address.search(candidate)
    if match:
        print('  Name :', match.group('name'))
        print('  Email:', match.group('email'))
    else:
        print('  No match')

Output:

Candidate: First Last <first.last@example.com>
  Name : First Last
  Email: first.last@example.com
Candidate: No Brackets first.last@example.com
  Name : No Brackets
  Email: first.last@example.com
Candidate: Open Bracket <first.last@example.com
  No match
Candidate: Close Bracket first.last@example.com>
  No match
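The lookahead is the part of the address pattern that rejects unbalanced brackets. As a rough illustration, it can be tested in isolation against only the remainder of each candidate; the name brackets_balanced is hypothetical and not part of the original code.

```python
import re

# Only the lookahead from the address pattern: it inspects the remainder
# of the string without consuming it.
brackets_balanced = re.compile(r"(?=(<.*>$)|([^<].*[^>]$))")

remainders = [
    "<first.last@example.com>",
    "first.last@example.com",
    "<first.last@example.com",
    "first.last@example.com>",
]
results = {r: bool(brackets_balanced.match(r)) for r in remainders}
for remainder, ok in results.items():
    print(ok, remainder)
```

The first two remainders pass and the last two fail, which mirrors the four results of the full example.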

Negative lookahead example

import re

address = re.compile(
    r'''
    ^

    # An address: username@domain.tld

    # Ignore noreply addresses
    (?!noreply@.*$)

    [\w\d.+-]+       # username
    @
    ([\w\d.]+\.)+    # domain name prefix
    (com|org|edu)    # limit the allowed top-level domains

    $
    ''',
    re.VERBOSE)

candidates = [
    'first.last@example.com',
    'noreply@example.com',
]

for candidate in candidates:
    print('Candidate:', candidate)
    match = address.search(candidate)
    if match:
        print('  Match:', candidate[match.start():match.end()])
    else:
        print('  No match')

Output:

Candidate: first.last@example.com
  Match: first.last@example.com
Candidate: noreply@example.com
  No match

Negative lookbehind example

import re

# After the username has been matched, the lookbehind checks that the
# seven characters just before "@" are not "noreply".
pattern = r'^[\w\d.+-]+(?<!noreply)@([\w\d.]+\.)+(com|org|edu)$'
candidates = ['first.last@example.com', 'noreply@example.com']

for candidate in candidates:
    print('Candidate:', candidate)
    match = re.search(pattern, candidate)
    if match:
        print('  Match:', match.group(0))
    else:
        print('  No match')

Output:

Candidate: first.last@example.com
  Match: first.last@example.com
Candidate: noreply@example.com
  No match
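The two noreply examples give the same result for these inputs, but they do not test the same thing: the lookahead checks the start of the address, while the lookbehind checks the characters just before "@". The sketch below uses a made-up address, xnoreply@example.com, to show the difference; the names with_lookahead and with_lookbehind are illustrative.

```python
import re

domain = r"@([\w\d.]+\.)+(com|org|edu)$"
with_lookahead = re.compile(r"^(?!noreply@.*$)[\w\d.+-]+" + domain)
with_lookbehind = re.compile(r"^[\w\d.+-]+(?<!noreply)" + domain)

outcome = {}
for candidate in ["noreply@example.com", "xnoreply@example.com"]:
    outcome[candidate] = (
        bool(with_lookahead.search(candidate)),
        bool(with_lookbehind.search(candidate)),
    )
    print(candidate, outcome[candidate])
```

The lookahead version accepts xnoreply@example.com because the address does not start with "noreply@", whereas the lookbehind version rejects it because the username ends in "noreply".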

Positive lookbehind example

import re

# The lookbehind requires an "@" right before the handle but keeps it
# out of the match.
pattern = re.compile(r'(?<=@)([\w\d_]+)')
text = '''This text includes two Twitter handles.
One for @caimouse, and one for the author, @caijunsheng.
'''

print(text)
for match in pattern.findall(text):
    print(match)

Output:

This text includes two Twitter handles.
One for @caimouse, and one for the author, @caijunsheng.

caimouse
caijunsheng

References