Regular Expression Matching in Python

Reposted

mob64ca140d96d9 2024-06-04 14:47:55


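2.1 Finding text patterns with regular expressions

1) A regular expression, regex for short, is a way of describing a text pattern.

2) All of Python's regular expression functions live in the re module.

3) Steps for using a regular expression:

Step 1: import the regular expression module with import re;

Step 2: create a Regex object with the re.compile() function (pass the pattern as a raw string);

Step 3: pass the string you want to search to the Regex object's search() method, which returns a Match object (or None if nothing matches);

Step 4: call the Match object's group() method to get the actual matched text as a string.

For example:

    import re

    phoneNumberRegex = re.compile(r'\d{3}-\d{3}-\d{4}')
    mo = phoneNumberRegex.search('My number is 123-456-7890')
    print('Phone number found ' + mo.group())  # Phone number found 123-456-7890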
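Because search() returns None when the pattern is not found, calling group() on the result unconditionally can raise an AttributeError. A minimal sketch of guarding against that follows; the helper name find_phone is hypothetical and used only for illustration.

```python
import re

phone_regex = re.compile(r'\d{3}-\d{3}-\d{4}')

def find_phone(text):
    """Return the first phone-number-like substring, or None if there is none."""
    mo = phone_regex.search(text)
    # search() returns None when nothing matches, so check before calling group().
    return mo.group() if mo is not None else None

print(find_phone('My number is 123-456-7890'))  # 123-456-7890
print(find_phone('No number here'))             # None
```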

2.2 Matching more patterns with regular expressions

1) Grouping with parentheses:

    import re

    phoneNumberRegex = re.compile(r'(\d{3})-(\d{3}-\d{4})')
    mo = phoneNumberRegex.search('My number is 123-456-7890')
    print(mo.group())   # 123-456-7890
    print(mo.group(0))  # 123-456-7890
    print(mo.group(1))  # 123
    print(mo.group(2))  # 456-7890
    print(mo.groups())  # ('123', '456-7890')

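The first pair of parentheses in the regex string is group 1 and the second pair is group 2. Passing 1 or 2 to the Match object's group() method returns the corresponding part of the matched text. Passing 0, or no argument at all, returns the entire matched text.

To get all the groups at once, use the groups() method, which returns a tuple of the values.

2) Matching one of several alternatives with the pipe "|":

    import re

    # No spaces around the pipe: whitespace in a pattern is matched literally.
    myRegex = re.compile(r'cxh|huahua')
    mo = myRegex.search('cxh and huahua')
    print(mo.group())  # cxh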

3) Matching the preceding group zero or one time with the question mark:

    import re

    myRegex = re.compile(r'cxh(and)?huahua')
    mo = myRegex.search('cxhhuahua')
    print(mo.group())  # cxhhuahua

4) Matching the preceding group zero or more times with the star:

    import re

    myRegex = re.compile(r'cxh(and)*huahua')
    mo = myRegex.search('cxhandandhuahua')
    print(mo.group())  # cxhandandhuahua

5) Matching the preceding group one or more times with the plus:

    import re

    myRegex = re.compile(r'cxh(and)+huahua')
    mo = myRegex.search('cxhandandandhuahua')
    print(mo.group())  # cxhandandandhuahua

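6) Matching a specific number of repetitions with braces:

{n} matches the preceding group exactly n times;

{n,} matches the preceding group n or more times;

{,m} matches the preceding group zero to m times;

{n,m} matches the preceding group at least n and at most m times.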
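The four brace forms above can be tried side by side. The following is a small sketch; the sample string and variable names are chosen only for illustration.

```python
import re

text = 'hahahahaha'  # 'ha' repeated five times

exactly_three = re.search(r'(ha){3}', text).group()
two_or_more = re.search(r'(ha){2,}', text).group()
up_to_two = re.search(r'(ha){,2}', text).group()
two_to_four = re.search(r'(ha){2,4}', text).group()

print(exactly_three)  # hahaha
print(two_or_more)    # hahahahaha
print(up_to_two)      # haha
print(two_to_four)    # hahahaha
```

Each quantifier takes as many repetitions as its upper bound allows, which is the default greedy behavior.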

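2.3 Greedy and non-greedy matching

Python's regular expressions are "greedy" by default: when a match is ambiguous, they match the longest string possible.

The "non-greedy" version of the braces matches the shortest string possible. It is written by putting a question mark after the closing brace:

    import re

    myRegex = re.compile(r'(ha){3,5}?')
    mo = myRegex.search('hahahahahaha')
    print(mo.group())  # hahaha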
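To make the difference visible, the sketch below runs the greedy and non-greedy forms on the same input. The second half applies the same trailing question mark to the plus quantifier; the HTML-like sample string is only an illustration.

```python
import re

text = 'hahahahahaha'  # 'ha' repeated six times

greedy = re.search(r'(ha){3,5}', text).group()
lazy = re.search(r'(ha){3,5}?', text).group()
print(greedy)  # hahahahaha
print(lazy)    # hahaha

# A trailing ? makes * and + non-greedy in the same way.
markup = '<b>bold</b> and <i>italic</i>'
greedy_tag = re.search(r'<.+>', markup).group()
lazy_tag = re.search(r'<.+?>', markup).group()
print(greedy_tag)  # <b>bold</b> and <i>italic</i>
print(lazy_tag)    # <b>
```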

2.4 The findall() method

search() returns a Match object for the first match in the searched string, whereas findall() returns every match in the searched string as a list.

With groups (findall() returns a list of tuples, one string per group):

    import re

    myRegex = re.compile(r'(\d{3})-(\d{3})-(\d{4})')
    mo = myRegex.findall('Cell 123-456-7890 Work: 098-765-4321')
    print(mo)  # [('123', '456', '7890'), ('098', '765', '4321')]

Without groups (findall() returns a list of strings):

    import re

    myRegex = re.compile(r'\d{3}-\d{3}-\d{4}')
    mo = myRegex.findall('Cell 123-456-7890 Work: 098-765-4321')
    print(mo)  # ['123-456-7890', '098-765-4321']

2.5 Character classes

2.5.1 Shorthand character classes

[Figure: table of shorthand character classes (metacharacters)]
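As a quick reference, the shorthand classes documented for the re module are \d (digit), \w (word character: letter, digit, or underscore), \s (whitespace), and their negations \D, \W, and \S. The sketch below exercises each one; the sample string is an assumption made for illustration.

```python
import re

text = 'Room 101, floor_2!'

print(re.findall(r'\d', text))   # ['1', '0', '1', '2']
print(re.findall(r'\d+', text))  # ['101', '2']
print(re.findall(r'\w+', text))  # ['Room', '101', 'floor_2']
print(re.findall(r'\s', text))   # [' ', ' ']
print(re.findall(r'\D+', text))  # ['Room ', ', floor_', '!']
print(re.findall(r'\W+', text))  # [' ', ', ', '!']
print(re.findall(r'\S+', text))  # ['Room', '101,', 'floor_2!']
```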

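2.5.2 Defining your own character classes with square brackets

1) [0-5] matches the digits 0 through 5;

2) [a-zA-Z0-9] matches all lowercase letters, uppercase letters, and digits;

3) Inside square brackets, most regex metacharacters (such as ., *, ? and parentheses) are taken literally and need no backslash escape;

4) Placing a caret (^) right after the opening bracket produces a negative character class, which matches any character that is not in the class.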
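The four points above can be sketched as follows; the sample strings and variable names are made up for illustration.

```python
import re

# 1) A range of digits.
low_digits = re.findall(r'[0-5]', 'Score: 0429')
print(low_digits)  # ['0', '4', '2']

# 2) Letters and digits; the underscore, @ and . are not in the class.
alnum_runs = re.findall(r'[a-zA-Z0-9]+', 'user_42@example.com')
print(alnum_runs)  # ['user', '42', 'example', 'com']

# 3) Metacharacters inside brackets are matched literally.
literals = re.findall(r'[.?*]', 'What? 3.5 * 2')
print(literals)  # ['?', '.', '*']

# 4) A leading caret negates the class: runs of anything except digits.
non_digits = re.findall(r'[^0-9]+', 'a1b22c')
print(non_digits)  # ['a', 'b', 'c']
```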

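2.6 The caret and dollar sign characters

1) A caret (^) at the start of a regular expression means the match must occur at the beginning of the searched text.

2) A dollar sign ($) at the end of a regular expression means the searched string must end with this pattern.

For example: r'^\d+$' matches strings that consist of digits from start to end.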
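A short sketch of both anchors in use; the pattern names and sample strings are illustrative assumptions.

```python
import re

begins_with_hello = re.compile(r'^Hello')
ends_with_digit = re.compile(r'\d$')
whole_string_digits = re.compile(r'^\d+$')

print(begins_with_hello.search('Hello world!') is not None)     # True
print(begins_with_hello.search('He said Hello.') is not None)   # False
print(ends_with_digit.search('Your number is 42').group())      # 2
print(whole_string_digits.search('1234567890') is not None)     # True
print(whole_string_digits.search('12345xyz67890') is not None)  # False
```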

2.7 The wildcard character

1) The period (.) is called the "wildcard": it matches any character except a newline:

    import re

    myRegex = re.compile(r'.at')
    mo = myRegex.findall('The cat in the hat sat on the flat mat.')
    print(mo)  # ['cat', 'hat', 'sat', 'lat', 'mat']

2) Matching everything with dot-star:

    import re

    myRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
    mo = myRegex.search('First Name: cxh Last Name: huahua')
    print(mo.group())   # First Name: cxh Last Name: huahua
    print(mo.group(1))  # cxh
    print(mo.group(2))  # huahua
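Since the period does not match a newline, dot-star stops at the end of the first line of a multi-line string. The re.DOTALL flag, which this article only names in section 2.11, lifts that restriction. A minimal sketch, with a made-up sample string:

```python
import re

text = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

first_line_only = re.compile(r'.*').search(text).group()
everything = re.compile(r'.*', re.DOTALL).search(text).group()

print(first_line_only)     # Serve the public trust.
print(everything == text)  # True
```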

2.8 Case-insensitive matching

Pass re.IGNORECASE or re.I as the second argument to re.compile():

    import re

    myRegex = re.compile(r'cxh', re.I)
    mo = myRegex.search('CXH is a girl')
    print(mo.group())  # CXH

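2.9 Replacing strings with the sub() method

1) sub() takes two arguments: the first is the replacement string that takes the place of each match, and the second is the string to search, to which the regular expression is applied:

    import re

    myRegex = re.compile(r'Agent \w+')
    mo = myRegex.sub('cxh', 'Agent Alice gave the secret documents to Agent Bob.')
    print(mo)  # cxh gave the secret documents to cxh.

2) To use the matched text itself as part of the replacement, write \1, \2, \3, and so on in the first argument to sub(), meaning "insert the text of group 1, 2, 3, and so on in the replacement":

    import re

    myRegex = re.compile(r'Agent (\w)\w*')
    mo = myRegex.sub(r'\1****', 'Agent Alice gave the secret documents to Agent Bob.')
    print(mo)  # A**** gave the secret documents to B****.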

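2.10 Managing complex regular expressions

To have whitespace and comments inside the regex string ignored, pass re.VERBOSE as the second argument to re.compile().

2.11 Combining re.IGNORECASE, re.DOTALL, and re.VERBOSE

re.compile() accepts only one value as its second argument, so combine the flags with the bitwise OR operator (|).

For example: someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE)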
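As an illustration, the phone number pattern from section 2.4 can be spread over several commented lines in verbose mode. This is a sketch: the comments and the name phone_regex are additions for readability, not part of the original example.

```python
import re

phone_regex = re.compile(r'''
    (\d{3})   # area code
    -         # separator
    (\d{3})   # first three digits
    -         # separator
    (\d{4})   # last four digits
''', re.VERBOSE)

mo = phone_regex.search('My number is 123-456-7890')
print(mo.group())   # 123-456-7890
print(mo.groups())  # ('123', '456', '7890')
```

A triple-quoted raw string keeps the pattern readable across lines. Note that in verbose mode a literal space in the pattern has to be escaped or placed inside a character class.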
