1, Match all tags
regex:
\<.[^<>]*\>
source:
<external_network_location_id>20130401_TXNONC100FFS3101TAUSNPN1733590048828A_0048828</external_network_location_id>
result:
<external_network_location_id>
</external_network_location_id>
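A minimal Python sketch of the tag match above, assuming the standard re module (these notes do not name a regex engine). The backslashes before < and > are dropped because they are not needed there.

```python
import re

# "<", then any one character, then anything that is not "<" or ">", then ">".
TAG = re.compile(r"<.[^<>]*>")

source = ("<external_network_location_id>"
          "20130401_TXNONC100FFS3101TAUSNPN1733590048828A_0048828"
          "</external_network_location_id>")

# findall returns every non-overlapping match, left to right.
tags = TAG.findall(source)
for tag in tags:
    print(tag)
```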
2, Match a specific tag, e.g. only div tags
regex:
<div\b[^<>]*>.*?</div>
source:
<div>23dd</div>
<div1>23dd</div1>
<div>23dd33ff</div>
result:
<div>23dd</div>
<div>23dd33ff</div>
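A minimal Python sketch for the div case, assuming the standard re module. It uses the pattern <div\b[^<>]*>.*?</div>: the \b after div keeps <div1> out, and the lazy .*? stops at the first </div> instead of running on to the last one on the line.

```python
import re

DIV = re.compile(r"<div\b[^<>]*>.*?</div>")

source = """<div>23dd</div>
<div1>23dd</div1>
<div>23dd33ff</div>"""

# "." does not cross newlines by default, so each match stays on one line.
matches = DIV.findall(source)
for m in matches:
    print(m)
```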
3, Match strings in a specific format
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
regex 1 (text between tags, including the surrounding > and <):
>.[^<>]+<
regex 2 (capture groups: URL, date, link text):
<li><a href="(http://[^\s"]+)".+?span.+?\[(.+?)\].+?>(.+?)<
source:
<li><a href="http://www.wea.com/blog/a.html" title="怎样在百度空间添加友情链接"><span class="article-date">[2014/11/13]</span>怎样在百链接</a></li>
<li><a href="http://www.a.com/blog/b.html2" title="怎样在百度空间添加友情链接2"><span class="article-date">[2014/11/12]</span>怎样在百度链接2</a></li>
result (groups 1 to 3 of regex 2):
http://www.wea.com/blog/a.html 2014/11/13 怎样在百链接
http://www.a.com/blog/b.html2 2014/11/12 怎样在百度链接2
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
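A minimal Python sketch of the link extraction, assuming the standard re module. It uses a three-group pattern with http:// inside the first group, so the captured URL is complete; the variable names are illustrative only.

```python
import re

# Group 1: URL, group 2: date inside [...], group 3: link text.
LINK = re.compile(r'<li><a href="(http://[^\s"]+)".+?span.+?\[(.+?)\].+?>(.+?)<')

source = '''<li><a href="http://www.wea.com/blog/a.html" title="怎样在百度空间添加友情链接"><span class="article-date">[2014/11/13]</span>怎样在百链接</a></li>
<li><a href="http://www.a.com/blog/b.html2" title="怎样在百度空间添加友情链接2"><span class="article-date">[2014/11/12]</span>怎样在百度链接2</a></li>'''

# With several groups, findall returns one tuple per match.
rows = LINK.findall(source)
for url, date, text in rows:
    print(url, date, text)
```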
regex:
<external_network_location_id>(.*?)</external_network_location_id>
source:
<external_network_location_id>20130401_TXNONC100FFS3101TAUSNPN1733590048828A_0048828</external_network_location_id>
<external_network_location_id>abcd1234004488877</external_network_location_id>
result:
20130401_TXNONC100FFS3101TAUSNPN1733590048828A_0048828
abcd1234004488877
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
regex:
<requserid>([^<]+)</requserid>
source:
<Request><Action>getuser</Action><UserLogin></UserLogin><Password></Password><Signature></Signature><VerifyText></VerifyText><requserid>535</requserid><requserid>5335</requserid></Request>
result:
535
5335
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
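The two patterns above differ only in the tag name, so they can be sketched as one small Python function. tag_values is a hypothetical helper introduced for illustration, assuming the standard re module.

```python
import re

def tag_values(tag, text):
    """Return the text of every <tag>...</tag> element (hypothetical helper)."""
    # re.escape guards against tag names that contain regex metacharacters.
    pattern = re.compile(r"<{0}>([^<]+)</{0}>".format(re.escape(tag)))
    return pattern.findall(text)

request = ("<Request><Action>getuser</Action><UserLogin></UserLogin>"
           "<Password></Password><Signature></Signature><VerifyText></VerifyText>"
           "<requserid>535</requserid><requserid>5335</requserid></Request>")

locations = ("<external_network_location_id>abcd1234004488877"
             "</external_network_location_id>")

user_ids = tag_values("requserid", request)
location_ids = tag_values("external_network_location_id", locations)
print(user_ids, location_ids)
```

Because [^<]+ requires at least one character, empty elements such as <UserLogin></UserLogin> produce no result.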
5, Extract the content inside all tags
regex 1:
<.+?>(.+?)<.+?>
regex 2:
(?is)(?<=>)[^<>]+(?=<)
source:
<span style=''>内容1</span><img src=".."/>内容2<p><input .../>内容3</p><p>内容4</p><b>内容5</b><i>内容6</i>
result:
内容1
内容2
内容3
内容4
内容5
内容6
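A minimal Python sketch that runs both patterns on the sample, assuming the standard re module. Regex 1 consumes a tag on each side of the text, so its result depends on how the tags happen to be laid out; regex 2 only looks at the neighbouring > and < without consuming them.

```python
import re

source = """<span style=''>内容1</span><img src=".."/>内容2<p><input .../>内容3</p><p>内容4</p><b>内容5</b><i>内容6</i>"""

# Regex 1: tag, captured text, tag.
by_group = re.findall(r"<.+?>(.+?)<.+?>", source)

# Regex 2: text preceded by ">" and followed by "<" (lookbehind / lookahead).
by_lookaround = re.findall(r"(?is)(?<=>)[^<>]+(?=<)", source)

print(by_group)
print(by_lookaround)
```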
6, Extract the attribute values of all img tags (the same approach can be adapted to other tags)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
regex:
(?is)<img\s*((?<key>[^=]+)="(?<value>[^"]+)")+?\s*/?>
source:
<img src="acbdd"/><img src="33ff"/><img src="gggggeeee"/><a>33333</a>
result:
key=src value=acbdd
key=src value=33ff
key=src value=gggggeeee
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
regex 1 (in .NET, unnamed groups are numbered before named ones, so \2 below refers to the quote group):
(?is)<img\s*((?<key>[^=]+)=(["'])(?<value>[^'"]+)\2)+?\s*/?>([^<>]*?</img>)?
regex 2:
(?is)<img\s+((?<key>[^=]+)=(["']?)(?<value>[^'"]+)\2\s*)+?\s*/?>([^<>]*?</img>)?
source:
<img src="acbdd"/><img src="33ff"/><img src="gggggeeee"/><img src="bb"></img><a>33333</a>
result:
key=src value=acbdd
key=src value=33ff
key=src value=gggggeeee
key=src value=bb
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
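The patterns in this section rely on reading every capture of a repeated group, which Python's re module does not offer (it keeps only the last capture). A minimal Python sketch therefore splits the work in two steps: find each img tag, then scan its attribute text. img_attributes is a hypothetical helper, and only quoted attribute values are handled.

```python
import re

# Step 1: the attribute text of each <img ...> or <img .../> tag.
IMG = re.compile(r"(?is)<img\b([^<>]*?)/?>")
# Step 2: key, opening quote, value, then the same quote again (\2).
ATTR = re.compile(r"""([^\s=]+)\s*=\s*(["'])(.*?)\2""")

def img_attributes(html):
    """Return (key, value) pairs for every attribute of every img tag."""
    pairs = []
    for tag in IMG.finditer(html):
        for key, _quote, value in ATTR.findall(tag.group(1)):
            pairs.append((key, value))
    return pairs

source = ('<img src="acbdd"/><img src="33ff"/><img src="gggggeeee"/>'
          '<img src="bb"></img><a>33333</a>')

for key, value in img_attributes(source):
    print("key=" + key, "value=" + value)
```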
regex 1:
(?<=^<external_provider_group_id>).*(?=</external_provider_group_id>)
regex 2:
(?<=^<external_provider_group_id>).*0078243.*(?=</external_provider_group_id>)
Both are greedy and, because of the ^ anchor, match only the first element.
source:
<external_provider_group_id>23dd0078243d4323</external_provider_group_id>
result:
23dd0078243d4323
<[A-Za-z_-]+>\w+0078243\w+</[A-Za-z_-]+>
Matches the following 4 lines
\w+ means one or more word characters
<external_provider_group_id>23dd0078243d4323</external_provider_group_id>
<a>dd0078243dsd</a>
<b_b>dd0078243dsd33</b_b>
<c-c>dd0078243dsd44</c-c>
<[A-Za-z_-]+>\w{0,}0078243\w{0,}</[A-Za-z_-]+>
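Matches 6 of the following 7 lines (every line except 442232323, which does not contain 0078243)
\w{0,} means zero or more word characters (same as \w*)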
<external_provider_group_id>23dd0078243d4323</external_provider_group_id>
<external_provider_group_id>442232323</external_provider_group_id>
<external_provider_group_id>23dd0078243d432344</external_provider_group_id>
<a>dd0078243dsd</a>
<b_b>dd0078243dsd33</b_b>
<c-c>dd0078243dsd44</c-c>
<d-d>0078243</d-d>
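A minimal Python sketch comparing \w+ and \w{0,} on the sample lines, assuming the standard re module. The only line that separates the two patterns is the one where nothing surrounds 0078243.

```python
import re

ONE_OR_MORE = re.compile(r"<[A-Za-z_-]+>\w+0078243\w+</[A-Za-z_-]+>")
ZERO_OR_MORE = re.compile(r"<[A-Za-z_-]+>\w{0,}0078243\w{0,}</[A-Za-z_-]+>")

lines = [
    "<external_provider_group_id>23dd0078243d4323</external_provider_group_id>",
    "<external_provider_group_id>442232323</external_provider_group_id>",
    "<external_provider_group_id>23dd0078243d432344</external_provider_group_id>",
    "<a>dd0078243dsd</a>",
    "<b_b>dd0078243dsd33</b_b>",
    "<c-c>dd0078243dsd44</c-c>",
    "<d-d>0078243</d-d>",
]

plus_hits = [s for s in lines if ONE_OR_MORE.fullmatch(s)]
star_hits = [s for s in lines if ZERO_OR_MORE.fullmatch(s)]

# Lines matched only when the surrounding word characters are optional.
only_star = [s for s in star_hits if s not in plus_hits]
print(len(plus_hits), len(star_hits), only_star)
```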
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Other:
(?<=^<[A-Za-z_-]+>).*(?=</[A-Za-z_-]+>)
Matches only the first one (because of the ^ anchor)
.*(?<=<\w+>.*</\w+>)*
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
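The lookbehind patterns above behave differently across engines. A minimal Python sketch, assuming the standard re module: the fixed tag name works as a lookbehind, while the generic [A-Za-z_-]+ version has variable width, which Python's re does not accept. A capture group with a backreference to the tag name is shown as one possible substitute, not as the method these notes intend.

```python
import re

source = "<external_provider_group_id>23dd0078243d4323</external_provider_group_id>"

# Fixed-width lookbehind: the tag name has a known length, so re accepts it.
fixed = re.search(
    r"(?<=^<external_provider_group_id>).*(?=</external_provider_group_id>)",
    source)

# Variable-width lookbehind: re raises re.error when compiling it.
try:
    re.compile(r"(?<=^<[A-Za-z_-]+>).*(?=</[A-Za-z_-]+>)")
    variable_ok = True
except re.error:
    variable_ok = False

# Substitute: capture the value in group 2, reuse the tag name through \1.
generic = re.search(r"^<([A-Za-z_-]+)>(.*)</\1>", source)

print(fixed.group(0), variable_ok, generic.group(2))
```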