Using regular expressions in Java: the replacement argument of String.replaceAll

Reprinted

mob64ca13fd163c 2023-11-14 13:51:24


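We know that in String.replaceAll(argument a, argument b), argument a must be a regular expression. After some experiments today, however, I found that argument b also has some special behavior.

After reading the source code, I found that some of these features are rarely used in everyday code. Below I describe how the two arguments behave.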

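Argument a is a regular expression, which is nothing unusual.

Argument b is the special one.

In argument b, the characters \ and $ are handled specially.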

Reading the source shows that the replacement string is eventually processed by the following method:

the appendReplacement method of java.util.regex.Matcher
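The String.replaceAll Javadoc states that str.replaceAll(regex, repl) gives the same result as Pattern.compile(regex).matcher(str).replaceAll(repl). The sketch below writes that loop out by hand so you can see where appendReplacement (and therefore the special handling of \ and $) comes in. The class and method names are hypothetical, chosen only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: a hand-written version of what replaceAll does.
class ReplaceAllLoop {
    // Each match is expanded by appendReplacement, which is where
    // '\' and '$' in the replacement string are interpreted.
    static String replaceAllByHand(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // copies the text before the match, then the expanded replacement
            m.appendReplacement(sb, replacement);
        }
        // copies the rest of the input after the last match
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "amfooniceshow";
        String viaString = s.replaceAll("(am)(foo)", "$2haha");
        String viaLoop = replaceAllByHand(s, "(am)(foo)", "$2haha");
        System.out.println(viaString);                 // foohahaniceshow
        System.out.println(viaString.equals(viaLoop)); // true
    }
}
```

Writing the loop yourself is only useful when each match needs custom logic; for a fixed replacement string, replaceAll is simpler.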


Below, some special uses of arguments a and b are described in detail, with examples:

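Feature 1: backslashes (\) in argument b are handled specially.

This works like escaping in a regex: \ escapes the character that follows it. Writing \\\\ (four backslashes) in Java source produces the two-character string \\, and the replacement engine turns that into a single literal backslash.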
String s = "a(bc)d_abcd";
System.out.println(s.replaceAll("_", "\\\\_"));
Result:
a(bc)d\_abcd
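The same rule bites when the replacement comes from data rather than a literal. In this sketch (the class name and the C:\tmp value are hypothetical examples), a Windows-style path loses its backslash because \t is read as "escaped t"; Matcher.quoteReplacement escapes \ and $ so the text is inserted literally.

```java
import java.util.regex.Matcher;

// Hypothetical demo class for backslashes in the replacement argument.
class BackslashDemo {
    // The replacement text C:\tmp (one backslash in the actual string)
    static final String PATH = "C:\\tmp";

    // Used directly, "\t" in the replacement means "literal t", so the backslash vanishes
    static String unquoted(String s) {
        return s.replaceAll("_", PATH);
    }

    // quoteReplacement escapes '\' and '$', so the text is inserted unchanged
    static String quoted(String s) {
        return s.replaceAll("_", Matcher.quoteReplacement(PATH));
    }

    public static void main(String[] args) {
        System.out.println(unquoted("dir=_")); // dir=C:tmp
        System.out.println(quoted("dir=_"));   // dir=C:\tmp
    }
}
```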

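Feature 2: in argument b, a $ followed by a number is a back-reference to a group of the regex in argument a (similar to \1 inside a regex).

$0 refers to the whole match. The first digit after $ always starts the group number; each further digit is appended only while the resulting number is still a valid group of the regex, otherwise it is treated as literal text.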

String testg = "amfooniceshow";
//$2 refers to the second group of the regex
System.out.println(testg.replaceAll("(am)(foo)", "$2haha"));
Result:
foohahaniceshow
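To make the group-number rule concrete, here is a small sketch (the class name is hypothetical). With only one group, "$12" is group 1 followed by the literal 2; with ten groups, "$10" is read as group 10.

```java
// Hypothetical demo class for numbered group references in the replacement.
class GroupNumberDemo {
    public static void main(String[] args) {
        // $0 is the whole match
        System.out.println("abc".replaceAll("b", "[$0]"));  // a[b]c
        // Only group 1 exists, so "$12" = group 1 + literal "2"
        System.out.println("abc".replaceAll("(a)", "$12")); // a2bc
        // Ten groups exist, so "$10" = group 10, which captured "j"
        System.out.println("abcdefghijk".replaceAll("(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)", "$10")); // jk
    }
}
```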

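Feature 3: argument a can define a group with the syntax (?<name>...), and argument b can refer to it with ${name}; the source code handles this form specially as well.

At first I tried a few ways of writing it and got errors like this: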

Exception in thread "main" java.lang.IllegalArgumentException: No group with name {xxx}
	at java.util.regex.Matcher.appendReplacement(Matcher.java:800)
	at java.util.regex.Matcher.replaceAll(Matcher.java:906)
	at java.lang.String.replaceAll(String.java:2162)
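A minimal sketch that reproduces this error and its fix, assuming the cause is a ${name} reference to a group the regex never defines. The class name, the helper tryReplace and the sample input are hypothetical; the exact exception message may differ between JDK versions, so it is printed rather than relied on.

```java
// Hypothetical demo class: ${name} only works if the regex defines (?<name>...).
class NamedGroupErrorDemo {
    // Returns the replaced string, or a description of the rejection.
    static String tryReplace(String input, String regex, String replacement) {
        try {
            return input.replaceAll(regex, replacement);
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // No group called "name" exists, so the reference cannot be resolved
        System.out.println(tryReplace("key=value", "value", "${name}"));
        // Defining the named group makes the same reference valid
        System.out.println(tryReplace("key=value", "(?<name>value)", "[${name}]")); // key=[value]
    }
}
```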

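I found no explanation of this usage in the source comments, and a web search did not turn up an explanation of ${name} either.

But I still wanted to know how it works, so I read the source myself and found that argument a has to use a special syntax for it to work.

In java.util.regex.Pattern, the group0() method registers each named group in the namedGroups map.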


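The appendReplacement method of java.util.regex.Matcher looks the name up in namedGroups when it reads ${name}.

If argument a does not define the group with the (?<name>...) syntax, that lookup fails and the IllegalArgumentException shown above is thrown.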


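This (?<name>X) syntax is the named capturing group. It is used less often than numbered groups, but it is not undocumented: Java has supported it since Java 7, the Javadoc of java.util.regex.Pattern describes it under "Groups and capturing", and the Javadoc of Matcher.appendReplacement describes the ${name} reference.

From reading the source and testing:

In argument a, (?<name>X) defines a group called name that captures whatever X matches; in argument b, ${name} inserts that captured text.

The (?<name>) form used in the examples here has an empty body, so it always matches the empty string. That is why it behaves like a zero-width match, but unlike a lookaround or a non-capturing group it still creates a group. The name must start with an ASCII letter and may contain only ASCII letters and digits; Pattern.compile rejects any other name with a PatternSyntaxException.

In argument b, ${name} refers to that group. Because an empty group captures the empty string, including ${name} or leaving it out gives the same result (verified in the examples below).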
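The naming rule can be checked directly. This sketch (hypothetical class and helper names) compiles a few patterns and reports whether Pattern.compile accepts the group name.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class for valid and invalid group names.
class GroupNameDemo {
    // true if the regex compiles, false if Pattern.compile rejects it
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("(?<sho>wer)")); // true: letters only
        System.out.println(compiles("(?<g1>wer)"));  // true: digits allowed after the first letter
        System.out.println(compiles("(?<1g>wer)"));  // false: must start with a letter
        System.out.println(compiles("(?<g_1>wer)")); // false: '_' is not allowed
    }
}
```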

A simple example:

String test1 = "hahaamfooniceshowerqwhdfgsd";
System.out.println(test1.replaceAll("(?<sho>)wer", "${sho}123456"));
Result:
hahaamfoonicesho123456qwhdfgsd
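With a non-empty body, a named group captures real text, which is the usual reason to use one. A sketch under that assumption, with a hypothetical date pattern and class name, reorders a yyyy-MM-dd date by name and also reads a group with Matcher.group(String).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: named groups that capture actual text.
class NamedGroupDateDemo {
    static final Pattern DATE = Pattern.compile("(?<y>\\d{4})-(?<m>\\d{2})-(?<d>\\d{2})");

    // ${d}, ${m} and ${y} refer to the groups defined in DATE
    static String toDayMonthYear(String text) {
        return DATE.matcher(text).replaceAll("${d}/${m}/${y}");
    }

    public static void main(String[] args) {
        System.out.println(toDayMonthYear("created 2017-03-09")); // created 09/03/2017

        Matcher m = DATE.matcher("2017-03-09");
        if (m.matches()) {
            System.out.println(m.group("y"));   // 2017
            System.out.println(m.groupCount()); // 3
        }
    }
}
```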

An example verifying that it does create a group:

//Prints 1: the empty () still counts as a group.
Pattern p = Pattern.compile("(?<xxx>)");
System.out.println(p.matcher("ab").groupCount());
//A plain pattern prints 0: no group is created
Pattern pp = Pattern.compile("xxx");
System.out.println(pp.matcher("ab").groupCount());
//A lookbehind assertion also prints 0: it does not create a group
Pattern ppp = Pattern.compile("(?<=xxx)");
System.out.println(ppp.matcher("ab").groupCount());
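The sketch below extends that check with a true non-capturing group, (?:...), and shows what the empty named group actually captured. The class and helper names are hypothetical; Matcher.start(String) requires Java 8 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing group kinds.
class GroupCountDemo {
    static int groups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groups("(?<xxx>)")); // 1: named capturing group with an empty body
        System.out.println(groups("(?:xxx)"));  // 0: non-capturing group
        System.out.println(groups("(?<=xxx)")); // 0: lookbehind assertion

        Matcher m = Pattern.compile("(?<xxx>)").matcher("ab");
        if (m.find()) {
            // the empty group matched the empty string at index 0
            System.out.println("[" + m.group("xxx") + "] at " + m.start("xxx")); // [] at 0
        }
    }
}
```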

Below are the two complete programs.

package com.xjl456852.manager;

import java.util.regex.Matcher;

/**
 * Created by xjl456852 on 2017/3/9.
 */
public class StringTest {
    public static void main(String[] args) {
        String s = "a(bc)d_abcd";
        System.out.println(s.replaceAll("_", "\\\\_"));
        // Throws an exception: the trailing $ is not followed by a group reference
//        System.out.println(s.replaceAll("_", "\\\\$"));
        System.out.println(s.replaceAll("_", "\\\\\\$"));
        // Matcher.quoteReplacement() escapes the second argument of replaceAll for you
        System.out.println(s.replaceAll("_", Matcher.quoteReplacement("\\$")));
        System.out.println(s.replaceAll("_", "\\$"));
        System.out.println(s.replaceAll("_", "\\."));
        System.out.println(s.replaceAll("_", "."));
        System.out.println(s.replaceAll("_", "\\\\%"));
        System.out.println(s.replaceAll("_", "\\5"));
        System.out.println(s.replaceAll("_", "5"));
        System.out.println(s.replaceAll("_", "5"));
        System.out.println(s.replaceAll("_", "\""));
        System.out.println(s.replaceAll("_", "\\\""));
        System.out.println(s.replaceAll("_", "\\${1}"));
        // Throws IllegalArgumentException: the name inside ${} must not start with a digit
//        System.out.println(s.replaceAll("_", "${1}"));

        String testg = "amfooniceshow";
        // $2 refers to the second group of the regex
        System.out.println(testg.replaceAll("(am)(foo)", "$2haha"));

        System.out.println("--------------");
        String test = "hahaam${foo}niceshow";
        System.out.println(test.replaceAll("\\$\\{.*\\}", "\\\\\\$\\{ss\\}"));

        System.out.println(test.replaceAll("\\$\\{.*\\}", Matcher.quoteReplacement("\\${ss}")));
        // Throws IllegalArgumentException: the regex defines no group named ss
//        System.out.println(test.replaceAll("\\$\\{.*\\}", "${ss}"));

    }
}

Output:

a(bc)d\_abcd
a(bc)d\$abcd
a(bc)d\$abcd
a(bc)d$abcd
a(bc)d.abcd
a(bc)d.abcd
a(bc)d\%abcd
a(bc)d5abcd
a(bc)d5abcd
a(bc)d5abcd
a(bc)d"abcd
a(bc)d"abcd
a(bc)d${1}abcd
foohahaniceshow
--------------
hahaam\${ss}niceshow
hahaam\${ss}niceshow
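When neither argument needs regex features, String.replace(CharSequence, CharSequence) is an alternative: its Javadoc specifies that both the target and the replacement are literal, so neither \ nor $ is special. A short comparison on the same input (the class name is hypothetical):

```java
// Hypothetical demo class comparing replaceAll with the literal replace.
class LiteralReplaceDemo {
    public static void main(String[] args) {
        String s = "a(bc)d_abcd";
        // replaceAll: the replacement text \$ expands to a plain $
        System.out.println(s.replaceAll("_", "\\$")); // a(bc)d$abcd
        // replace: both arguments are literal, so \$ is kept as-is
        System.out.println(s.replace("_", "\\$"));    // a(bc)d\$abcd
        // the target is literal too, so "(bc)" is not treated as a group
        System.out.println(s.replace("(bc)", "X"));   // aXd_abcd
    }
}
```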

The second program:

This one mainly demonstrates the ${name} syntax.

package com.xjl456852.manager;

import java.util.regex.Pattern;

/**
 * Created by xjl456852 on 2017/3/9.
 */
public class StringTest {
    public static void main(String[] args) {

        System.out.println("Below: replaceAll with ${name} references");
        String test1 = "hahaamfooniceshowerqwhdfgsd";
        System.out.println("--------------------");
        // ${name} refers to a named capturing group defined in the regex as (?<name>X).
        // The name must start with an ASCII letter and may contain only ASCII letters and digits.
        // The name is only a label: it does not match any text by itself.
        // (?<sho>) has an empty body, so it matches the empty string and the whole pattern matches "wer".
        // An empty named group like this is rarely useful in practice.
        System.out.println(test1.replaceAll("(?<sho>)wer", "${sho}123456"));
        // Naming the group "sh" instead of "sho" changes nothing: the pattern still matches "wer",
        // which is replaced by 123456 (${sh} contributes an empty string)
        System.out.println(test1.replaceAll("(?<sh>)wer", "123456"));
        System.out.println(test1.replaceAll("(?<sh>)wer", "${sh}123456"));
        // (?<sh>) is group 1. In "$123456" only "$1" is a reference, because group 12 does not exist;
        // group 1 is empty, so only "23456" appears in the output
        System.out.println(test1.replaceAll("(?<sh>)wer", "$123456"));
        System.out.println("----------------------");
        // "xxx" does not occur in the input, but that is irrelevant: "we" is matched and replaced
        System.out.println(test1.replaceAll("(?<xxx>)we", "123${xxx}456"));
        // Same idea: "howe" is matched and replaced
        System.out.println(test1.replaceAll("ho(?<xxx>)we", "123${xxx}456"));
        // "hwe" does not occur, so the input is returned unchanged
        System.out.println(test1.replaceAll("h(?<xxx>)we", "123${xxx}456"));
        System.out.println("------------------");
        // "dwer" does not occur either, so nothing is replaced
        System.out.println(test1.replaceAll("(?<sho>)dwer", "123456"));
        // A pattern made only of an empty group matches the empty string at every position,
        // whatever the group is named, so 123 is inserted before each character and at the end
        System.out.println(test1.replaceAll("(?<sho>)", "123"));
        System.out.println(test1.replaceAll("(?<xxx>)", "123"));
        System.out.println("=============================");
        // Positive lookbehind: inserts 123 only at the position right after "sh"
        System.out.println(test1.replaceAll("(?<=sh)", "123"));
        // Negative lookbehind: inserts 123 at every position not preceded by "xxx";
        // the input contains no "xxx", so that is every position
        System.out.println(test1.replaceAll("(?<!xxx)", "123"));
        // The input contains "sh", so the position right after "sh" (between h and o) is skipped,
        // and the result has one fewer 123 than the previous line
        System.out.println(test1.replaceAll("(?<!sh)", "123"));
        System.out.println("-------------");

        // Some other combinations; compare them with the output below
        System.out.println(test1.replaceAll("(?<ha>)haam.*?(ce)(?<sh>).*?(fg)", "12${ha}34${sh}56"));
        System.out.println(test1.replaceAll("(?<ha>)haam", "123456${ha}"));
        System.out.println(test1.replaceAll("(ce)(?<sh>)", "1234${sh}56"));
        System.out.println(test1.replaceAll("ce(?<sh>)", "123456"));
        System.out.println("-------------");
        System.out.println(test1.replaceAll("(?<sh>)ow", "1234${sh}56"));
        System.out.println(test1.replaceAll("(?<sh>)ow", "123456"));
        System.out.println("-------------");
        System.out.println(test1.replaceAll("(ce)(?<sh>)(ow)", "1234${sh}56"));
        System.out.println(test1.replaceAll("ce(?<sh>)ow", "1234${sh}56"));
        System.out.println(test1.replaceAll("(?<sh>)", "1234${sh}56"));
        System.out.println(test1.replaceAll("(?<haam>)", "123${haam}456"));
        System.out.println("---------------------");
        System.out.println(test1.replaceAll("(?<xxx>)", "123${xxx}456"));
        System.out.println(test1.replaceAll("(?<xxx>)", "123456"));

        // Prints 1: the empty named group still counts as a group
        Pattern p = Pattern.compile("(?<xxx>)");
        System.out.println(p.matcher("ab").groupCount());
        // A plain pattern prints 0: no group is created
        Pattern pp = Pattern.compile("xxx");
        System.out.println(pp.matcher("ab").groupCount());
        // A lookbehind assertion also prints 0: it does not create a group
        Pattern ppp = Pattern.compile("(?<=xxx)");
        System.out.println(ppp.matcher("ab").groupCount());
    }
}

Output:

Below: replaceAll with ${name} references
--------------------
hahaamfoonicesho123456qwhdfgsd
hahaamfoonicesho123456qwhdfgsd
hahaamfoonicesho123456qwhdfgsd
hahaamfoonicesho23456qwhdfgsd
----------------------
hahaamfoonicesho123456rqwhdfgsd
hahaamfoonices123456rqwhdfgsd
hahaamfooniceshowerqwhdfgsd
------------------
hahaamfooniceshowerqwhdfgsd
123h123a123h123a123a123m123f123o123o123n123i123c123e123s123h123o123w123e123r123q123w123h123d123f123g123s123d123
123h123a123h123a123a123m123f123o123o123n123i123c123e123s123h123o123w123e123r123q123w123h123d123f123g123s123d123
=============================
hahaamfoonicesh123owerqwhdfgsd
123h123a123h123a123a123m123f123o123o123n123i123c123e123s123h123o123w123e123r123q123w123h123d123f123g123s123d123
123h123a123h123a123a123m123f123o123o123n123i123c123e123s123ho123w123e123r123q123w123h123d123f123g123s123d123
-------------
ha123456sd
ha123456fooniceshowerqwhdfgsd
hahaamfooni123456showerqwhdfgsd
hahaamfooni123456showerqwhdfgsd
-------------
hahaamfoonicesh123456erqwhdfgsd
hahaamfoonicesh123456erqwhdfgsd
-------------
hahaamfooniceshowerqwhdfgsd
hahaamfooniceshowerqwhdfgsd
123456h123456a123456h123456a123456a123456m123456f123456o123456o123456n123456i123456c123456e123456s123456h123456o123456w123456e123456r123456q123456w123456h123456d123456f123456g123456s123456d123456
123456h123456a123456h123456a123456a123456m123456f123456o123456o123456n123456i123456c123456e123456s123456h123456o123456w123456e123456r123456q123456w123456h123456d123456f123456g123456s123456d123456
---------------------
123456h123456a123456h123456a123456a123456m123456f123456o123456o123456n123456i123456c123456e123456s123456h123456o123456w123456e123456r123456q123456w123456h123456d123456f123456g123456s123456d123456
123456h123456a123456h123456a123456a123456m123456f123456o123456o123456n123456i123456c123456e123456s123456h123456o123456w123456e123456r123456q123456w123456h123456d123456f123456g123456s123456d123456
1
0
0
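The long output lines above are easier to read on a three-letter input. This sketch (hypothetical class name) shows the same three effects: an empty pattern and an empty named group both insert at every position, a negative lookbehind skips one position, and the group name never matches text by itself.

```java
// Hypothetical demo class for empty matches, lookbehind, and group-name labels.
class EmptyMatchDemo {
    public static void main(String[] args) {
        // An empty pattern matches before every character and at the end
        System.out.println("abc".replaceAll("", "-"));       // -a-b-c-
        // An empty named group behaves the same way; the name is only a label
        System.out.println("abc".replaceAll("(?<x>)", "-")); // -a-b-c-
        // Negative lookbehind: the position right after 'b' is skipped
        System.out.println("abc".replaceAll("(?<!b)", "-")); // -a-bc-
        // The group name matches no text, so these two give the same result
        System.out.println("showerq".replaceAll("(?<sh>)wer", "X")); // shoXq
        System.out.println("showerq".replaceAll("wer", "X"));        // shoXq
    }
}
```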

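This article is reprinted content, and we respect the original author's copyright. If there are errors in the content or copyright concerns, the original author is welcome to contact us to have the article corrected or removed.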