Using C++ regex (regular expressions)

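In C++ there are three regular-expression libraries to choose from: C++ regex (std::regex from the C++11 <regex> header), C regex and Boost regex. When developing C++ on Windows, the latter two are not available out of the box, so if you want to get started quickly, C++ regex is clearly the most convenient option. This article covers how to use C++ regex.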

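C++ regex provides three functions:
- regex_match: does the entire target sequence match the pattern?
- regex_search: does any part of the target match?
- regex_replace: replaces the matched parts according to a format string.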




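// regex_match example: the entire target sequence must match the pattern
#include <iostream>
#include <regex>
#include <string>

int main() {
    if (std::regex_match("subject", std::regex("(sub)(.*)")))
        std::cout << "string literal matched\n";

    const char cstr[] = "subject";
    std::string s("subject");
    std::regex e("(sub)(.*)");
    if (std::regex_match(s, e))
        std::cout << "string object matched\n";

    std::cmatch cm;  // match results for C strings (const char*)
    std::regex_match(cstr, cm, e);
    std::cout << "string literal with " << cm.size() << " matches\n";

    std::smatch sm;  // match results for std::string
    std::regex_match(s, sm, e);
    std::cout << "string object with " << sm.size() << " matches\n";

    std::regex_match(s.cbegin(), s.cend(), sm, e);  // iterator-range overload
    std::cout << "range with " << sm.size() << " matches\n";

    std::cout << "the matches were: ";
    for (unsigned i = 0; i < sm.size(); ++i)
        std::cout << "[" << sm.str() << "]";  // str() without an index is str(0), the whole match
    std::cout << '\n';
    for (unsigned i = 0; i < sm.size(); ++i)
        std::cout << "[" << sm[i] << "]";     // sm[i] is the i-th submatch
    std::cout << '\n';
    return 0;
}

Output:

string literal matched
string object matched
string literal with 3 matches
string object with 3 matches
range with 3 matches
the matches were: [subject][subject][subject]
[subject][sub][ject]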




// regex_search example: find every word that begins with "sub"
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string s("this subject has a submarine as a subsequence");
    std::smatch m;
    std::regex e("\\b(sub)([^ ]*)");   // matches words beginning with "sub"
    std::cout << "Target sequence: " << s << std::endl;
    std::cout << "Regular expression: /\\b(sub)([^ ]*)/" << std::endl;
    std::cout << "The following matches and submatches were found:" << std::endl;
    while (std::regex_search(s, m, e)) {
        for (auto x = m.begin(); x != m.end(); x++)  // whole match first, then each submatch
            std::cout << x->str() << " ";
        std::cout << "--> ([^ ]*) match " << m.format("$2") << std::endl;
        s = m.suffix().str();  // keep searching in the text after the current match
    }
    return 0;
}


Output:

Target sequence: this subject has a submarine as a subsequence
Regular expression: /\b(sub)([^ ]*)/
The following matches and submatches were found:
subject sub ject --> ([^ ]*) match ject
submarine sub marine --> ([^ ]*) match marine
subsequence sub sequence --> ([^ ]*) match sequence


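// regex_replace example
#include <cstring>   // std::strlen
#include <iostream>
#include <regex>
#include <string>

int main() {
    char buf[20];
    const char *first = "axayaz";
    const char *last = first + std::strlen(first);
    std::regex rx("a");
    std::string fmt("A");
    // format_first_only: replace only the first match instead of every match
    std::regex_constants::match_flag_type fonly =
        std::regex_constants::format_first_only;

    // Output-iterator overload: writes into buf and returns a pointer past the last character written
    *std::regex_replace(&buf[0], first, last, rx, fmt) = '\0';
    std::cout << &buf[0] << std::endl;
    *std::regex_replace(&buf[0], first, last, rx, fmt, fonly) = '\0';
    std::cout << &buf[0] << std::endl;

    // std::string overload: returns a new string with the replacements applied
    std::string str("adaeaf");
    std::cout << std::regex_replace(str, rx, fmt) << std::endl;
    std::cout << std::regex_replace(str, rx, fmt, fonly) << std::endl;
    return 0;
}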

Output:

AxAyAz
Axayaz
AdAeAf
Adaeaf

C++ regex syntax works much the same as in other programming languages. By default, std::regex uses the ECMAScript grammar:


characters   description              matches
.            not newline              any character except line terminators (LF, CR, LS, PS).
\t           tab (HT)                 a horizontal tab character (same as \u0009).
\n           newline (LF)             a newline (line feed) character (same as \u000A).
\v           vertical tab (VT)        a vertical tab character (same as \u000B).
\f           form feed (FF)           a form feed character (same as \u000C).
\r           carriage return (CR)     a carriage return character (same as \u000D).
\cletter     control code             a control code character whose code unit value is the remainder of dividing the code unit value of letter by 32. For example, \ca is the same as \u0001, \cb the same as \u0002, and so on.
\xhh         ASCII character          a character whose code unit value equals the hex value of the two hex digits hh. For example, \x4c is the same as L, and \x23 the same as #.
\uhhhh       unicode character        a character whose code unit value equals the hex value of the four hex digits hhhh.
\0           null                     a null character (same as \u0000).
\int         backreference            the result of the submatch whose opening parenthesis is the int-th (int must begin with a digit other than 0). See groups below for more info.
\d           digit                    a decimal digit character (same as [[:digit:]]).
\D           not digit                any character that is not a decimal digit character (same as [^[:digit:]]).
\s           whitespace               a whitespace character (same as [[:space:]]).
\S           not whitespace           any character that is not a whitespace character (same as [^[:space:]]).
\w           word                     an alphanumeric or underscore character (same as [_[:alnum:]]).
\W           not word                 any character that is not an alphanumeric or underscore character (same as [^_[:alnum:]]).
\character   character                the character as it is, without interpreting its special meaning within a regex. Any character can be escaped except those that form one of the special sequences above. Escaping is needed for: ^ $ \ . * + ? ( ) [ ] { } |
[class]      character class          the target character is part of the class.
[^class]     negated character class  the target character is not part of the class.



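Because the pattern is written inside a C++ string literal, every backslash in the regex must itself be escaped:

std::regex e1 ("\\d");  //  \d -> matches a digit character
std::regex e2 ("\\\\"); //  \\ -> matches a backslash character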


quantifier   times                 description
*            0 or more             The preceding atom is matched 0 or more times.
+            1 or more             The preceding atom is matched 1 or more times.
?            0 or 1                The preceding atom is optional (matched either 0 times or once).
{int}        int                   The preceding atom is matched exactly int times.
{int,}       int or more           The preceding atom is matched int or more times.
{min,max}    between min and max   The preceding atom is matched at least min times, but not more than max.

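Note that quantifiers are greedy by default: they match as many repetitions as possible. Appending ? to a quantifier makes it lazy (non-greedy). So when "aardvark" is matched against the pattern "(a+).*", the first group captures "aa", whereas with "(a+?).*" it captures just "a".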


(subpattern)     group           Creates a backreference.
(?:subpattern)   passive group   Does not create a backreference.


characters   description          condition for match
^            beginning of line    Either it is the beginning of the target sequence, or it follows a line terminator.
$            end of line          Either it is the end of the target sequence, or it precedes a line terminator.
|            separator            Separates two alternative patterns or subpatterns.


[abc] matches a, b or c.
[^xyz] matches any character other than x, y or z.

[a-z] matches any lowercase letter (a, b, c, ..., z).
[abc1-5] matches a, b, c, or a digit from 1 to 5.

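C++ regex also supports POSIX-like character classes. They are written inside a bracket expression, so a digit is matched by [[:digit:]]: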

class        description                               equivalent (with regex_traits, default locale)
[:alnum:]    alpha-numerical character                 isalnum
[:alpha:]    alphabetic character                      isalpha
[:blank:]    blank character                           isblank
[:cntrl:]    control character                         iscntrl
[:digit:]    decimal digit character                   isdigit
[:graph:]    character with graphical representation   isgraph
[:lower:]    lowercase letter                          islower
[:print:]    printable character                       isprint
[:punct:]    punctuation mark character                ispunct
[:space:]    whitespace character                      isspace
[:upper:]    uppercase letter                          isupper
[:xdigit:]   hexadecimal digit character               isxdigit
[:d:]        decimal digit character                   isdigit
[:w:]        word character                            isalnum, plus the underscore '_'
[:s:]        whitespace character                      isspace