C++: what regex library should I use?

I'm working on a commercial (non-open-source) C++ project that runs on a Linux-based system. I need to do some regex within the C++ code. (I know: I now have two problems.)

Q: Which libraries do people who regularly use regex from C/C++ recommend I look into? A quick search brought the following to my attention:

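1) Boost.Regex (I need to go read the Boost Software License, but this question is not about software licenses)

2) C (not C++) POSIX regex (#include <regex.h>, regcomp, regexec, etc.)

3) http://freshmeat.net/projects/cpp_regex/ (I know nothing about this one; it seems to be GPL and therefore not usable on this project)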


In past C++ projects, I have used PCRE with good success. It's very complete and well tested, since it's used in many high-profile projects. I also see that Google recently contributed a set of C++ wrappers for PCRE.

C++ has had a built-in regex library since TR1. AFAIK Boost's regex library is highly compatible with it and can be used as a replacement if your standard library doesn't provide TR1.

Boost has regex in it.

That should fill the bill

I've personally always used Boost.Regex (although I don't have much need for regex in C++). Microsoft Research has a regex library too, called GRETA: http://research.microsoft.com/projects/greta/. Apparently it's very fast and supports the full Perl 5 syntax. I haven't used it, but you may want to test it out.

Boost.Regex is very good and is slated to become part of the C++0x standard (it's already in TR1).

Personally, I find Boost.Xpressive much nicer to work with. It is a header-only library and it has some nice features such as static regexes (regexes compiled at compile time).

Update: If you're using a C++11-compliant compiler (GCC 4.8 is NOT; its std::regex is incomplete, and GCC 4.9 is the first release where it works), use std::regex unless you have a good reason to use something else.
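For anyone who hasn't tried std::regex yet, here is a minimal sketch of what using it can look like. The helper name `parse_key_value` and the "key = value" pattern are made up for illustration; only `std::regex`, `std::smatch` and `std::regex_search` come from the standard library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from any library): finds the first
// "key = value" pair in `line` and stores both parts.
// Returns false when the line contains no such pair.
bool parse_key_value(const std::string& line,
                     std::string& key,
                     std::string& value) {
    // Default ECMAScript grammar; the raw string literal avoids
    // doubling the backslashes. Compiled once and reused.
    static const std::regex pattern(R"((\w+)\s*=\s*(\w+))");

    std::smatch match;
    if (!std::regex_search(line, match, pattern)) {
        return false;
    }
    key = match[1].str();    // first capture group
    value = match[2].str();  // second capture group
    return true;
}
```

Called with the line "timeout = 30", this should fill in "timeout" and "30". Keeping the compiled `std::regex` in a static is a deliberate choice here, since constructing the object is usually the expensive part.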

Thanks for all the suggestions.

I tried out a few things today, and with the stuff we're trying to do, I opted for the simplest solution where I don't have to download any other 3rd-party library. In the end, I #include <regex.h> and used the standard C POSIX calls regcomp() and regexec(). Not C++, but in a pinch this proved to be the easiest.
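Roughly, the POSIX route looks like the sketch below. This is not the exact code from the project; the wrapper name `posix_first_group` is invented for illustration, and it assumes POSIX extended syntax (`REG_EXTENDED`). The calls themselves (`regcomp`, `regexec`, `regfree`) are the standard ones from <regex.h>.

```cpp
#include <regex.h>  // POSIX: regcomp, regexec, regfree
#include <string>

// Hypothetical wrapper (illustrative name): searches `text` for the
// POSIX extended regex `pattern` and copies the first parenthesised
// group into `out`. Returns false if the pattern does not compile,
// does not match, or has no first group.
bool posix_first_group(const std::string& pattern,
                       const std::string& text,
                       std::string& out) {
    regex_t compiled;
    if (regcomp(&compiled, pattern.c_str(), REG_EXTENDED) != 0) {
        return false;  // invalid pattern
    }

    // groups[0] is the whole match, groups[1] the first sub-expression.
    regmatch_t groups[2];
    const int status = regexec(&compiled, text.c_str(), 2, groups, 0);
    regfree(&compiled);  // always release the compiled pattern

    if (status != 0 || groups[1].rm_so == -1) {
        return false;
    }
    const std::string::size_type start =
        static_cast<std::string::size_type>(groups[1].rm_so);
    const std::string::size_type length =
        static_cast<std::string::size_type>(groups[1].rm_eo - groups[1].rm_so);
    out = text.substr(start, length);
    return true;
}
```

For example, the pattern "([0-9]+) ms" applied to "latency: 42 ms" should yield "42". The sketch recompiles the pattern on every call for simplicity; in a hot path you would probably keep the `regex_t` around.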

I faced a similar situation and ended up using Henry Spencer's regexp engine: http://www.codeproject.com/KB/string/spencerregexp.aspx

You can also look at the fast regex library developed at Yandex (the search engine) for matching thousands of patterns against huge amounts of data.

No one here has mentioned the one that comes with C++0x. If your compiler and standard library support C++0x, you could just use that instead of adding another library to your project.

Two more options:

If you can write it in C++11, work through this tutorial: http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339

Note: At the time of writing, the only C++11 regex implementation that I know works is the clang/LLVM one, and it only works on Mac. The GNU standard library doesn't implement regex yet. I don't know about Visual Studio. Most people still use the Boost regex implementation.


Or you can use Ragel to generate a finite state machine that does the parsing for you, along with the C/C++ code that implements it: http://www.complang.org/ragel/

I used it a little to generate code to parse JSON. This Ragel file: https://github.com/matiu2/yajp/blob/master/parser/number.rl is used to generate this code https://github.com/matiu2/yajp/blob/master/parser/json.hpp#L254 and this finite state machine diagram:

[Image: state diagram]


Update 1:

LLVM's libc++ regex works on Ubuntu 14.04. Install the package libc++-dev (LLVM C++ Standard library, development files). When compiling: clang++ -std=c++11 -lc++ -I/usr/include/c++/v1 ...

Update 2:

I'm currently enjoying Boost.Spirit X3 - I like it more than regex because it has BNF-style rules and is well thought out. (The older, better-documented Spirit Qi libraries are an alternative.)