Is gcc 4.8 or earlier buggy about regular expressions?

I am trying to use std::regex in a piece of C++11 code, but the support appears to be a bit buggy. An example:

#include <regex>
#include <iostream>


int main (int argc, const char * argv[]) {
    std::regex r("st|mt|tr");
    std::cerr << "st|mt|tr" << " matches st? " << std::regex_match("st", r) << std::endl;
    std::cerr << "st|mt|tr" << " matches mt? " << std::regex_match("mt", r) << std::endl;
    std::cerr << "st|mt|tr" << " matches tr? " << std::regex_match("tr", r) << std::endl;
}

outputs:

st|mt|tr matches st? 1
st|mt|tr matches mt? 1
st|mt|tr matches tr? 0

when compiled with gcc (MacPorts gcc47 4.7.1_2) 4.7.1, either with

g++ *.cc -o test -std=c++11
g++ *.cc -o test -std=c++0x

or

g++ *.cc -o test -std=gnu++0x

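Besides, the regex works well if I only have two alternative patterns, e.g. st|mt, so for some reason it looks like the last alternative is not matched. The code works fine with the Apple LLVM compiler.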

Any ideas about how to solve the issue?

Update: one possible solution is to use groups to implement multiple alternatives, e.g. (st|mt)|tr.
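To check the grouping workaround on a given toolchain, the two patterns can be compared side by side. This is only a sketch: the helper name matches_all is made up for illustration, and whether the grouped form behaves any better on a particular pre-4.9 libstdc++ is something to verify locally rather than assume.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns true only if
// every input is fully matched by the pattern. A std::regex_error thrown
// by the library is treated as a failure.
bool matches_all(const std::string& pattern,
                 const std::vector<std::string>& inputs) {
    try {
        const std::regex re(pattern);
        for (const std::string& input : inputs) {
            if (!std::regex_match(input, re)) {
                return false;
            }
        }
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

On a conforming implementation, matches_all("st|mt|tr", {"st", "mt", "tr"}) and matches_all("(st|mt)|tr", {"st", "mt", "tr"}) should both return true; a false from the first call reproduces the problem described above.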


<regex> was implemented and released in GCC 4.9.0.

In your (older) version of GCC, it is not implemented.

That prototype <regex> code was added when all of GCC's C++0x support was highly experimental, tracking early C++0x drafts and being made available for people to experiment with. That allowed people to find problems and give feedback to the standard committee before the standard was finalised. At the time lots of people were grateful to have had access to bleeding edge features long before C++11 was finished and before many other compilers provided any support, and that feedback really helped improve C++11. This was a Good Thing™.

The <regex> code was never in a useful state, but was added as a work-in-progress like many other bits of code at the time. It was checked in and made available for others to collaborate on if they wanted to, with the intention that it would be finished eventually.

That's often how open source works: Release early, release often -- unfortunately in the case of <regex> we only got the early part right and not the often part that would have finished the implementation.

Most parts of the library were more complete and are now almost fully implemented, but <regex> hadn't been, so it stayed in the same unfinished state since it was added.

Seriously though, who thought that shipping an implementation of regex_search that only does "return false" was a good idea?

It wasn't such a bad idea a few years ago, when C++0x was still a work in progress and we shipped lots of partial implementations. No-one thought it would remain unusable for so long so, with hindsight, maybe it should have been disabled and required a macro or build-time option to enable it. But that ship sailed long ago. There are exported symbols from the libstdc++.so library that depend on the regex code, so simply removing it (in, say, GCC 4.8) would not have been trivial.

Feature Detection

This snippet uses C preprocessor defines to detect whether the libstdc++ <regex> implementation is actually usable:

#include <regex>
#if __cplusplus >= 201103L &&                             \
    (!defined(__GLIBCXX__) || (__cplusplus >= 201402L) || \
        (defined(_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT) || \
         defined(_GLIBCXX_REGEX_STATE_LIMIT)           || \
             (defined(_GLIBCXX_RELEASE)                && \
             _GLIBCXX_RELEASE > 4)))
#define HAVE_WORKING_REGEX 1
#else
#define HAVE_WORKING_REGEX 0
#endif

Macros

  • _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT is defined in bits/regex.tcc in 4.9.x
  • _GLIBCXX_REGEX_STATE_LIMIT is defined in bits/regex_automaton.h in 5+
  • _GLIBCXX_RELEASE was added to 7+ as a result of this answer and is the GCC major version
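
One way the resulting macro might be used is to choose between std::regex and a hand-written fallback at compile time. The sketch below repeats the detection block so that it is self-contained. The function is_known_tag and its fallback are hypothetical, chosen only to mirror the st|mt|tr pattern from the question.

```cpp
#include <regex>
#include <string>

#if __cplusplus >= 201103L &&                             \
    (!defined(__GLIBCXX__) || (__cplusplus >= 201402L) || \
        (defined(_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT) || \
         defined(_GLIBCXX_REGEX_STATE_LIMIT)           || \
             (defined(_GLIBCXX_RELEASE)                && \
             _GLIBCXX_RELEASE > 4)))
#define HAVE_WORKING_REGEX 1
#else
#define HAVE_WORKING_REGEX 0
#endif

// Hypothetical helper: true if s is exactly "st", "mt" or "tr".
bool is_known_tag(const std::string& s) {
#if HAVE_WORKING_REGEX
    // <regex> is believed to be usable: let the library do the matching.
    static const std::regex re("st|mt|tr");
    return std::regex_match(s, re);
#else
    // <regex> is believed to be a stub: fall back to plain comparisons.
    return s == "st" || s == "mt" || s == "tr";
#endif
}
```

Both branches are meant to give the same answers, so callers do not need to know which one was compiled in.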

Testing

You can test it with GCC like this:

cat << EOF | g++ --std=c++11 -x c++ - && ./a.out
#include <regex>

#if __cplusplus >= 201103L &&                             \
    (!defined(__GLIBCXX__) || (__cplusplus >= 201402L) || \
        (defined(_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT) || \
         defined(_GLIBCXX_REGEX_STATE_LIMIT)           || \
             (defined(_GLIBCXX_RELEASE)                && \
             _GLIBCXX_RELEASE > 4)))
#define HAVE_WORKING_REGEX 1
#else
#define HAVE_WORKING_REGEX 0
#endif

#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <iostream>
#include <string>

int main() {
    const std::regex regex(".*");
    const std::string string = "This should match!";
    // A working regex_search must find a match for ".*" in any string.
    const auto result = std::regex_search(string, regex);
#if HAVE_WORKING_REGEX
    std::cerr << "<regex> works, look: " << std::boolalpha << result << std::endl;
#else
    std::cerr << "<regex> doesn't work, look: " << std::boolalpha << result << std::endl;
#endif
    return result ? EXIT_SUCCESS : EXIT_FAILURE;
}
EOF

Results

Here are some results for various compilers:


$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


$ ./a.out
<regex> doesn't work, look: false

$ gcc --version
gcc (GCC) 6.2.1 20160830
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


$ ./a.out
<regex> works, look: true

$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


$ ./a.out
<regex> works, look: true

$ gcc --version
gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


$ ./a.out
<regex> works, look: true

$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


$ ./a.out
<regex> works, look: true

$ gcc --version
gcc (GCC) 6.2.1 20160830
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


$ clang --version
clang version 3.9.0 (tags/RELEASE_390/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ ./a.out  # compiled with 'clang -lstdc++'
<regex> works, look: true

Here be Dragons

This is totally unsupported and relies on the detection of private macros that the GCC developers have put into the bits/regex* headers. They could change or go away at any time. Hopefully, they won't be removed in the current 4.9.x, 5.x, 6.x releases but they could go away in the 7.x releases.

If the GCC developers added a #define _GLIBCXX_HAVE_WORKING_REGEX 1 (or something, hint hint nudge nudge) in the 7.x release that persisted, this snippet could be updated to include that and later GCC releases would work with the snippet above.

As far as I know, all other compilers have a working <regex> when __cplusplus >= 201103L but YMMV.

Obviously this would completely break if someone defined the _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT or _GLIBCXX_REGEX_STATE_LIMIT macros outside of the libstdc++-v3 headers.
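
Because the macros are private, a runtime sanity check is one possible complement; it is not part of the detection snippet itself. The sketch below assumes that a broken <regex> either returns wrong results or throws std::regex_error for simple patterns. The function name regex_self_test and the probe cases are illustrative only, and passing them does not prove that the whole library is correct.

```cpp
#include <regex>

// Hypothetical runtime probe: exercises a few trivial patterns and reports
// whether the library gives the expected answers.
bool regex_self_test() {
    try {
        const std::regex any(".*");
        const std::regex alt("st|mt|tr");
        return std::regex_search("This should match!", any)  // must match
            && std::regex_match("tr", alt)                   // last alternative
            && !std::regex_match("xx", alt);                 // must not match
    } catch (const std::regex_error&) {
        // Treat a thrown regex_error as "not usable".
        return false;
    }
}
```

A program could call regex_self_test() once at startup and switch to a fallback code path if it returns false.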

At this moment, g++ (GCC) 4.9.2 with -std=c++14 is still not accepting regex_match for me.

Here is an approach that collects the matches with sregex_token_iterator instead of calling regex_match, and it works with g++:

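#include <cstddef>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main() {
    std::string line = "1a2b3c";
    std::regex re("(\\d)");
    // Collect capture group 1 of every match found in the line.
    std::vector<std::string> inVector{
        std::sregex_token_iterator(line.begin(), line.end(), re, 1), {}
    };

    // Print all matches, one per line, prefixed with their index.
    for (std::size_t i = 0; i < inVector.size(); ++i)
        std::cout << i << ":" << inVector[i] << std::endl;
}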

It finds the matches 1, 2 and 3, and prints each on its own line prefixed with its index:

0:1
1:2
2:3

You can read the sregex_token_iterator reference at: http://en.cppreference.com/w/cpp/regex/regex_token_iterator