Regular expressions: how do I access multiple matches of a group?

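I am putting together a fairly complex regular expression. One part of it matches strings such as "+a", "-57", and so on: a + or a - followed by any number of letters or digits. I want to match zero or more strings that follow this pattern.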

This is the expression I came up with:

([\+-][a-zA-Z0-9]+)*

If I search the string "-56+a" with this pattern, I expect to get two matches:

+a and -56

However, I only get the last match:

>>> import re
>>> m = re.match(r"([\+-][a-zA-Z0-9]+)*", '-56+a')
>>> m.groups()
('+a',)

Looking at the Python documentation, I see:

If a group matches multiple times, only the last match is accessible:

>>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
>>> m.group(1)                        # Returns only the last match.
'c3'

So my question is: how do I access all of the matches of a group?


Drop the * from your regex (so it matches exactly one instance of your pattern). Then use either re.findall(...) or re.finditer(...) to return all matches.
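As a minimal sketch of that suggestion, applied to the string from the question (the names TOKEN, tokens and spans are only illustrative, and the capturing parentheses are dropped so that findall returns the whole match):

```python
import re

# One signed token: a '+' or '-' followed by letters or digits (no trailing '*').
TOKEN = re.compile(r"[+-][a-zA-Z0-9]+")

text = "-56+a"

# findall returns every non-overlapping match as a string.
tokens = TOKEN.findall(text)
print(tokens)  # ['-56', '+a']

# finditer yields match objects, which also carry the match positions.
spans = [(m.group(), m.start(), m.end()) for m in TOKEN.finditer(text)]
print(spans)  # [('-56', 0, 3), ('+a', 3, 5)]
```

Note that findall and finditer scan the string and skip anything that does not match, so unlike the original pattern with * they do not check that the tokens are adjacent.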

Update:

It sounds like you're essentially building a recursive descent parser. For relatively simple parsing tasks, it is quite common and entirely reasonable to do that by hand. If you're interested in a library solution (in case your parsing task may become more complicated later on, for example), have a look at pyparsing.

The third-party regex module (installed separately; it is not the standard library's re) fixes this by adding a .captures method:

>>> import regex
>>> m = regex.match(r"(..)+", "a1b2c3")
>>> m.captures(1)
['a1', 'b2', 'c3']
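If installing a third-party module is not an option, something similar can probably be approximated with the standard library alone: validate the whole string with the repeated pattern first, then pull out the individual tokens with findall. This is only a sketch for the pattern from the question; the helper name all_tokens is hypothetical, and the approach relies on the tokens splitting unambiguously (a token can never contain + or - after its first character).

```python
import re

# The single-token pattern from the question, without the capturing group.
TOKEN = r"[+-][a-zA-Z0-9]+"


def all_tokens(text):
    """Return every token if text is entirely a run of tokens, else None."""
    # (?:...)* repeats the token without capturing; fullmatch checks the whole string.
    if re.fullmatch(rf"(?:{TOKEN})*", text) is None:
        return None
    return re.findall(TOKEN, text)


print(all_tokens("-56+a"))   # ['-56', '+a']
print(all_tokens("-56 +a"))  # None (the space is not part of any token)
print(all_tokens(""))        # [] (zero repetitions is a valid match)
```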