Regular expressions: how do I access multiple matches of a group?

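I'm putting together a fairly complex regular expression. One part of the expression matches strings such as "+a", "-57", and so on: a + or a - followed by one or more letters or digits. I want to match zero or more strings that fit this pattern.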

Here is the expression I came up with:

([\+-][a-zA-Z0-9]+)*

If I search the string "-56+a" with this pattern, I expect to get two matches:

+a and -56

However, I only get the last match:

>>> import re
>>> m = re.match(r"([\+-][a-zA-Z0-9]+)*", '-56+a')
>>> m.groups()
('+a',)

Looking at the Python documentation, I found this:

If a group matches multiple times, only the last match is accessible:
>>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
>>> m.group(1)                        # Returns only the last match.
'c3'

So my question is: how do I access multiple matches of a group?


小开

Best answer

Drop the * from your regex (so it matches exactly one instance of your pattern). Then use either re.findall(...) or re.finditer(...) to return all matches.
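A minimal sketch of that suggestion, using the string from the question. `TOKEN` and `run` are illustrative names, not from the original post. The last part assumes the repetition is embedded in a larger pattern; one option there is to keep the `*`, capture the entire run in a single group, and then re-scan that run with findall.

```python
import re

# One token: '+' or '-' followed by one or more ASCII letters or digits.
# There is no trailing '*', so each match is exactly one token.
TOKEN = re.compile(r"[+-][a-zA-Z0-9]+")

text = "-56+a"

# With no capturing groups, findall returns the whole matched substrings.
print(TOKEN.findall(text))  # ['-56', '+a']

# finditer yields Match objects, which also carry each token's position.
for m in TOKEN.finditer(text):
    print(m.group(), m.span())  # -56 (0, 3), then +a (3, 5)

# If the repetition is one piece of a larger regex, keep the '*' but wrap
# the whole run in one capturing group (the inner group is non-capturing),
# then re-scan the captured run for the individual tokens.
run = re.fullmatch(r"((?:[+-][a-zA-Z0-9]+)*)", text)
print(TOKEN.findall(run.group(1)))  # ['-56', '+a']
```

Note that findall silently skips characters that don't match, so "-56 x+a" would also give ['-56', '+a']. If that matters, the fullmatch step acts as a validation pass: it returns None for such input.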

Update:

It sounds like you're essentially building a recursive descent parser. For relatively simple parsing tasks, it is quite common and entirely reasonable to do that by hand. If you're interested in a library solution (in case your parsing task may become more complicated later on, for example), have a look at pyparsing.
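For the by-hand route, a minimal sketch of the lowest layer such a parser might have: a tokenizer, not a full recursive descent parser. It assumes the input should consist of nothing but these +/- tokens; `scan_tokens` is a hypothetical name. It advances through the string with a compiled pattern's `match(string, pos)`, which anchors each match at `pos`, so unlike findall it rejects junk instead of skipping it.

```python
import re
from typing import List

# One token: '+' or '-' followed by one or more ASCII letters or digits.
TOKEN = re.compile(r"[+-][a-zA-Z0-9]+")


def scan_tokens(s: str) -> List[str]:
    """Hypothetical helper: split s into +/- tokens or raise ValueError."""
    tokens = []
    pos = 0
    while pos < len(s):
        # Pattern.match(string, pos) only matches starting exactly at pos.
        m = TOKEN.match(s, pos)
        if m is None:
            raise ValueError(f"unexpected input at index {pos}: {s[pos:]!r}")
        tokens.append(m.group())
        pos = m.end()  # continue right after the token just consumed
    return tokens


print(scan_tokens("-56+a"))  # ['-56', '+a']
```

An empty string yields an empty list, which matches the "zero or more" requirement. A higher-level parser could then consume this token list.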