Find the indexes of all regex matches?

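I'm parsing strings that could contain any number of quoted strings (I'm parsing code, and trying to avoid PLY). I want to find out whether a substring is quoted, and I have the substring's indexes. My initial thought was to use re to find all the matches and then work out the range of indexes each one covers.

It seems like I should use a regex like \"[^\"]+\"|'[^']+' (for now I'm avoiding triple-quoted and similar strings). When I use findall() I get a list of the matching strings, which is somewhat nice, but I need the indexes.

My substring might be as simple as c, and I need to figure out whether this particular c is actually quoted or not.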


This is what you want: (source)

re.finditer(pattern, string[, flags])

Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.

You can then get the start and end positions from the MatchObjects.

e.g.

[(m.start(0), m.end(0)) for m in re.finditer(pattern, string)]
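Applied to the question, a minimal sketch might look like the following. The helper names quoted_spans and is_quoted and the sample string are made up for illustration; only re.finditer and the pattern come from the thread. Note that the spans include the quote characters themselves, and the end index is exclusive.

```python
import re

# Pattern from the question: single- or double-quoted strings, no triple quotes.
QUOTED = re.compile(r"\"[^\"]+\"|'[^']+'")

def quoted_spans(text):
    """Return (start, end) pairs, end exclusive, one per quoted string."""
    return [m.span() for m in QUOTED.finditer(text)]

def is_quoted(text, start, end):
    """True if text[start:end] lies entirely inside one quoted string."""
    return any(s <= start and end <= e for s, e in quoted_spans(text))

code = "x = 'abc' + c + \"c d\""
print(quoted_spans(code))       # [(4, 9), (16, 21)]
print(is_quoted(code, 7, 8))    # True: the c inside 'abc'
print(is_quoted(code, 12, 13))  # False: the bare c
print(is_quoted(code, 17, 18))  # True: the c inside "c d"
```

If the same text is queried many times, it is probably worth computing the spans once and reusing them rather than rescanning on every call.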

This should solve your issue:

pattern=r"(?=(\"[^\"]+\"|'[^']+'))"

Then use the following to get the indices of all matches, including overlapping ones (the end index here is inclusive):

indicesTuple = [(mObj.start(1),mObj.end(1)-1) for mObj in re.finditer(pattern,input)]
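One thing to watch with the lookahead form: because matches may overlap, the closing quote of one string can also be taken as the opening quote of the next. The comparison below is only a sketch on a made-up sample string, but it shows the extra span that appears.

```python
import re

text = '"a" b "c"'

plain = r"\"[^\"]+\"|'[^']+'"
lookahead = r"(?=(\"[^\"]+\"|'[^']+'))"

# Inclusive end indices in both cases, to match the answer above.
plain_spans = [(m.start(), m.end() - 1) for m in re.finditer(plain, text)]
overlap_spans = [(m.start(1), m.end(1) - 1) for m in re.finditer(lookahead, text)]

print(plain_spans)    # [(0, 2), (6, 8)]
print(overlap_spans)  # [(0, 2), (2, 6), (6, 8)]
```

The middle span (2, 6) covers `" b "`, which is not a quoted string in the source. For the question's purpose (deciding whether a substring is quoted), the non-overlapping pattern looks like the safer choice.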

To get the indices of all occurrences:

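import re

S = input()  # Source string
k = input()  # Pattern to search for
pattern = re.compile(k)
r = pattern.search(S)
if not r:
    print("(-1, -1)")
while r:
    # Print the start index and the inclusive end index of this match.
    print("({0}, {1})".format(r.start(), r.end() - 1))
    # Resume one character past the previous start so overlapping matches are found.
    r = pattern.search(S, r.start() + 1)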
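The same loop can be wrapped in a function that returns the spans instead of printing them, which makes the overlapping behaviour easier to check. The name find_all_overlapping is hypothetical, and this sketch assumes the pattern cannot match the empty string, since an empty match would need separate handling.

```python
import re

def find_all_overlapping(pattern, text):
    """Collect (start, inclusive_end) for every match, overlapping ones included."""
    compiled = re.compile(pattern)
    spans = []
    match = compiled.search(text)
    while match:
        spans.append((match.start(), match.end() - 1))
        # Restart one character after the previous start, not after its end.
        match = compiled.search(text, match.start() + 1)
    return spans

print(find_all_overlapping("aa", "aaaa"))  # [(0, 1), (1, 2), (2, 3)]
print(find_all_overlapping("x", "aaaa"))   # []
```

If the search term is meant as a literal substring rather than a regex, passing it through re.escape first avoids surprises with characters such as `.` or `(`.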