Match groups in Python

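Is there a way in Python to access match groups without explicitly creating a match object (or another way to beautify the example below)?

Here is an example to clarify my motivation for the question:

The following Perl code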

if    ($statement =~ /I love (\w+)/) {
    print "He loves $1\n";
}
elsif ($statement =~ /Ich liebe (\w+)/) {
    print "Er liebt $1\n";
}
elsif ($statement =~ /Je t\'aime (\w+)/) {
    print "Il aime $1\n";
}

translated into Python

import re

m = re.search(r"I love (\w+)", statement)
if m:
    print("He loves", m.group(1))
else:
    m = re.search(r"Ich liebe (\w+)", statement)
    if m:
        print("Er liebt", m.group(1))
    else:
        m = re.search(r"Je t'aime (\w+)", statement)
        if m:
            print("Il aime", m.group(1))

looks very awkward (if-else cascade, match object creation).

Less efficient, but simpler-looking:

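# Note: re.match only matches at the start of the string,
# whereas the re.search used in the question matches anywhere.
m0 = re.match(r"I love (\w+)", statement)
m1 = re.match(r"Ich liebe (\w+)", statement)
m2 = re.match(r"Je t'aime (\w+)", statement)
if m0:
    print("He loves", m0.group(1))
elif m1:
    print("Er liebt", m1.group(1))
elif m2:
    print("Il aime", m2.group(1))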

The problem with the Perl stuff is the implicit updating of some hidden variable. That's simply hard to achieve in Python because you need to have an assignment statement to actually update any variables.
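The snippet above uses re.match, while the question uses re.search, and the two behave differently. A minimal sketch of the difference, using a made-up sample sentence (the variable names at_start and anywhere are only for illustration):

```python
import re

statement = "She said: I love Mary"

# re.match only succeeds if the pattern matches at the very start of the string.
at_start = re.match(r"I love (\w+)", statement)

# re.search scans the whole string and returns the first match found anywhere.
anywhere = re.search(r"I love (\w+)", statement)

print(at_start)           # None, because the string starts with "She said"
print(anywhere.group(1))  # Mary
```

If the phrases may appear in the middle of a sentence, re.search is probably the closer equivalent of the Perl =~ test.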

The version with less repetition (and better efficiency) is this:

pats = [
    (r"I love (\w+)", "He loves {0}"),
    (r"Ich liebe (\w+)", "Er liebt {0}"),
    (r"Je t'aime (\w+)", "Il aime {0}"),
]
for pattern, template in pats:
    m = re.match(pattern, statement)
    if m:
        print(template.format(m.group(1)))
        break  # stop at the first pattern that matches

A minor variation that some Perl folk prefer:

pats = {
    r"I love (\w+)": "He loves {0}",
    r"Ich liebe (\w+)": "Er liebt {0}",
    r"Je t'aime (\w+)": "Il aime {0}",
}
for pattern in pats:
    m = re.match(pattern, statement)
    if m:
        print(pats[pattern].format(m.group(1)))
        break  # stop at the first pattern that matches

This is hardly worth mentioning except it does come up sometimes from Perl programmers.
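The table-driven idea can also be wrapped in a small function so the caller gets a value back instead of a print. This is only a sketch: the names PATTERNS and translate, the precompiled patterns, and the use of search rather than match are assumptions for illustration, not part of the answer above.

```python
import re

# Hypothetical table: (compiled pattern, output template) pairs, tried in order.
PATTERNS = [
    (re.compile(r"I love (\w+)"), "He loves {0}"),
    (re.compile(r"Ich liebe (\w+)"), "Er liebt {0}"),
    (re.compile(r"Je t'aime (\w+)"), "Il aime {0}"),
]


def translate(statement):
    """Return the formatted text for the first matching pattern, or None."""
    for pattern, template in PATTERNS:
        m = pattern.search(statement)
        if m:
            return template.format(m.group(1))
    return None


print(translate("Ich liebe Margot"))  # Er liebt Margot
print(translate("Te amo Maria"))      # None
```

Returning None for an unknown statement leaves the fallback decision to the caller.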

This is not a regex solution.

alist = {"I love ": "He loves", "Je t'aime ": "Il aime", "Ich liebe ": "Er liebt"}
for k in alist:
    if k in statement:
        # Print the translated phrase followed by the text after the matched prefix.
        print(alist[k], statement.split(k, 1)[1])
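The same substring lookup could be sketched with str.partition, which splits on the first occurrence only and reports whether the separator was found. The names PREFIXES and translate_plain are hypothetical. Unlike the regex \w+ group, this keeps everything after the prefix, not just the next word.

```python
# Hypothetical prefix table mirroring the dictionary above.
PREFIXES = {
    "I love ": "He loves",
    "Ich liebe ": "Er liebt",
    "Je t'aime ": "Il aime",
}


def translate_plain(statement):
    """Return the translated text using substring lookup only, or None."""
    for prefix, reply in PREFIXES.items():
        # partition returns (before, separator, after); the separator is
        # an empty string when the prefix does not occur in the statement.
        _, found, rest = statement.partition(prefix)
        if found:
            return "{0} {1}".format(reply, rest)
    return None


print(translate_plain("Je t'aime Marie"))  # Il aime Marie
```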

You could create a helper function:

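import re

def re_match_group(pattern, string, out_groups):
    # Clear the caller's list in place so stale groups never leak through.
    del out_groups[:]
    result = re.match(pattern, string)
    if result:
        # Copy the captured groups into the caller's list.
        out_groups[:] = result.groups()
    return result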

And then use it like this:

groups = []
if re_match_group(r"I love (\w+)", statement, groups):
    print("He loves", groups[0])
elif re_match_group(r"Ich liebe (\w+)", statement, groups):
    print("Er liebt", groups[0])
elif re_match_group(r"Je t'aime (\w+)", statement, groups):
    print("Il aime", groups[0])

It's a little clunky, but it gets the job done.

You could create a little class that returns the boolean result of calling match, and retains the matched groups for subsequent retrieval:

import re


class REMatcher(object):
    def __init__(self, matchstring):
        self.matchstring = matchstring

    def match(self, regexp):
        # Remember the match object so group() can read it later.
        self.rematch = re.match(regexp, self.matchstring)
        return bool(self.rematch)

    def group(self, i):
        return self.rematch.group(i)


for statement in ("I love Mary",
                  "Ich liebe Margot",
                  "Je t'aime Marie",
                  "Te amo Maria"):

    m = REMatcher(statement)

    if m.match(r"I love (\w+)"):
        print("He loves", m.group(1))
    elif m.match(r"Ich liebe (\w+)"):
        print("Er liebt", m.group(1))
    elif m.match(r"Je t'aime (\w+)"):
        print("Il aime", m.group(1))
    else:
        print("???")

Update for Python 3 (print as a function) and Python 3.8 assignment expressions; the REMatcher class is no longer needed:

import re


for statement in ("I love Mary",
                  "Ich liebe Margot",
                  "Je t'aime Marie",
                  "Te amo Maria"):

    if m := re.match(r"I love (\w+)", statement):
        print("He loves", m.group(1))
    elif m := re.match(r"Ich liebe (\w+)", statement):
        print("Er liebt", m.group(1))
    elif m := re.match(r"Je t'aime (\w+)", statement):
        print("Il aime", m.group(1))
    else:
        print("???")

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can capture the result of re.search(pattern, statement) in a variable (let's call it match), check that it is not None, and reuse it within the body of the condition:

if match := re.search(r"I love (\w+)", statement):
    print(f'He loves {match.group(1)}')
elif match := re.search(r"Ich liebe (\w+)", statement):
    print(f'Er liebt {match.group(1)}')
elif match := re.search(r"Je t'aime (\w+)", statement):
    print(f'Il aime {match.group(1)}')
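For a version that can be run as-is, the chain can be placed in a function and tried against a few sample sentences. This is a sketch requiring Python 3.8 or later. The function name describe and the sample sentences are assumptions for illustration.

```python
import re


def describe(statement):
    """Return a third-person sentence for the statement, or None if unknown."""
    if match := re.search(r"I love (\w+)", statement):
        return f"He loves {match.group(1)}"
    elif match := re.search(r"Ich liebe (\w+)", statement):
        return f"Er liebt {match.group(1)}"
    elif match := re.search(r"Je t'aime (\w+)", statement):
        return f"Il aime {match.group(1)}"
    return None


for statement in ("I love Mary", "Ich liebe Margot",
                  "Je t'aime Marie", "Te amo Maria"):
    print(describe(statement))
```

Each branch binds match and tests it in the same expression, so the structure stays as flat as the Perl original.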