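How do I return the string matched by a regular expression in Python?

I am running through the lines of a text file using a Python script. I want to search the text document for an img tag and return the tag as text.

When I run the regex re.match(line), it returns a _sre.SRE_Match object. How do I get it to return a string?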

import sys
import string
import re


f = open("sample.txt", 'r' )
l = open('writetest.txt', 'w')


count = 1


for line in f:
    line = line.rstrip()
    imgtag = re.match(r'<img.*?>', line)
    print("yo it's a {}".format(imgtag))

When I run it, it prints:

yo it's a None
yo it's a None
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e5e0>
yo it's a None
yo it's a None

python
regex

小开

Best answer

You should call group(0) on the match object that re.match returns. Like

imgtag = re.match(r'<img.*?>', line).group(0)

Edit:

You also might be better off doing something like

imgtag = re.match(r'<img.*?>', line)
if imgtag:
    print("yo it's a {}".format(imgtag.group(0)))

to skip the lines that don't match, for which re.match returns None.

小开

Use imgtag.group(0) or imgtag.group(). Both return the entire match as a string. Your pattern has no capturing groups, so there is nothing else to retrieve.

http://docs.python.org/release/2.5.2/lib/match-objects.html
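
As a rough illustration of the point above, here is a minimal sketch. The sample line and the second pattern with a capturing group are made up for the example; they are not taken from the question's file.

```python
import re

# Hypothetical sample line; the real sample.txt is not shown in the question.
line = '<img src="cat.png" alt="cat"> some trailing text'

m = re.match(r'<img.*?>', line)

# group() and group(0) are equivalent: both return the whole match as a str.
whole = m.group()
same = m.group(0)

# With a capturing group in the pattern, group(1) returns only that part.
m2 = re.match(r'<img\s+src="([^"]*)"', line)
src = m2.group(1)

print(whole)
print(src)
```

Here group(1) only exists because the second pattern contains parentheses; with the original pattern, group(0) is the only group available.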

小开

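Considering there might be several img tags, I would recommend re.findall:

import re


with open("sample.txt", 'r') as f_in, open('writetest.txt', 'w') as f_out:
    for line in f_in:
        # findall returns every non-overlapping match in the line as a string
        for img in re.findall(r'<img[^>]+>', line):
            # print(..., file=...) replaces the Python 2 only "print >> f_out" form
            print("yo it's a {}".format(img), file=f_out)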

小开

Note that re.match(pattern, string, flags=0) only matches at the beginning of the string. If you want to locate a match anywhere in the string, use re.search(pattern, string, flags=0) instead (https://docs.python.org/3/library/re.html). This scans the string and returns a match object for the first match. You can then extract the matching string with match_object.group(0), as the other answers suggest.
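
The difference can be sketched as follows. The sample line and the helper name first_img_tag are assumptions made for illustration; neither appears in the question.

```python
import re

# Hypothetical line where the tag does not sit at the start of the string.
line = 'Some text <img src="a.png"> more text'
pattern = r'<img.*?>'

# re.match only tries at position 0, so it finds nothing here.
at_start = re.match(pattern, line)

# re.search scans the whole string and returns the first match.
anywhere = re.search(pattern, line)


def first_img_tag(text):
    """Return the first img tag in text as a string, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


print(at_start)
print(first_img_tag(line))
```

Returning None from the helper when nothing matches keeps the caller from having to handle an AttributeError from calling group on None.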