How to use Python re.sub to replace only part of a match

I need to match two cases with a single regular expression and perform a replacement:

'long.file.name.jpg' -> 'long.file.name_suff.jpg'

'long.file.name_a.jpg' -> 'long.file.name_suff.jpg'

I'm trying the following:

re.sub('(\_a)?\.[^\.]*$' , '_suff.',"long.file.name.jpg")

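However, this cuts off the extension '.jpg', and I get

long.file.name_suff. instead of long.file.name_suff.jpg. I understand this is because of the [^.]*$ part, but I can't exclude it, because I have to find the last occurrence of '_a' to replace, or the last '.'.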
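A minimal reproduction of the problem, assuming Python 3 and the standard re module. The pattern is written as a raw string without the unnecessary backslashes, which should be equivalent to the attempt above. re.sub replaces the whole matched text, so the extension disappears along with it:

```python
import re

# Equivalent to the attempt above, written as a raw string.
pattern = r'(_a)?\.[^.]*$'

# The match covers "_a" (if present) plus ".jpg", and all of it is replaced.
without_a = re.sub(pattern, '_suff.', 'long.file.name.jpg')
with_a = re.sub(pattern, '_suff.', 'long.file.name_a.jpg')

print(without_a)  # long.file.name_suff.
print(with_a)     # long.file.name_suff.
```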

Is there a way to replace only part of the match?


Put a capture group around the part that you want to preserve, and then include a reference to that capture group within your replacement text.

re.sub(r'(\_a)?\.([^\.]*)$' , r'_suff.\2',"long.file.name.jpg")
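As a rough sketch of how this behaves on both inputs from the question (add_suffix is a name made up for illustration, and the unnecessary backslashes in the pattern are dropped):

```python
import re

def add_suffix(filename):
    """Replace a trailing '_a' (if any) before the extension with '_suff'."""
    # Group 1: optional '_a'; group 2: the extension without its dot.
    # \2 puts the captured extension back into the replacement text.
    return re.sub(r'(_a)?\.([^.]*)$', r'_suff.\2', filename)

print(add_suffix('long.file.name.jpg'))    # long.file.name_suff.jpg
print(add_suffix('long.file.name_a.jpg'))  # long.file.name_suff.jpg
print(add_suffix('README'))                # README (no dot, so no match)
```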

Just put the expression for the extension into a group, capture it and reference the match in the replacement:

re.sub(r'(?:_a)?(\.[^\.]*)$' , r'_suff\1',"long.file.name.jpg")

Additionally, using the non-capturing group (?:…) prevents re from storing unneeded information.
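A possible variation on the same idea, shown only as a sketch: the extension can be captured in a named group and referenced with \g<name>. The group name ext and the helper add_suffix are assumptions chosen for illustration.

```python
import re

# (?:_a)? is matched but not captured; the extension (with its dot) is
# captured under the name "ext".
PATTERN = re.compile(r'(?:_a)?(?P<ext>\.[^.]*)$')

def add_suffix(filename):
    # \g<ext> refers to the named group; unlike \1 it stays unambiguous
    # even if the replacement text continues with a digit.
    return PATTERN.sub(r'_suff\g<ext>', filename)

for name in ('long.file.name.jpg', 'long.file.name_a.jpg'):
    print(add_suffix(name))  # long.file.name_suff.jpg both times
```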

re.sub(r'(?:_a)?\.([^.]*)$', r'_suff.\1', "long.file.name.jpg")

?: starts a non-capturing group, so (?:_a) matches the _a without numbering it as a group; the question mark that follows makes it optional.

So in English, this says: match the ending .<anything> that follows (or doesn't follow) the pattern _a.

Another way to do this would be to use a lookbehind. I mention this because they're super useful, but I didn't know about them for 15 years of doing REs.

You can do this by excluding parts of the pattern from the replacement. In other words, you can tell the regex module: "match this pattern, but replace only a piece of it".

re.sub(r'(?<=long\.file\.name)(_a)?(?=\.[^.]*$)', r'_suff', "long.file.name.jpg")
# 'long.file.name_suff.jpg'

The long.file.name and .jpg parts are used for matching, but they are excluded from the replacement.
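The lookbehind above hard-codes the file name. A sketch of a more general variant, assuming only the extension needs to be protected, uses just the lookahead. Note the count=1: without it, on Python 3.7 and later re.sub can also replace the empty match sitting right before the dot after it has consumed _a, which would add the suffix twice. The helper name add_suffix is made up for illustration.

```python
import re

def add_suffix(filename):
    # (?:_a)?       optionally consume a trailing "_a"
    # (?=\.[^.]*$)  assert that the final ".ext" follows, without consuming it
    # count=1 stops after the first (leftmost) replacement.
    return re.sub(r'(?:_a)?(?=\.[^.]*$)', '_suff', filename, count=1)

print(add_suffix('long.file.name.jpg'))    # long.file.name_suff.jpg
print(add_suffix('long.file.name_a.jpg'))  # long.file.name_suff.jpg
```

Because the lookahead is zero-width, the extension is never part of the match, so there is nothing to put back in the replacement text.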

print(re.sub('name(_a)?','name_suff','long.file.name_a.jpg'))
# long.file.name_suff.jpg


print(re.sub('name(_a)?','name_suff','long.file.name.jpg'))
# long.file.name_suff.jpg

I wanted to use capture groups to replace a specific part of a string to help me parse it later. Consider the example below:

s = '<td> <address> 110 SOLANA ROAD, SUITE 102<br>PONTE VEDRA BEACH, FL32082 </address> </td>'

re.sub(r'(<address>\s.*?)(<br>)(.*?</address>)', r'\1 -- \3', s)
# '<td> <address> 110 SOLANA ROAD, SUITE 102 -- PONTE VEDRA BEACH, FL32082 </address> </td>'
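The same substitution can also be written with a function as the replacement, which re.sub calls with each match object. This is only a sketch of an alternative; join_address_lines is a hypothetical helper name.

```python
import re

s = '<td> <address> 110 SOLANA ROAD, SUITE 102<br>PONTE VEDRA BEACH, FL32082 </address> </td>'

def join_address_lines(match):
    # Keep groups 1 and 3 as matched; only group 2 (the <br>) is swapped out.
    return match.group(1) + ' -- ' + match.group(3)

result = re.sub(r'(<address>\s.*?)(<br>)(.*?</address>)', join_address_lines, s)
print(result)
```

A function is handy when the replacement for the middle part depends on what was actually matched, which a fixed template such as r'\1 -- \3' cannot express.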