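Matching any character including newlines in a Python regex subexpression, not globally

I want to use re.MULTILINE but not re.DOTALL, so that I can have a regex that includes both an "any character" wildcard and the normal . wildcard that does not match newlines.

Is there a way to do this? What should I use to match any character in those instances where I want to include newlines?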


小开

Best answer

To match a newline, or "any symbol" without re.S/re.DOTALL, you may use any of the following:

(?s:.) - the inline modifier group with the s flag on (Python 3.6+) sets a scope in which every . matches any char, including line break chars. A bare (?s) is not scoped: it applies to the whole pattern.
Any of the following work-arounds:

[\s\S]
[\w\W]
[\d\D]

The main idea is that two opposite shorthand classes inside one character class together match every symbol that can occur in the input string.
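As a quick check of that idea, here is a minimal sketch (the sample string and variable names are made up for illustration) comparing a plain dot with the three classes listed above:

```python
import re

text = "line one\nline two"

# A plain dot stops at the newline when re.DOTALL is not set.
dot_match = re.match(r".+", text).group()

# Each class pairs a shorthand with its negation, so together they cover every character.
class_matches = {
    pattern: re.match(pattern + "+", text).group()
    for pattern in (r"[\s\S]", r"[\w\W]", r"[\d\D]")
}

print(repr(dot_match))
for pattern, matched in class_matches.items():
    print(pattern, repr(matched))
```

The dot match ends at the first line, while each of the three classes consumes the whole string, newline included.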

Compared with (.|\s) and other variations that use alternation, the character class solution is much more efficient because it involves far less backtracking (when used with a * or + quantifier). In a small example, (?:.|\n)+ takes 45 steps to complete, while [\s\S]+ takes just 2 steps.
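The step counts above come from a regex debugger, and Python's engine may behave differently. A rough way to compare the two forms in Python itself is a timeit sketch like the one below; the sample text and repeat count are arbitrary assumptions, and no timings are claimed because they depend on the machine and Python version:

```python
import re
import timeit

sample = "some text\n" * 200

alternation = re.compile(r"(?:.|\n)+")
char_class = re.compile(r"[\s\S]+")

# Both patterns consume the whole sample, newlines included.
same_result = alternation.match(sample).group() == char_class.match(sample).group()

# Time repeated matching of each compiled pattern against the same input.
alternation_time = timeit.timeit(lambda: alternation.match(sample), number=200)
char_class_time = timeit.timeit(lambda: char_class.match(sample), number=200)

print(same_result)
print(f"alternation: {alternation_time:.4f}s, character class: {char_class_time:.4f}s")
```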

See a Python demo where I am matching a line starting with 123 and up to the first occurrence of 3 at the start of a line and including the rest of that line:

import re
text = """abc
123
def
356
more text..."""
print( re.findall(r"^123(?s:.*?)^3.*", text, re.M) )
# => ['123\ndef\n356']
print( re.findall(r"^123[\w\W]*?^3.*", text, re.M) )
# => ['123\ndef\n356']
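To make the contrast with an ordinary dot explicit, the following sketch (the text and the "key:"/"end" markers are invented for illustration; it assumes Python 3.6+ for the scoped (?s:...) group) uses both kinds of wildcard in one pattern and shows that the same pattern without the scoped group fails to match:

```python
import re

text = "key: first\nsecond\nend trailing words\nnext line"

# (?s:.*?) may cross newlines; the final .* is an ordinary dot and stops at the line end.
pattern = re.compile(r"^key: (?s:.*?)^end.*", re.M)
scoped = pattern.search(text).group()

# Without the scoped group, the dot cannot cross the newline, so nothing matches.
plain = re.search(r"^key: .*?^end.*", text, re.M)

print(repr(scoped))
print(plain)
```

Here scoped spans three lines and stops at the end of the "end" line, while plain is None.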

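Another option is a character class that lists what to allow, such as [\S\n ] (any non-whitespace character, a newline, or a space). Note that it does not match tabs or other whitespace. Example:

import re

text = 'abc def ###A quick brown fox.\nIt jumps over the lazy dog### ghi jkl'
# We want to extract "A quick brown fox.\nIt jumps over the lazy dog"
# The capture group keeps the ### delimiters out of the result
matches = re.findall(r'###([\S\n ]+)###', text)
print(matches[0])

matches[0] will contain:
'A quick brown fox.\nIt jumps over the lazy dog'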
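One caveat with the [\S\n ] class used in the example above: it only covers non-whitespace characters, newlines, and spaces. A small sketch (the tab-separated sample is made up) shows where it falls short of [\s\S]:

```python
import re

text = "###col1\tcol2\nrow###"

# [\S\n ] covers non-whitespace, newline, and space, but not tab.
narrow = re.findall(r"###([\S\n ]+)###", text)

# [\s\S] covers every character, so the tab is no obstacle.
broad = re.findall(r"###([\s\S]+?)###", text)

print(narrow)
print(broad)
```

If the text between the delimiters can contain tabs or other whitespace, [\s\S] or a scoped (?s:.) is the safer choice.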



Description of \S from the Python docs: matches any character which is not a whitespace character (the opposite of \s).