A regular expression syntax for "matching nothing"?

I have a Python template engine that makes heavy use of regular expressions:

re.compile( regexp1 + "|" + regexp2 + "*|" + regexp3 + "+" )

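I can modify the individual substrings (regexp1, regexp2, etc.).

Is there a small, lightweight expression that matches nothing, which I can use in a template where I don't want any matches? Unfortunately, '+' or '*' is sometimes appended to the regexp atom, so I can't use the empty string: that raises a "nothing to repeat" error.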


This shouldn't match anything:

re.compile('$^')

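So if you replace regexp1, regexp2 and regexp3 with '$^', no non-empty string without newlines can match. The empty string still matches, because both anchors hold at position 0, and in multiline mode so does a string that contains an empty line.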
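
A quick way to check the edge cases of '$^' with Python's re module is sketched below. The sample strings are made up for illustration, and only the bare pattern is tested, without a trailing '+' or '*'.

```python
import re

pattern = re.compile('$^')

# In 'abc', '^' holds only at position 0 and '$' only at position 3.
no_match = pattern.search('abc')

# Empty string: both anchors hold at position 0, so this does match.
empty_match = pattern.search('')

# Multiline mode: the empty line between 'a' and 'b' satisfies both anchors.
multiline_match = re.compile('$^', re.MULTILINE).search('a\n\nb')

print(no_match, empty_match, multiline_match)
```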


After some tests I found a better solution:

re.compile('a^')

It is impossible to match and will fail earlier than the previous solution. You can replace 'a' with any other character and it will still be impossible to match, because '^' can never hold immediately after a non-newline character.
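
As a rough check of that claim, the sketch below runs 'a^' against a few made-up strings, both with and without multiline mode. It covers only the bare pattern; how a '+' or '*' placed directly after the anchor behaves is not tested here.

```python
import re

never = re.compile('a^')
never_multiline = re.compile('a^', re.MULTILINE)

samples = ['', 'a', 'aaa', 'a\n', 'a\na', '\na']  # made-up test strings

# After consuming 'a' the engine is neither at the start of the string
# nor right after a newline, so '^' cannot hold in either mode.
results = [never.search(s) or never_multiline.search(s) for s in samples]
print(results)
```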

Maybe '.{0}'?
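
A quick check in Python suggests that '.{0}' means "zero characters", which succeeds at every position rather than failing. The sample strings below are made up.

```python
import re

# '.{0}' asks for exactly zero repetitions of '.', i.e. an empty match.
on_text = re.search('.{0}', 'abc')
on_empty = re.search('.{0}', '')

print(on_text.span(), on_empty.span())
```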

To match an empty string - even in multiline mode - you can use \A\Z, so:

re.compile(r'\A\Z|\A\Z*|\A\Z+')

The difference is that \A and \Z are the start and end of the string, whilst ^ and $ can match the start and end of lines, so $^|$^*|$^+ could potentially match a string containing newlines (if the multiline flag is enabled).

And to fail to match anything (even an empty string), simply attempt to find content before the start of the string, e.g.:

re.compile(r'.\A|.\A*|.\A+')

Since no characters can come before \A (by definition), this will always fail to match.
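
The two building blocks can be tried separately in Python, as in the sketch below. The helper compiles() and the sample strings are hypothetical additions for illustration. Whether a quantifier placed directly after an anchor is accepted may depend on the re version, so the sketch checks that instead of assuming it.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Compiled in multiline mode to show that \A and \Z ignore line boundaries.
match_empty_only = re.compile(r'\A\Z', re.MULTILINE)
match_nothing = re.compile(r'.\A', re.MULTILINE)

samples = ['', 'abc', 'a\n\nb', '\n']  # made-up test strings

empty_hits = [s for s in samples if match_empty_only.search(s)]
nothing_hits = [s for s in samples if match_nothing.search(s)]
print(empty_hits, nothing_hits)

# Quantifier directly after an anchor: report whether this re version
# accepts it rather than assuming either outcome.
for candidate in (r'\A\Z*', r'.\A+'):
    print(candidate, compiles(candidate))
```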

You could use:
\z..
This is the absolute end of the string, followed by any two characters.

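If + or * is tacked on the end, this still works and refuses to match anything: the quantifier applies only to the last dot, and the first dot still needs a character after the end of the string.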
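
Here is a minimal sketch of the same idea in Python. It uses \Z, which is how Python's re module spells the absolute end of the string; the sample strings and flag choices are assumptions for illustration.

```python
import re

base = r'\Z..'  # absolute end of string, then two more characters

samples = ['', 'ab', 'abc\n', 'a\n\nb']  # made-up test strings

# Try the bare pattern and the variants with '*' and '+' tacked on.
# DOTALL and MULTILINE are enabled to show the flags do not help it match.
patterns = [
    re.compile(base + suffix, re.DOTALL | re.MULTILINE)
    for suffix in ('', '*', '+')
]

hits = [(p.pattern, s) for p in patterns for s in samples if p.search(s)]
print(hits)
```

All three variants compile, because the quantifier lands on an ordinary '.' and not on an anchor.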

Or, use some list comprehension to remove the useless regexp entries and join to put them all together. Something like:

re.compile('|'.join([x for x in [regexp1, regexp2, ...] if x is not None]))

Be sure to add some comments next to that line of code though :-)
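
A fuller sketch of this filtering approach might look like the following. The function name build_pattern, the (atom, suffix) pairing and the sample atoms are all hypothetical; the fallback to '(?!)' when nothing is enabled is an added assumption, since joining an empty list would give an empty pattern that matches everywhere.

```python
import re

def build_pattern(parts):
    """Join the enabled atoms into one alternation.

    `parts` is a list of (atom, suffix) pairs; an atom of None is skipped,
    so its '*' or '+' suffix is never emitted on its own.
    """
    enabled = [atom + suffix for atom, suffix in parts if atom is not None]
    if not enabled:
        # Nothing enabled: fall back to a pattern that never matches.
        return re.compile(r'(?!)')
    return re.compile('|'.join(enabled))

# The second atom is disabled here, so its '*' suffix is dropped with it.
pattern = build_pattern([('foo', ''), (None, '*'), ('x', '+')])
print(pattern.pattern)  # foo|x+
```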

(?!) should always fail to match. It is the zero-width negative look-ahead: if what is in the parentheses matches, then the whole match fails. Given that it has nothing in it, and an empty pattern matches at every position, it will fail the match for anything (including the empty string).
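
A short Python check of this behaviour, using made-up sample strings, could look like this. It tests only the bare '(?!)' and its use as one branch of an alternation.

```python
import re

never = re.compile(r'(?!)')

samples = ['', 'abc', 'a\n\nb']  # made-up test strings
hits = [s for s in samples if never.search(s)]
print(hits)

# Used as a disabled branch, it leaves the other alternatives untouched.
combined = re.compile(r'foo|(?!)')
print(bool(combined.search('foo')), bool(combined.search('bar')))
```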