Regular expression to detect one of several strings

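I have a list of email addresses from several domains. I'd like a regex that matches addresses belonging to three specific domains (for example: foo, bar, and baz).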

So these would match:

  1. a@foo
  2. a@bar
  3. b@baz

This would not:

  1. a@fnord

Ideally, these would not match either (though it doesn't matter for this particular problem):

  1. a@foobar
  2. b@foofoo

Abstracting the problem a bit: I want to match a string that contains at least one of a given list of substrings.


Use the pipe symbol to indicate "or":

/a@(foo|bar|baz)\b/

If you don't want the capture-group, use the non-capturing grouping symbol:

/a@(?:foo|bar|baz)\b/

(Of course I'm assuming "a" is OK for the front of the email address! You should replace that with a suitable regex.)
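As a rough illustration in Python, the non-capturing version could be wired up as below. The local-part pattern `[^@\s]+` and the names `DOMAIN_RE` and `matches_domain` are assumptions made for this sketch, not part of the answer; a real address check would likely need something stricter.

```python
import re

# Assumption: the local part is any run of characters that are neither
# '@' nor whitespace. The domain alternation comes from the answer above.
DOMAIN_RE = re.compile(r"[^@\s]+@(?:foo|bar|baz)\b")


def matches_domain(address):
    """Return True if the text contains an address at foo, bar, or baz."""
    return DOMAIN_RE.search(address) is not None
```

With this sketch, `a@foo` and `b@baz` match, while `a@fnord` and `a@foobar` do not, because `\b` requires a word boundary right after the domain name.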

This should be more generic: the a shouldn't be part of the pattern, although the @ should.

/@(foo|bar|baz)(?:\W|$)/

Here is a good reference on regex.

Edit: changed the ending to allow either the end of the string or a non-word character. This now assumes foo/bar/baz are full domain names.
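A small Python sketch of this pattern, returning the captured domain; the names `PATTERN` and `find_domain` are hypothetical and only serve the example.

```python
import re

# The domain must be followed by a non-word character or the end of the string.
PATTERN = re.compile(r"@(foo|bar|baz)(?:\W|$)")


def find_domain(text):
    """Return the matched domain (group 1), or None if there is no match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None
```

Because the capture group holds only the domain, the trailing `\W` character is consumed by the match but never returned.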

^(a|b)@(foo|bar|baz)$

if you have a strictly defined list like this. The ^ and $ anchors mean that only strings consisting of exactly one of those combinations will match.
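In Python, a whole-string match like this might be sketched with `re.fullmatch`, which makes the `^` and `$` anchors unnecessary. The names `STRICT` and `is_listed` are assumptions for the example, and the local parts are limited to a and b as in the pattern above.

```python
import re

# Same alternatives as the anchored pattern; fullmatch() requires the
# entire string to match, so no explicit ^ or $ is needed.
STRICT = re.compile(r"(a|b)@(foo|bar|baz)")


def is_listed(address):
    """Return True only if the whole string is one of the listed addresses."""
    return STRICT.fullmatch(address) is not None
```

One difference worth noting: in Python, `$` also matches just before a trailing newline, whereas `fullmatch` rejects a string such as `"a@foo\n"`.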

Use:

/@(foo|bar|baz)\.?$/i

Note the differences from other answers:

  • \.? - matching 0 or 1 dots, in case the domains in the e-mail address are "fully qualified"
  • $ - to indicate that the string must end with this sequence,
  • /i - to make the test case insensitive.

Note, this assumes that each e-mail address is on a line on its own.

If the address could appear anywhere in a longer string, drop the $ and replace it with \s+ (which matches one or more whitespace characters). Use (?:\s+|$) instead if the address may also be the last thing in the string.
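For comparison, here is a minimal Python sketch of the same idea, assuming one address per line as noted above. `TAIL` and `filter_addresses` are hypothetical names, and `re.IGNORECASE` plays the role of the `/i` flag.

```python
import re

# Optional trailing dot, end-of-string anchor, case-insensitive matching.
TAIL = re.compile(r"@(foo|bar|baz)\.?$", re.IGNORECASE)


def filter_addresses(lines):
    """Keep the lines whose address ends in one of the three domains."""
    # strip() removes surrounding whitespace, including a trailing newline,
    # before the pattern is applied; the original line is what gets returned.
    return [line for line in lines if TAIL.search(line.strip())]
```

Stripping each line first is a design choice for this sketch: it keeps the pattern itself identical to the one in the answer.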

If the previous (and logical) answers about '|' don't suit you, have a look at

http://metacpan.org/pod/Regex::PreSuf

Module description: create regular expressions from word lists

You don't need a regex to find whether a string contains at least one of a given list of substrings. In Python:

def contain(string_, substrings):
    # True if at least one of the substrings occurs in string_
    return any(s in string_ for s in substrings)

The above is slow for a large string_ and many substrings. GNU fgrep can efficiently search for multiple patterns at the same time.

Using regex

import re


def contain(string_, substrings):
    # Escape each substring so it is matched literally, then join with '|'
    regex = '|'.join("(?:%s)" % re.escape(s) for s in substrings)
    return re.search(regex, string_) is not None


OK, I know you asked for a regex answer. But have you considered just splitting the string on the '@' character, taking the second array element (the domain), and doing a simple comparison?

if (splitString[1] == "foo" || splitString[1] == "bar" || splitString[1] == "baz")
{
    // Do something!
}

Seems to me that RegEx is overkill. Of course my assumption is that your case is really as simple as you have listed.
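The same split-and-compare idea could look roughly like this in Python. `ALLOWED_DOMAINS` and `in_allowed_domain` are hypothetical names for the sketch, and it assumes the domain is everything after the last '@'.

```python
# Assumption: the three domains from the question, compared exactly.
ALLOWED_DOMAINS = {"foo", "bar", "baz"}


def in_allowed_domain(address):
    """Return True if the part after the last '@' is an allowed domain."""
    # rpartition returns (head, separator, tail); the separator is an
    # empty string when '@' does not occur in the address at all.
    _local, sep, domain = address.rpartition("@")
    return bool(sep) and domain in ALLOWED_DOMAINS
```

A set lookup keeps the check exact, so `a@foobar` is rejected without any boundary handling.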