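Check if multiple strings exist in another string

How can I check whether any of the strings in an array exists in another string?

For example:

    a = ['a', 'b', 'c']
    str = "a123"
    if a in str:
        print("some of the strings found in str")
    else:
        print("no strings found in str")

This code doesn't work; it is only meant to show what I want to achieve.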


You can use any:

    a_string = "A string is more than its parts!"
    matches = ["more", "wholesome", "milk"]

    if any(x in a_string for x in matches):
        print("found a match")

Similarly, to check whether all the strings from the list are found, use all instead of any.
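As a minimal sketch of the difference between the two, the helpers below wrap each check in a function. The names contains_any and contains_all are made up for this illustration and do not come from the answer.

```python
def contains_any(text, needles):
    """Return True if at least one needle occurs in text."""
    return any(needle in text for needle in needles)


def contains_all(text, needles):
    """Return True only if every needle occurs in text."""
    return all(needle in text for needle in needles)


a_string = "A string is more than its parts!"
print(contains_any(a_string, ["more", "wholesome", "milk"]))  # True
print(contains_all(a_string, ["more", "wholesome", "milk"]))  # False
print(contains_all(a_string, ["more", "parts"]))              # True
```

Both functions stop at the first decisive element. For an empty list, any returns False and all returns True, which may or may not be what you want.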

You need to iterate over the elements of a.

    a = ['a', 'b', 'c']
    str = "a123"
    found_a_string = False
    for item in a:
        if item in str:
            found_a_string = True

    if found_a_string:
        print("found a match")
    else:
        print("no match found")

The same check can be written with a list comprehension:

    a = ['a', 'b', 'c']
    str = "a123"

    a_match = [True for match in a if match in str]

    if True in a_match:
        print("some of the strings found in str")
    else:
        print("no strings found in str")

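You should be careful if the strings in a or str get longer. The straightforward solutions take O(S*(A^2)), where S is the length of str and A is the sum of the lengths of all strings in a. For a faster solution, look at the Aho-Corasick algorithm for string matching, which runs in linear time O(S+A).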
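To make the idea concrete, here is a compact sketch of an Aho-Corasick style matcher in pure Python. It is an illustration written for this thread, not a tuned implementation; the function names build_automaton and find_keywords are hypothetical, and empty keywords are skipped as a simplifying assumption.

```python
from collections import deque


def build_automaton(keywords):
    """Build trie transitions, failure links and output sets for keywords."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # failure link of each state
    out = [set()]    # keywords recognized when a state is reached

    for word in keywords:
        if not word:
            continue  # assumption: empty keywords are ignored
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)

    # Breadth-first pass: the failure link of a state points to the longest
    # proper suffix of its path that is also a path from the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def find_keywords(text, keywords):
    """Return the set of keywords that occur somewhere in text."""
    goto, fail, out = build_automaton(keywords)
    found = set()
    state = 0
    for ch in text:
        # Follow failure links until a transition on ch exists (or we hit the root).
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return found


print(find_keywords("helloworld", ["he", "or", "low", "xyz"]) == {"he", "or", "low"})  # True
```

The text is scanned once regardless of how many keywords there are, which is where the advantage over repeated `in` checks comes from when a is large.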

Just to add some diversity with regex:

    import re

    if any(re.findall(r'a|b|c', str, re.IGNORECASE)):
        print('possible matches thanks to regex')
    else:
        print('no matches')

Or, if your list is too long to write out by hand: any(re.findall('|'.join(a), str, re.IGNORECASE))
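One caveat with joining the list directly: any element that contains regex metacharacters (such as '.' or '+') is interpreted as a pattern. A hedged variant, assuming the elements are meant as literal substrings, escapes each one first. The function name matches_any is invented for this sketch.

```python
import re


def matches_any(text, substrings):
    """Return True if any of substrings occurs in text, ignoring case."""
    # An empty alternation would compile to an empty pattern, which matches
    # everywhere, so an empty list is handled separately.
    if not substrings:
        return False
    # re.escape keeps characters such as '.' from being read as regex syntax.
    pattern = re.compile("|".join(map(re.escape, substrings)), re.IGNORECASE)
    return pattern.search(text) is not None


print(matches_any("a123", ["a", "b", "c"]))  # True
print(matches_any("x123", ["a", "b", "c"]))  # False
print(matches_any("v1x0", ["1.0"]))          # False, because '.' is literal here
```

Using search instead of findall also stops at the first hit rather than collecting every match.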

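If all you want is True or False, any() is by far the best approach, but if you want to know which string or strings match, there are a few things you can use.

If you want the first match (with False as a default):

    match = next((x for x in a if x in str), False)

If you want to get all matches (including duplicates):

    matches = [x for x in a if x in str]

If you want to get all non-duplicate matches (disregarding order):

    matches = {x for x in a if x in str}

If you want to get all non-duplicate matches in the right order:

    matches = []
    for x in a:
        if x in str and x not in matches:
            matches.append(x)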
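For the last case, an alternative sketch (assuming Python 3.7 or later, where dicts keep insertion order) removes duplicates without the repeated `x not in matches` list scan. The sample values below are made up for the illustration.

```python
a = ["b", "a", "b", "c", "a"]
text = "a1b2"

# dict keys are unique and keep the order in which they were first inserted
matches = list(dict.fromkeys(x for x in a if x in text))
print(matches)  # ['b', 'a']
```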

It depends on the context. If you want to check for a single literal (any single character such as a, e, w, etc.), in is enough:

    original_word = "hackerearcth"
    if 'h' in original_word:
        print("YES")

If you want to check for any of the characters in original_word, use any (yourinput is the string being searched):

    if any(your_required in yourinput for your_required in original_word):
        print("YES")

If you want all of the characters in original_word to appear in the input, use all:

    original_word = ['h', 'a', 'c', 'k', 'e', 'r', 'e', 'a', 'r', 't', 'h']
    yourinput = str(input()).lower()
    if all(requested_word in yourinput for requested_word in original_word):
        print("yes")

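jbernadas already mentioned the Aho-Corasick algorithm as a way to reduce complexity.

Here is one way to use it in Python:

  1. Download aho_corasick.py from here

  2. Put it in the same directory as your main Python file and name it aho_corasick.py

  3. Try the algorithm with the following code: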

    from aho_corasick import aho_corasick #(string, keywords)
    
    
    print(aho_corasick(string, ["keyword1", "keyword2"]))
    

Note that the search is case-sensitive.
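If case should not matter, one common workaround is to normalize both sides before searching. The sketch below does not use the aho_corasick module (its interface is not documented in this thread); it applies the same idea to the plain any() check, and the function name is hypothetical.

```python
def contains_any_ignore_case(text, keywords):
    """Case-insensitive version of the any() substring check."""
    folded = text.casefold()  # casefold is a more aggressive lower()
    return any(keyword.casefold() in folded for keyword in keywords)


print(contains_any_ignore_case("Build SUCCESSFUL", ["success", "done"]))  # True
print(contains_any_ignore_case("Build failed", ["success", "done"]))      # False
```

The same normalization can be applied to the text and keywords before handing them to any other matcher.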

    strlist = ['SUCCESS', 'Done', 'SUCCESSFUL']
    res = False
    # the with block closes the file even if reading fails
    with open('test.txt', 'r') as flog:
        for line in flog:
            for fstr in strlist:
                if fstr in line:
                    print('found')
                    res = True

    if res:
        print('res true')
    else:
        print('res false')


For speed, I would use a function like this:

    def check_string(string, substring_list):
        for substring in substring_list:
            if substring in string:
                return True
        return False

To report which required strings are missing instead, there are several options:

    data = "firstName and favoriteFood"
    mandatory_fields = ['firstName', 'lastName', 'age']

    # for each
    for field in mandatory_fields:
        if field not in data:
            print("Error, missing req field {0}".format(field))

    # still fine, multiple if statements
    if ('firstName' not in data or
            'lastName' not in data or
            'age' not in data):
        print("Error, missing a req field")

    # not very readable, list comprehension
    missing_fields = [x for x in mandatory_fields if x not in data]
    if missing_fields:
        print("Error, missing fields {0}".format(", ".join(missing_fields)))

Just some more info on how to get all the list elements that are found in the string:

    a = ['a', 'b', 'c']
    str = "a123"
    list(filter(lambda x: x in str, a))

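A surprisingly fast approach is to use set:

    a = ['a', 'b', 'c']
    str = "a123"
    if set(a) & set(str):
        print("some of the strings found in str")
    else:
        print("no strings found in str")

This works if a does not contain any multiple-character values (in which case use any as listed above). If so, it is simpler to specify a as a string: a = 'abc'.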
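The restriction to single characters follows from what set(str) contains: one element per distinct character. A short demonstration, with sample values chosen for the illustration:

```python
text = "a123"

# Single characters: set(text) is {'a', '1', '2', '3'}, so the overlap works.
single_hit = bool(set(["a", "b", "c"]) & set(text))
print(single_hit)  # True

# Multi-character value: "a1" is a substring of text, but it is never an
# element of set(text), so the set test misses it.
multi_hit_with_set = bool(set(["a1"]) & set(text))
multi_hit_with_any = any(x in text for x in ["a1"])
print(multi_hit_with_set)  # False
print(multi_hit_with_any)  # True
```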

Yet another solution with set, using set intersection, for a one-liner.

    subset = {"some", "words"}
    text = "some words to be searched here"
    if len(subset & set(text.split())) == len(subset):
        print("All values present in text")

    if subset & set(text.split()):
        print("At least one value present in text")
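The two conditions can also be phrased with the built-in set comparison methods, which some readers find easier to scan. This is an equivalent sketch, not a change in behavior; note that splitting on whitespace compares whole words only.

```python
subset = {"some", "words"}
text = "some words to be searched here"
words_in_text = set(text.split())

all_present = subset <= words_in_text               # same as subset.issubset(words_in_text)
any_present = not subset.isdisjoint(words_in_text)  # True if at least one word is shared

print(all_present)                  # True
print(any_present)                  # True
print({"search"} <= words_in_text)  # False: "searched" is a different word
```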

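The regex module recommended in the Python docs supports this:

    import regex  # third-party package: pip install regex

    words = {'he', 'or', 'low'}
    p = regex.compile(r"\L<name>", name=words)
    m = p.findall('helloworld')
    print(m)

Output:

    ['he', 'low', 'or']

Some details on the implementation: link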

A compact way to find multiple strings in another list of strings is to use set.intersection. This executes much faster than a list comprehension on large sets or lists.

    >>> astring = ['abc','def','ghi','jkl','mno']
    >>> bstring = ['def', 'jkl']
    >>> a_set = set(astring)  # convert list to set
    >>> b_set = set(bstring)
    >>> matches = a_set.intersection(b_set)
    >>> matches
    {'def', 'jkl'}
    >>> list(matches) # if you want a list instead of a set
    ['def', 'jkl']

If you want exact matches of words, consider word-tokenizing the target string. I use the recommended word_tokenize from nltk:

    from nltk.tokenize import word_tokenize

Here is the tokenized string from the accepted answer:

    a_string = "A string is more than its parts!"
    tokens = word_tokenize(a_string)
    tokens
    Out[46]: ['A', 'string', 'is', 'more', 'than', 'its', 'parts', '!']

The accepted answer gets modified as follows:

    matches_1 = ["more", "wholesome", "milk"]
    [x in tokens for x in matches_1]
    Out[42]: [True, False, False]

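As in the accepted answer, the word "more" is still matched. If "mo" becomes a match string, however, the accepted answer still finds a match. That is behavior I did not want.

    matches_2 = ["mo", "wholesome", "milk"]
    [x in a_string for x in matches_2]
    Out[43]: [True, False, False]

With word tokenization, "mo" is no longer matched:

    [x in tokens for x in matches_2]
    Out[44]: [False, False, False]

That is the additional behavior that I wanted. This answer also responds to the duplicate question here.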
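If pulling in nltk is not an option, a rough approximation of the same whole-word behavior is possible with the standard library, assuming the search terms start and end with word characters (the \b anchor only fires next to letters, digits or underscores). The helper name contains_word is made up for this sketch.

```python
import re

a_string = "A string is more than its parts!"


def contains_word(text, word):
    """Return True if word occurs in text as a whole word."""
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


result_1 = [contains_word(a_string, w) for w in ["more", "wholesome", "milk"]]
result_2 = [contains_word(a_string, w) for w in ["mo", "wholesome", "milk"]]
print(result_1)  # [True, False, False]
print(result_2)  # [False, False, False]
```

This does not reproduce everything a tokenizer does (for example, how contractions are split), so treat it as a lightweight substitute rather than an equivalent.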