How do I find all occurrences of a substring?

小开

Best answer

There is no simple built-in string function that does what you want, but you can use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, a lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine a positive and a negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

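re.finditer returns an iterator, so you can change the [] above to () to get a generator instead of a list, which is more efficient if you only iterate over the results once.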
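As a rough sketch of the generator variant described above, the helper below wraps the same re.finditer calls. The name iter_positions and its overlapping flag are made up for illustration, and re.escape is an addition that is not in the answer: it assumes the search text should be matched literally even when it contains regex metacharacters such as a dot.

```python
import re


def iter_positions(sub, text, overlapping=False):
    """Return a lazy iterator over the start indices of `sub` in `text`.

    Hypothetical helper, not part of the re module.
    """
    literal = re.escape(sub)  # match metacharacters in `sub` literally
    pattern = "(?=%s)" % literal if overlapping else literal
    # Generator expression: positions are produced one at a time, on demand.
    return (m.start() for m in re.finditer(pattern, text))


print(list(iter_positions("tt", "ttt")))                    # [0]
print(list(iter_positions("tt", "ttt", overlapping=True)))  # [0, 1]
print(list(iter_positions("a.b", "a.b axb a.b")))           # [0, 8]
```

Without re.escape, the last call would also report position 4, because an unescaped dot matches any character.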

小开

Here is a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

小开

For non-overlapping matches, you can use re.finditer().

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in list(re.finditer('is', aString))])
[(2, 4), (5, 7), (38, 40), (42, 44)]

But it won't work for overlapping matches:

In [1]: aString = "ababa"

In [2]: print([(a.start(), a.end()) for a in list(re.finditer('aba', aString))])
Output: [(0, 3)]
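One possible way to still get (start, end) pairs for overlapping matches, sketched here on the assumption that the pattern is a plain literal: wrap it in a lookahead that contains a capturing group, then read the span of group 1. The helper name overlapping_spans is hypothetical.

```python
import re


def overlapping_spans(sub, text):
    """Return (start, end) for every match of the literal `sub`, overlaps included."""
    # The lookahead consumes nothing, so the scan advances one character at a
    # time; the capturing group inside it records where `sub` actually matched.
    pattern = "(?=(%s))" % re.escape(sub)
    return [m.span(1) for m in re.finditer(pattern, text)]


print(overlapping_spans("aba", "ababa"))  # [(0, 3), (2, 5)]
```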

小开

>>> help(str.find)
Help on method_descriptor:


find(...)
S.find(sub [,start [,end]]) -> int

So, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += len(sub)  # use start += 1 to find overlapping matches


list(find_all('spam spam spam spam', 'spam'))  # [0, 5, 10, 15]

No temporary strings or regular expressions required.
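The comment in that generator says to change the step by hand for overlapping matches. A small variation, offered only as a sketch, turns that choice into a parameter. The overlapping argument and the empty-substring check are additions of mine, not part of the answer; the check is there because a step of len(sub) == 0 would never advance.

```python
def find_all(a_str, sub, overlapping=False):
    """Yield the start index of each occurrence of `sub` in `a_str`."""
    if not sub:
        # Assumption: an empty search string is treated as an error, since
        # stepping by len(sub) == 0 would loop forever.
        raise ValueError("sub must be a non-empty string")
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)


print(list(find_all("spam spam spam spam", "spam")))   # [0, 5, 10, 15]
print(list(find_all("ababa", "aba")))                   # [0]
print(list(find_all("ababa", "aba", overlapping=True))) # [0, 2]
```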

小开

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)

    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location + substring_length)
        else:
            return locations_found

    return recurse([], 0)


print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.

小开

This thread is a little old, but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        # index() raises ValueError once there are no more matches
        print(numberString.index(testString, marker))
        marker = numberString.index(testString, marker) + 1
    except ValueError:
        print("String not found")
        marker = len(numberString)

小开

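If you are just looking for a single character, this works:

from functools import reduce  # reduce is no longer a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially #2) performs very well.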

小开

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1  # change to k += len(sub) to not search overlapping results
    return result

It should return a list of the positions where the substring was found.

小开

Again, an old thread, but here is my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

小开

Using re.finditer:

import re
sentence = input("Give me a sentence ")
word = input("What word would you like to find ")
for match in re.finditer(word, sentence):
    print((match.start(), match.end()))

For word = "this" and sentence = "this is a sentence this this", this yields the output:

(0, 4)
(19, 23)
(24, 28)
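Because the word above comes straight from user input, it is interpreted as a regular expression, and it also matches inside longer words. The sketch below shows one way to handle both, assuming whole-word matching is what is wanted; the function name word_spans is hypothetical.

```python
import re


def word_spans(word, sentence):
    """Return (start, end) for each whole-word occurrence of `word`."""
    # re.escape makes characters such as "." or "?" in the input literal;
    # \b anchors the match at word boundaries on both sides.
    pattern = r"\b" + re.escape(word) + r"\b"
    return [m.span() for m in re.finditer(pattern, sentence)]


print(word_spans("this", "this is a sentence this this"))  # [(0, 4), (19, 23), (24, 28)]
print(word_spans("is", "this is a sentence"))              # [(5, 7)]
```

In the second call, the "is" inside "this" is skipped because it does not start at a word boundary.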

小开

Please have a look at the code below:

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''


def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result


if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))

小开

The solutions provided by others are entirely based on the available find() method or other available methods.

What is the core basic algorithm to find all the occurrences of a substring in a string?

def find_all(string, substring):
    """
    Function: Return all the indexes of a substring in a string
    Arguments: The string and the search string
    Return: A list of indexes
    """
    length = len(substring)
    c = 0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c = c + 1
    return indexes

You can also inherit from the str class in a new class and use this function as below.

class newstr(str):
    def find_all(string, substring):
        """
        Function: Return all the indexes of a substring in a string
        Arguments: The string and the search string
        Return: A list of indexes
        """
        length = len(substring)
        c = 0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c = c + 1
        return indexes

Calling the method

newstr.find_all('Do you find this answer helpful? Then upvote this!', 'this')
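The scan in this answer compares a slice at every position, which costs roughly len(string) * len(substring) character comparisons in the worst case. As a point of comparison only, here is a sketch of the Knuth-Morris-Pratt approach, a different algorithm from the one in the answer, which never re-reads text characters. The name kmp_find_all is mine, and returning an empty list for an empty pattern is an assumption.

```python
def kmp_find_all(text, pattern):
    """Return the start index of every (possibly overlapping) match."""
    if not pattern:
        return []  # assumption: an empty pattern has no meaningful positions

    # failure[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    positions = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = failure[k - 1]  # fall back without moving backwards in text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            positions.append(i - k + 1)
            k = failure[k - 1]  # keep going so overlapping matches are found
    return positions


print(kmp_find_all("test test test test", "test"))  # [0, 5, 10, 15]
print(kmp_find_all("ababa", "aba"))                  # [0, 2]
```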

小开

You can try:

>>> string = "test test test test"
>>> for index, value in enumerate(string):
...     if string[index:index+(len("test"))] == "test":
...         print(index)
...
0
5
10
15

小开

The Pythonic way would be:

mystring = 'Hello World, this should work!'
# compares one character at a time, so s must be a single character
find_all = lambda c, s: [x for x in range(c.find(s), len(c)) if c[x] == s]

# s represents the search string
# c represents the character string

find_all(mystring, 'o')    # will return all positions of 'o'

[4, 7, 20, 26]

小开

Here is my trick using re.finditer:

import re

text = 'This is sample text to test if this pythonic '\
       'program can serve as an indexing platform for '\
       'finding words in a paragraph. It can give '\
       'values as to where the word is located with the '\
       'different examples as stated'

# find all occurrences of the word 'as' in the above text

find_the_word = re.finditer('as', text)

for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))

小开

When looking for a large number of keywords in a document, use flashtext:

from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)

Flashtext runs faster than regular expressions on a large list of search words.
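If adding a dependency is not an option, a rough standard-library approximation is to compile all keywords into a single alternation. This is only a sketch: it is not how flashtext works internally, it makes no claim about relative speed, and the whole-word \b anchors are my assumption about the desired matching.

```python
import re

words = ['test', 'exam', 'quiz']
txt = 'this is a test'

# One compiled pattern for all keywords; re.escape keeps each keyword literal.
pattern = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b")

result = [(m.group(), m.start(), m.end()) for m in pattern.finditer(txt)]
print(result)  # [('test', 10, 14)]
```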

小开

You can easily use:

string.count('test')

https://www.programiz.com/python-programming/methods/string/count

Cheers!
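Two things are worth keeping in mind with str.count: it tells you how many occurrences there are, not where they are, and it only counts non-overlapping ones. A short illustration, with a lookahead-based count added for comparison:

```python
import re

text = "test test test test"
print(text.count("test"))  # 4: the number of matches, not their positions

# str.count skips overlapping occurrences.
print("ttt".count("tt"))  # 1

# A zero-width lookahead counts overlapping occurrences as well.
print(len(re.findall("(?=tt)", "ttt")))  # 2
```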

小开

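Through slicing, we build every possible substring, append each one to a list, and then use the count function to find how many times the target occurs:

s = input()
n = len(s)
l = []
f = input()
print(s[0])
for i in range(0, n):
    for j in range(1, n+1):
        l.append(s[i:j])
if f in l:
    print(l.count(f))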

小开

This function does not look at every position inside the string, so it does not waste computing resources. My attempt:

def findAll(string, word):
    all_positions = []
    next_pos = -1
    while True:
        next_pos = string.find(word, next_pos + 1)
        if next_pos < 0:
            break
        all_positions.append(next_pos)
    return all_positions

To use it, call it like this:

result=findAll('this word is a big word man how many words are there?','word')

小开

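This is the solution to a similar question from HackerRank. I hope this helps you.

import re
a = input()
b = input()
if b not in a:
    print((-1, -1))
else:
    # collect the start index of every (overlapping) match;
    # re.escape keeps any regex metacharacters in b literal
    start_indc = [m.start() for m in re.finditer('(?=' + re.escape(b) + ')', a)]
    for i in range(len(start_indc)):
        print((start_indc[i], start_indc[i] + len(b) - 1))

Output: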

aaadaa
aa
(0, 1)
(1, 2)
(4, 5)

小开

src = input()  # we will find the substring in this string
sub = input()  # substring

res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)

小开

def find_index(string, let):
    enumerated = [place for place, letter in enumerate(string) if letter == let]
    return enumerated

For example:

find_index("hey doode find d", "d")

returns:

[4, 7, 13, 15]

小开

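This is not exactly what the OP asked, but you could also use the split function to get a list of the pieces where the substring does not occur. The OP did not specify the end goal of the code, but if your goal is to remove the substrings anyway, this could be a simple one-liner. There are probably more efficient ways to do this for larger strings; regular expressions would be preferable in that case.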

# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']


# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'

I only skimmed the other answers, so apologies if this is already up there.
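If removing or swapping out the substring really is the goal, str.replace does the split-and-join in a single call. A brief sketch using the same example string:

```python
s = "an-example-string"

# Replace every occurrence of the substring with a space...
print(s.replace("-", " "))  # an example string

# ...or remove it entirely.
print(s.replace("-", ""))   # anexamplestring

# Same result as the split/join version.
print(" ".join(s.split("-")) == s.replace("-", " "))  # True
```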

小开

def count_substring(string, sub_string):
    c = 0
    # stop at the last index where sub_string can still fit
    for i in range(0, len(string) - len(sub_string) + 1):
        if string[i:i+len(sub_string)] == sub_string:
            c += 1
    return c


if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()

    count = count_substring(string, sub_string)
    print(count)

小开

If you only want to use numpy, here is a solution:

import numpy as np

S = "test test test test"
S2 = 'test'
inds = np.cumsum([len(k) + len(S2) for k in S.split(S2)[:-1]]) - len(S2)
print(inds)
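The idea behind that one-liner is a running total: each piece returned by split, plus the separator that followed it, tells you where the next match ends. The same arithmetic can be sketched with itertools.accumulate from the standard library, which may make the steps easier to follow. Like split itself, it only finds non-overlapping matches.

```python
from itertools import accumulate

S = "test test test test"
S2 = "test"

# Text between matches; the last piece comes after the final match, so drop it.
parts = S.split(S2)[:-1]

# Running total of (gap + match length) gives the END index of each match.
ends = accumulate(len(part) + len(S2) for part in parts)

# Subtract the match length to get the START index of each match.
inds = [end - len(S2) for end in ends]
print(inds)  # [0, 5, 10, 15]
```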

小开

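I ran into the same problem and did this:

hw = 'Hello oh World!'
list_hw = list(hw)
o_in_hw = []

while True:
    o = hw.find('o')
    if o != -1:
        o_in_hw.append(o)
        list_hw[o] = ' '
        hw = ''.join(list_hw)
    else:
        print(o_in_hw)
        break

I am pretty new to coding, so you can probably simplify it (and if you plan to use it repeatedly, of course make it a function).

All in all, it works as intended for what I was doing.

Edit: Please note that this is for single characters only, and it changes your variable, so you have to copy the string into a new variable first if you want to keep it. I did not put that in the code because it is easy, and the code is only meant to show how I made it work.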

小开

You can try:

import re
str1 = "This dress looks good; you have good taste in clothes."
substr = "good"
result = [_.start() for _ in re.finditer(substr, str1)]
# result = [17, 32]

小开

If you want to do it without re (regex), then:

find_all = lambda _str,_w : [ i for i in range(len(_str)) if _str.startswith(_w,i) ]


string = "test test test test"
print( find_all(string, 'test') ) # >>> [0, 5, 10, 15]

小开

Here is a solution I came up with, using an assignment expression (a new feature since Python 3.8):

string = "test test test test"
phrase = "test"
start = -1
result = [(start := string.find(phrase, start + 1)) for _ in range(string.count(phrase))]

Output:

[0, 5, 10, 15]

小开

Count the occurrences of every character in a given string and return them as a dictionary. For example, for hello the result is {'h': 1, 'e': 1, 'l': 2, 'o': 1}

def count(string):
    result = {}
    if string:
        for i in string:
            result[i] = string.count(i)
        return result
    return {}

Otherwise, you can do it like this:

from collections import Counter


def count(string):
    return Counter(string)

小开

Try this, it worked for me!

x = input('enter the string')
y = input('enter the substring')
z, r = x.find(y), x.rfind(y)
while z != r:
    print(z, r, end=' ')
    z = z + len(y)
    r = r - len(y)
    z, r = x.find(y, z, r), x.rfind(y, z, r)

小开

I think the cleanest solution uses no libraries and no yields:

def find_all_occurrences(string, sub):
    index_of_occurrences = []
    current_index = 0
    while True:
        current_index = string.find(sub, current_index)
        if current_index == -1:
            return index_of_occurrences
        else:
            index_of_occurrences.append(current_index)
            current_index += len(sub)


find_all_occurrences(string, substr)

Note: the find() method returns -1 when it cannot find anything.
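That -1 return value is worth checking explicitly, because -1 is also a valid index in Python. A short illustration of the difference between find() and index(), which raises ValueError instead:

```python
text = "spam spam"

print(text.find("spam"))  # 0
print(text.find("eggs"))  # -1

# Pitfall: using the result unchecked silently indexes the last character.
pos = text.find("eggs")
print(text[pos])  # m

# index() signals "not found" with an exception instead of a sentinel value.
try:
    text.index("eggs")
except ValueError:
    print("index() raised ValueError")
```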