How to get a string after a specific substring?

How can I get a string after a specific substring?

For example, I want to get the string after "world" in

my_string="hello python world, I'm a beginner"

...which in this case is: ", I'm a beginner"


The easiest way is probably just to split on your target word:

my_string="hello python world , i'm a beginner"
print(my_string.split("world",1)[1])

split takes the word (or character) to split on and, optionally, a limit on the number of splits.

In this example, it splits on "world" and only once.
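A minimal sketch of what the limit does here, assuming the sample string from the question. Indexing with `[1]` raises `IndexError` when the word is absent, because `split` then returns a one-element list, so this version checks the length first. The helper name `after_split` is hypothetical.

```python
def after_split(s, word):
    """Return the text after the first occurrence of word, or None if absent."""
    parts = s.split(word, 1)  # at most one split -> at most two pieces
    if len(parts) < 2:
        return None  # word not found: split returned [s]
    return parts[1]


s = "hello python world, I'm a beginner"
print(s.split("world", 1))      # ['hello python ', ", I'm a beginner"]
print(after_split(s, "world"))  # , I'm a beginner
print(after_split(s, "Monty"))  # None
```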

s1 = "hello python world , i'm a beginner"
s2 = "world"


print(s1[s1.index(s2) + len(s2):])

If you want to handle the case where s2 is not present in s1, use s1.find(s2) instead of index. If that call returns -1, s2 is not in s1.
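A sketch of the find-based variant. The -1 check matters: without it, `-1 + len(s2)` would silently be used as the slice start and return the wrong text. The helper name `after_find` and its `default` parameter are hypothetical.

```python
def after_find(s, sub, default=None):
    """Return the text after the first occurrence of sub, or default if absent."""
    i = s.find(sub)  # -1 when sub is not in s (index() would raise ValueError)
    if i == -1:
        return default
    return s[i + len(sub):]


s1 = "hello python world , i'm a beginner"
print(repr(after_find(s1, "world")))  # " , i'm a beginner"
print(after_find(s1, "Monty"))        # None
```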

If you want to do this with a regex, you could simply use a non-capturing group to match the word "world" and then everything after it, like so:

(?:world).*

The example string is tested here.
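Note that `(?:world).*` still includes "world" in the overall match: a non-capturing group only avoids creating a numbered group, it does not drop the matched text. A sketch with Python's `re` module contrasting it with a lookbehind and with a capturing group (`.` does not match newlines unless `re.DOTALL` is passed):

```python
import re

s = "hello python world, I'm a beginner"

# A non-capturing group does not exclude "world" from the overall match
print(re.search(r"(?:world).*", s).group())   # world, I'm a beginner

# A lookbehind requires "world" before the match without consuming it
print(re.search(r"(?<=world).*", s).group())  # , I'm a beginner

# Alternatively, capture the tail in a group and read group(1)
print(re.search(r"world(.*)", s).group(1))    # , I'm a beginner
```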

I'm surprised nobody mentioned partition.

def substring_after(s, delim):
    return s.partition(delim)[2]


s1="hello python world, I'm a beginner"
substring_after(s1, "world")


# ", I'm a beginner"

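IMHO, this solution is more readable than @arshajii's. Otherwise, I think @arshajii's is the best, because it is the fastest: it does not create any unnecessary copies/substrings.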

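This is an old question, but I faced the very same scenario: I needed to split a string using the word "low" as the delimiter, and the problem for me was that the same string also contained the words below and lower.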

I solved it using the re module this way:

import re


string = '...below...as higher prices mean lower demand to be expected. Generally, a high reading is seen as negative (or bearish), while a low reading is seen as positive (or bullish) for the Korean Won.'


# use re.split with regex to match the exact word
stringafterword = re.split('\\blow\\b',string)[-1]


print(stringafterword)
# ' reading is seen as positive (or bullish) for the Korean Won.'


# the generic code is:
re.split('\\bTHE_WORD_YOU_WANT\\b',string)[-1]

Hope this helps someone!
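`re.split(...)[-1]` returns the text after the *last* whole-word match. If you want the text after the *first* match instead, pass `maxsplit=1`. A sketch, where `after_word` is a hypothetical helper and `re.escape` guards against regex metacharacters in the word (this assumes the word starts and ends with word characters, since `\b` is defined relative to them):

```python
import re


def after_word(s, word, last=False):
    """Text after the first (or last) whole-word occurrence of word; None if absent."""
    pattern = r"\b" + re.escape(word) + r"\b"  # escape regex metacharacters in word
    parts = re.split(pattern, s) if last else re.split(pattern, s, maxsplit=1)
    return parts[-1] if len(parts) > 1 else None


s = "below the low, lower than low tide"
print(repr(after_word(s, "low")))             # ', lower than low tide'
print(repr(after_word(s, "low", last=True)))  # ' tide'
print(after_word(s, "high"))                  # None
```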

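You can use the package called substring. Just install it with the command pip install substring. You can get a substring by specifying just the start and end characters/indices.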

For example:

import substring
s = substring.substringByChar("abcdefghijklmnop", startChar="d", endChar="n")
print(s)

Output:

# s = defghijklmn
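If you prefer not to add a dependency, the same result for this example can be obtained with plain string methods. A sketch; the helper name `between_chars` is hypothetical, and the inclusive bounds are inferred from the sample output above rather than from the package's documentation:

```python
def between_chars(s, start_char, end_char):
    """Slice s from the first start_char through the next end_char, both included.

    Returns None if either character is missing (a choice made for this sketch).
    """
    start = s.find(start_char)
    if start == -1:
        return None
    end = s.find(end_char, start + 1)  # search only after the start character
    if end == -1:
        return None
    return s[start:end + 1]


print(between_chars("abcdefghijklmnop", "d", "n"))  # defghijklmn
```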

You want to use str.partition():

>>> my_string.partition("world")[2]
" , i'm a beginner "

because this option is faster than the alternatives.

Note that this produces an empty string if the delimiter is missing:

>>> my_string.partition("Monty")[2]  # delimiter missing
''

If you want the original string back instead, test whether the second value returned from str.partition() is non-empty:

prefix, success, result = my_string.partition(delimiter)
if not success: result = prefix
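Testing the separator rather than the result is what distinguishes "delimiter at the very end" (empty result, separator found) from "delimiter missing". A sketch wrapping the snippet above into a function; the name `after_or_whole` is hypothetical:

```python
def after_or_whole(s, delimiter):
    """Text after the first delimiter, or the whole string if it is absent."""
    prefix, sep, rest = s.partition(delimiter)
    # sep is the delimiter itself when found, '' otherwise;
    # when not found, partition returns (s, '', '')
    return rest if sep else prefix


print("hello python world".partition("world"))           # ('hello python ', 'world', '')
print("hello python".partition("world"))                 # ('hello python', '', '')
print(after_or_whole("hello python world, hi", "world"))  # , hi
print(after_or_whole("hello python", "world"))            # hello python
```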

You could also use str.split() with a limit of 1:

>>> my_string.split("world", 1)[-1]
" , i'm a beginner "
>>> my_string.split("Monty", 1)[-1]  # delimiter missing
"hello python world , i'm a beginner "

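Both `str.split(d, 1)[-1]` and the partition-and-test snippet return the original string when the delimiter is missing, so they should be interchangeable. A quick sketch that checks this on a few hand-picked inputs (the function names are hypothetical):

```python
def via_partition(s, d):
    prefix, found, result = s.partition(d)
    return result if found else prefix


def via_split(s, d):
    return s.split(d, 1)[-1]


samples = [
    ("hello python world , i'm a beginner ", "world"),  # present
    ("hello python world , i'm a beginner ", "Monty"),  # missing
    ("world at the start", "world"),                     # first
    ("ends with world", "world"),                        # last
    ("world and world again", "world"),                  # repeated
]
for s, d in samples:
    assert via_partition(s, d) == via_split(s, d), (s, d)
print("all samples agree")
```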
However, the str.split() option is slower. In the best case, str.partition() is easily about 15% faster than str.split():

                                missing        first         lower         upper          last
      str.partition(...)[2]:  [3.745 usec]  [0.434 usec]  [1.533 usec]  <3.543 usec>  [4.075 usec]
str.partition(...) and test:   3.793 usec    0.445 usec    1.597 usec    3.208 usec    4.170 usec
      str.split(..., 1)[-1]:  <3.817 usec>  <0.518 usec>  <1.632 usec>  [3.191 usec]  <4.173 usec>
            % best vs worst:         1.9%         16.2%          6.1%          9.9%          2.3%

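This shows per-execution timings for inputs where the delimiter is either missing (worst case), placed first (best case), or in the lower half, the upper half, or the last position. The fastest time is marked with [...] and the slowest with <...>.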

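The table above was produced by a comprehensive time trial of all three options, using the script below. I ran the tests with Python 3.7.4 on a 2017 15-inch MacBook Pro with a 2.9 GHz Intel Core i7 and 16 GB of RAM.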

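The script generates random sentences with and without the randomly selected delimiter present (and, if present, at different positions in the generated sentence), runs the tests in random order with repeats (producing the fairest results, since this accounts for random OS events that occur during testing), and then prints a table of the results: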

import random
from itertools import product
from operator import itemgetter
from pathlib import Path
from timeit import Timer

setup = "from __main__ import sentence as s, delimiter as d"
tests = {
    "str.partition(...)[2]": "r = s.partition(d)[2]",
    "str.partition(...) and test": (
        "prefix, success, result = s.partition(d)\n"
        "if not success: result = prefix"
    ),
    "str.split(..., 1)[-1]": "r = s.split(d, 1)[-1]",
}

placement = "missing first lower upper last".split()
delimiter_count = 3

wordfile = Path("/usr/dict/words")  # Linux
if not wordfile.exists():
    # macOS
    wordfile = Path("/usr/share/dict/words")
words = [w.strip() for w in wordfile.open()]


def gen_sentence(delimiter, where="missing", l=1000):
    """Generate a random sentence of length l

    The delimiter is incorporated according to the value of where:

    "missing": no delimiter
    "first":   delimiter is the first word
    "lower":   delimiter is present in the first half
    "upper":   delimiter is present in the second half
    "last":    delimiter is the last word
    """
    possible = [w for w in words if delimiter not in w]
    sentence = random.choices(possible, k=l)
    half = l // 2
    if where == "first":
        # best case, at the start
        sentence[0] = delimiter
    elif where == "lower":
        # lower half
        sentence[random.randrange(1, half)] = delimiter
    elif where == "upper":
        sentence[random.randrange(half, l)] = delimiter
    elif where == "last":
        sentence[-1] = delimiter
    # else: worst case, no delimiter

    return " ".join(sentence)


delimiters = random.choices(words, k=delimiter_count)
timings = {}
sentences = [
    # where, delimiter, sentence
    (w, d, gen_sentence(d, w)) for d, w in product(delimiters, placement)
]
test_mix = [
    # label, test, where, delimiter, sentence
    (*t, *s) for t, s in product(tests.items(), sentences)
]
random.shuffle(test_mix)

for i, (label, test, where, delimiter, sentence) in enumerate(test_mix, 1):
    print(f"\rRunning timed tests, {i:2d}/{len(test_mix)}", end="")
    t = Timer(test, setup)
    number, _ = t.autorange()
    results = t.repeat(5, number)
    # best time for this specific random sentence and placement
    timings.setdefault(
        label, {}
    ).setdefault(
        where, []
    ).append(min(dt / number for dt in results))

print()

scales = [(1.0, 'sec'), (0.001, 'msec'), (1e-06, 'usec'), (1e-09, 'nsec')]
width = max(map(len, timings))
rows = []
bestrow = dict.fromkeys(placement, (float("inf"), None))
worstrow = dict.fromkeys(placement, (float("-inf"), None))

for row, label in enumerate(tests):
    columns = []
    worst = float("-inf")
    for p in placement:
        timing = min(timings[label][p])
        if timing < bestrow[p][0]:
            bestrow[p] = (timing, row)
        if timing > worstrow[p][0]:
            worstrow[p] = (timing, row)
        worst = max(timing, worst)
        columns.append(timing)

    scale, unit = next((s, u) for s, u in scales if worst >= s)
    rows.append(
        [f"{label:>{width}}:", *(f" {c / scale:.3f} {unit} " for c in columns)]
    )

colwidth = max(len(c) for r in rows for c in r[1:])
print(' ' * (width + 1), *(p.center(colwidth) for p in placement), sep="  ")
for r, row in enumerate(rows):
    for c, p in enumerate(placement, 1):
        if bestrow[p][1] == r:
            row[c] = f"[{row[c][1:-1]}]"
        elif worstrow[p][1] == r:
            row[c] = f"<{row[c][1:-1]}>"
    print(*row, sep="  ")

percentages = []
for p in placement:
    best, worst = bestrow[p][0], worstrow[p][0]
    ratio = ((worst - best) / worst)
    percentages.append(f"{ratio:{colwidth - 1}.1%} ")

print("% best vs worst:".rjust(width + 1), *percentages, sep="  ")
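The "% best vs worst" row in the results table is the gap between the fastest and slowest option divided by the slowest time, i.e. the `(worst - best) / worst` expression in the script. Recomputing it from the rounded values in the table reproduces that row (a worked check, not a new measurement):

```python
# Best and worst per-placement timings (usec) copied from the results table
best = {"missing": 3.745, "first": 0.434, "lower": 1.533, "upper": 3.191, "last": 4.075}
worst = {"missing": 3.817, "first": 0.518, "lower": 1.632, "upper": 3.543, "last": 4.173}

for placement in best:
    # same formula as the benchmark script: difference relative to the *worst* time
    ratio = (worst[placement] - best[placement]) / worst[placement]
    print(f"{placement:>8}: {ratio:.1%}")  # 1.9%, 16.2%, 6.1%, 9.9%, 2.3% in order
```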

Try the following approach:

import re


my_string="hello python world , i'm a beginner"
p = re.compile("world(.*)")
print(p.findall(my_string))


# [" , i'm a beginner"]
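`findall` returns a list, so you still have to index into it (and handle the empty list when "world" is missing). If you only need the first occurrence, `re.search` with `group(1)` returns the string directly. A sketch; `after_world` is a hypothetical name:

```python
import re

p = re.compile(r"world(.*)")


def after_world(s):
    """Text after the first 'world' in s, or None if it does not occur."""
    m = p.search(s)  # returns a Match object, or None if there is no match
    return m.group(1) if m else None


print(repr(after_world("hello python world , i'm a beginner")))  # " , i'm a beginner"
print(after_world("hello python"))                               # None
```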

In Python 3.9, a new removeprefix method was added:

>>> 'TestHook'.removeprefix('Test')
'Hook'
>>> 'BaseTestCase'.removeprefix('Test')
'BaseTestCase'
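`removeprefix` only helps when the string starts with the substring; for text in the middle of a string you still need one of the approaches above. On interpreters older than 3.9 the same behaviour can be sketched with `startswith` and slicing (the name `remove_prefix` is hypothetical). Avoid `lstrip` for this, since it treats its argument as a set of characters rather than as a prefix:

```python
def remove_prefix(s, prefix):
    """Backport sketch of str.removeprefix (Python 3.9+) for older interpreters."""
    return s[len(prefix):] if s.startswith(prefix) else s


print(remove_prefix("TestHook", "Test"))      # Hook
print(remove_prefix("BaseTestCase", "Test"))  # BaseTestCase

# lstrip() is NOT equivalent: it strips any leading characters in {T, e, s, t}
print(repr("Testset".lstrip("Test")))         # ''
print(remove_prefix("Testset", "Test"))       # set
```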