How do I get everything before a ':' in a Python string?

I am looking for a way to get all of the letters in a string before a :, but I have no idea where to start. Would I use regex? If so, how?

string = "Username: How are you today?"

Can someone show me an example?


Just use the split function. It returns a list, so you can keep the first element:

>>> s1 = "Username: How are you today?"
>>> s1.split(':')
['Username', ' How are you today?']
>>> s1.split(':')[0]
'Username'

You don't need regex.

>>> s = "Username: How are you today?"

You can use the split method to split the string on the ':' character:

>>> s.split(':')
['Username', ' How are you today?']

Then take element [0] to get the first part of the string:

>>> s.split(':')[0]
'Username'

Using index:

>>> string = "Username: How are you today?"
>>> string[:string.index(":")]
'Username'

index gives you the position of the first : in the string, and you can then slice up to it.

If you want to use regex:

>>> import re
>>> re.match("(.*?):", string).group(1)
'Username'

match matches from the start of the string.
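
A slightly more defensive version of the regex approach might look like the sketch below. The helper name before_colon is hypothetical. It uses group(1), because group() with no argument returns the whole match including the colon, and it returns None when there is no colon instead of raising.

```python
import re


def before_colon(text):
    """Return the text before the first ':' or None if there is none."""
    # re.match anchors at the start; the lazy group stops at the first ':'.
    m = re.match(r"(.*?):", text)
    # group(1) is the captured prefix; group(0) would include the ':'.
    return m.group(1) if m else None


print(before_colon("Username: How are you today?"))  # Username
print(before_colon("no delimiter here"))             # None
```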

You can also use itertools.takewhile:

>>> import itertools
>>> "".join(itertools.takewhile(lambda x: x!=":", string))
'Username'

I have benchmarked these techniques under Python 3.7.0 (IPython).

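TL;DR

  • Fastest in this benchmark (when the split symbol c is known): pre-compiled regex.
  • Fastest in this benchmark (otherwise): s.partition(c)[0]
  • Safe (i.e., when c may not be in s): partition, split.
  • Unsafe: index, regex.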
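
To make the safe/unsafe distinction concrete, here is a short sketch of what each technique does when the split symbol is missing. The input string is hypothetical, and re.escape is added as a precaution for symbols that are regex metacharacters.

```python
import re

s = "no delimiter here"  # hypothetical input that lacks the split symbol
c = ":"

# partition and split fall back to the whole string.
print(s.partition(c)[0])   # no delimiter here
print(s.split(c, 1)[0])    # no delimiter here

# index raises ValueError when c is missing.
try:
    s[:s.index(c)]
    index_failed = False
except ValueError:
    index_failed = True
print(index_failed)        # True

# re.match returns None, so calling .group() on it would raise AttributeError.
match = re.match("(.*?)" + re.escape(c), s)
print(match)               # None
```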

Code

import string, random, re


SYMBOLS = string.ascii_uppercase + string.digits
SIZE = 100


def create_test_set(string_length):
    # Yield SIZE pairs (c, s): a random string s and a character c taken from it.
    for _ in range(SIZE):
        random_string = ''.join(random.choices(SYMBOLS, k=string_length))
        yield (random.choice(random_string), random_string)


# Run in IPython: %timeit is an IPython magic, not plain Python.
for string_length in (2**4, 2**8, 2**16, 2**32):
    print("\nString length:", string_length)
    # Build the test set before its first use. The length is fixed at 16 here,
    # so string_length only affects the printed label.
    test_set = list(create_test_set(16))
    print("  regex (compiled):", end=" ")
    # Caution: this is a generator, so it is exhausted after the first timed
    # loop and the remaining loops iterate over nothing.
    test_set_for_regex = ((re.compile("(.*?)" + c).match, s) for (c, s) in test_set)
    %timeit [re_match(s).group() for (re_match, s) in test_set_for_regex]
    print("  partition:       ", end=" ")
    %timeit [s.partition(c)[0] for (c, s) in test_set]
    print("  index:           ", end=" ")
    %timeit [s[:s.index(c)] for (c, s) in test_set]
    print("  split (limited): ", end=" ")
    %timeit [s.split(c, 1)[0] for (c, s) in test_set]
    print("  split:           ", end=" ")
    %timeit [s.split(c)[0] for (c, s) in test_set]
    print("  regex:           ", end=" ")
    %timeit [re.match("(.*?)" + c, s).group() for (c, s) in test_set]
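
For readers without IPython, a comparable measurement could be sketched with the standard timeit module, as below. This is an illustrative setup, not the code that produced the results in this answer: the helper make_cases, the seed, the string length of 256, and the repetition count are all assumptions. The compiled patterns are kept in a list so that every timed run sees the same data, and the candidates are checked for agreement before timing.

```python
import random
import re
import string
import timeit

SYMBOLS = string.ascii_uppercase + string.digits


def make_cases(n, length, seed=0):
    """Build n (delimiter, text) pairs where the delimiter occurs in the text."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        text = "".join(rng.choices(SYMBOLS, k=length))
        cases.append((rng.choice(text), text))
    return cases


cases = make_cases(n=100, length=256)
# Compile once, outside the timed code, and store in a list (not a generator).
compiled = [(re.compile("(.*?)" + re.escape(c)).match, s) for c, s in cases]

candidates = {
    "partition": lambda: [s.partition(c)[0] for c, s in cases],
    "index": lambda: [s[:s.index(c)] for c, s in cases],
    "split (limited)": lambda: [s.split(c, 1)[0] for c, s in cases],
    "regex (compiled)": lambda: [m(s).group(1) for m, s in compiled],
}

# All candidates should agree before their timings are compared.
expected = candidates["partition"]()
assert all(fn() == expected for fn in candidates.values())

timings = {name: timeit.timeit(fn, number=200) for name, fn in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} {seconds:.4f} s")
```

Absolute numbers depend on the machine and Python version, so only the relative ordering on your own system is meaningful.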

Results

String length: 16
regex (compiled): 156 ns ± 4.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
partition:        19.3 µs ± 430 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
index:            26.1 µs ± 341 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split (limited):  26.8 µs ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split:            26.3 µs ± 835 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
regex:            128 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


String length: 256
regex (compiled): 167 ns ± 2.7 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
partition:        20.9 µs ± 694 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
index:            28.6 µs ± 2.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split (limited):  27.4 µs ± 979 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split:            31.5 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
regex:            148 µs ± 7.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


String length: 65536
regex (compiled): 173 ns ± 3.95 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
partition:        20.9 µs ± 613 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
index:            27.7 µs ± 515 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split (limited):  27.2 µs ± 796 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split:            26.5 µs ± 377 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
regex:            128 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


String length: 4294967296
regex (compiled): 165 ns ± 1.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
partition:        19.9 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
index:            27.7 µs ± 571 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split (limited):  26.1 µs ± 472 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split:            28.1 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
regex:            137 µs ± 6.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

For this purpose, partition() may be better than split(), because it gives more predictable results when the string has no delimiter or more than one.
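
A minimal sketch of that predictability: str.partition always returns a 3-tuple (head, separator, tail), and the separator slot is empty when the delimiter was not found. The helper name before is hypothetical.

```python
def before(text, sep=":"):
    """Return (prefix, found) using str.partition."""
    head, found_sep, _tail = text.partition(sep)
    # found_sep is '' when sep does not occur in text.
    return head, bool(found_sep)


print(before("Username: How are you today?"))  # ('Username', True)
print(before("a:b:c"))                         # ('a', True)
print(before("no delimiter here"))             # ('no delimiter here', False)
```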

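To solve this with a regular expression, you can capture the non-whitespace characters that come right before the colon.

For example, the following Python code: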

import re

string = "Username: How are you today?"
# Capture the run of non-whitespace characters immediately before a ':'.
regex = r'(\S*):'

data = re.findall(regex, string)
print(data)  # ['Username']
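
For comparison, a version that really does rely on a lookahead might look like the sketch below. The pattern is my own illustration, not taken from the answer above: the lookahead requires a colon to follow but leaves it out of the match.

```python
import re

string = "Username: How are you today?"

# (?=:) is a positive lookahead: it requires a ':' next but does not consume it,
# so the match itself is just the text before the first ':'.
m = re.search(r"^[^:]*(?=:)", string)
prefix = m.group() if m else None
print(prefix)  # Username
```

If the string contains no colon, re.search returns None and prefix is None.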