Split a string with an unknown number of spaces as the separator in Python

I need a function similar to str.split(' '), but there may be more than one space, and the number of spaces between the meaningful characters varies. Like this:

s = ' 1234    Q-24 2010-11-29         563   abc  a6G47er15        '
ss = s.magic_split()
print(ss)  # ['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']

Can I use a regular expression to catch the spaces in between?


If you don't pass any arguments to str.split(), it treats each run of consecutive whitespace as a single separator and ignores leading and trailing whitespace:

>>> ' 1234    Q-24 2010-11-29         563   abc  a6G47er15'.split()
['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']
s = ' 1234    Q-24 2010-11-29         563   abc  a6G47er15        '
ss = s.split()
print(ss)  # ['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']
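The difference from split(' ') shows up with leading, trailing, or repeated spaces: an explicit ' ' separator yields an empty string for every extra space. A small comparison on a made-up string:

```python
s = ' a  b '

# With an explicit separator, every single space is a boundary,
# so repeated and edge spaces produce empty strings.
print(s.split(' '))  # ['', 'a', '', 'b', '']

# With no argument, runs of whitespace collapse into one separator
# and leading/trailing whitespace is ignored.
print(s.split())  # ['a', 'b']

# maxsplit still works with the default separator if you pass None explicitly.
print('x  y   z'.split(None, 1))  # ['x', 'y   z']
```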

If your data contains single spaces inside a field (for example, an address), here is a solution for when the delimiter is two or more spaces:

with open("textfile.txt") as f:
    content = f.readlines()

for line in content:
    # Collapse every run of three or more spaces down to exactly two,
    # so that two spaces can serve as the delimiter.
    while line.replace("   ", "  ") != line:
        line = line.replace("   ", "  ")

    # strip() removes the trailing newline and any leading/trailing spaces,
    # which would otherwise end up in the first or last field.
    data = line.strip().split("  ")
    print(data)
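If you'd rather not loop, a regular expression can express "two or more spaces" directly. This is a swapped-in alternative to the replace loop, not the answer's own method: a minimal sketch with re.split and the pattern ' {2,}', where the names DELIM and split_fields and the sample line are hypothetical:

```python
import re

# Two or more consecutive spaces act as the delimiter; single spaces are kept.
DELIM = re.compile(r' {2,}')

def split_fields(line):
    # strip() first so leading/trailing runs don't produce empty fields
    return DELIM.split(line.strip())

print(split_fields('  John Smith    12 Main St   42  \n'))
# ['John Smith', '12 Main St', '42']
```

Note that a blank line still gives [''], since re.split always returns at least one item.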

To split lines on multiple spaces while keeping single spaces inside fields:

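with open("textfile.txt") as f:
    for line in f:
        # Split on two spaces, strip each piece, and skip pieces that are
        # empty or whitespace-only (e.g. a leftover ' \n' at the end of the line).
        fields = [part.strip() for part in line.split('  ') if part.strip()]
        print(fields)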
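To try the line-by-line approach without a textfile.txt on disk, io.StringIO can stand in for the open file. The contents below are invented sample data; the second line ends in spaces to show why the filter checks the stripped value rather than the raw piece:

```python
import io

# Stand-in for open("textfile.txt"); the contents are made up.
f = io.StringIO('1234    New York   abc\nQ-24  San Jose   \n')

rows = []
for line in f:
    # Split on double spaces, strip each piece, and drop pieces that are
    # empty or whitespace-only after stripping.
    rows.append([part.strip() for part in line.split('  ') if part.strip()])

print(rows)  # [['1234', 'New York', 'abc'], ['Q-24', 'San Jose']]
```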

There are many solutions to this question.

1.) Using split() with no arguments is the simplest method:

s = ' 1234    Q-24 2010-11-29         563   abc  a6G47er15              '
s = s.split()
print(s)

Output >> ['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']

2.) Another way is re.findall() with the pattern \S+ (one or more non-whitespace characters). You need to "import re" at the top of your Python file.

import re

def MagicString(text):
    # \S+ matches one or more consecutive non-whitespace characters
    return re.findall(r'\S+', text)

s = ' 1234    Q-24 2010-11-29         563   abc  a6G47er15'
s = MagicString(s)
print(s)
print(MagicString('    he  ll   o'))

Output >> ['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']
Output >> ['he', 'll', 'o']
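Since \S+ matches maximal runs of non-whitespace, re.findall(r'\S+', s) should give the same list as s.split() for everyday input, including tabs and newlines. A quick check with a precompiled pattern (worth doing if you split many strings); the name TOKEN and the test strings are illustrative:

```python
import re

# Precompile once for reuse; \S+ = one or more non-whitespace characters.
TOKEN = re.compile(r'\S+')

samples = [
    ' 1234    Q-24 2010-11-29         563   abc  a6G47er15        ',
    'tab\tseparated\t\tvalues\n',
    '   ',
]

for text in samples:
    # findall returns only the tokens, so no empty strings appear at the
    # edges, matching split() with no arguments.
    print(TOKEN.findall(text) == text.split())  # True for each sample
```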

3.) If you only want to remove leading (at the beginning) and trailing (at the end) spaces, use strip():

s = '   hello          '
output = s.strip()
print(output)

Output >> hello
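strip() also has one-sided siblings, lstrip() and rstrip(), and all three accept an optional string of characters to remove instead of whitespace. A short illustration on made-up strings:

```python
s = '   hello   '

print(repr(s.lstrip()))  # 'hello   '
print(repr(s.rstrip()))  # '   hello'
print(repr(s.strip()))   # 'hello'

# The optional argument is a set of characters, not a prefix or suffix.
print('--==hello==--'.strip('-='))  # hello
```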

We can also use re.split() here:

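import re

sample = ' 1234    Q-24 2010-11-29         563   abc  a6G47er15        '

# Raw string so the regex engine receives \s unchanged;
# strip() first so the edges don't produce empty strings.
word_list = re.split(r"\s+", sample.strip())

print(word_list)  # ['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']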
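The strip() call matters: when the string starts or ends with whitespace, re.split() yields an empty string at that end, unlike str.split() with no arguments. A quick comparison on a made-up string:

```python
import re

sample = ' a b '

print(re.split(r'\s+', sample))          # ['', 'a', 'b', '']
print(re.split(r'\s+', sample.strip()))  # ['a', 'b']

# Alternatively, filter out the empty strings afterwards.
print([w for w in re.split(r'\s+', sample) if w])  # ['a', 'b']
```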

I hope this might help someone