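Split string every nth character?

Is it possible to split a string every nth character?

For example, suppose I have a string containing the following: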

'1234567890'

How can I get it to look like this:

['12','34','56','78','90']

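For the same question with a list, see "How do I split a list into equally-sized chunks?". The same techniques generally apply, though there are some variations.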

Try the following code:

from itertools import islice


def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))


s = '1234567890'
# Each piece is a list of characters, so join it back into a string.
print([''.join(piece) for piece in split_every(2, s)])

I think this is shorter and more readable than the itertools version:

def split_by_n(seq, n):
    '''A generator to divide a sequence into chunks of n units.'''
    while seq:
        yield seq[:n]
        seq = seq[n:]


print(list(split_by_n('1234567890', 2)))

>>> from functools import reduce
>>> from operator import add
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x)]
['12', '34', '56', '78', '90']
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x, x)]  # zip drops the trailing incomplete group ('0')
['123', '456', '789']

>>> line = '1234567890'
>>> n = 2
>>> [line[i:i+n] for i in range(0, len(line), n)]
['12', '34', '56', '78', '90']

Another common way of grouping elements into n-length groups:

>>> s = '1234567890'
>>> list(map(''.join, zip(*[iter(s)]*2)))
['12', '34', '56', '78', '90']

This method comes straight from the docs for zip().
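The idiom works because `[iter(s)] * n` repeats the same iterator object n times, so each tuple that zip builds pulls n consecutive characters from it. Here is a minimal sketch that spells this out; the helper name `group_fixed` is my own and not something from the docs. Note that zip silently drops a trailing group shorter than n.

```python
def group_fixed(s, n):
    """Group s into n-character strings; an incomplete tail is dropped."""
    it = iter(s)
    # The list holds n references to the *same* iterator, so each tuple
    # zip builds consumes n consecutive characters from it.
    same_iterator_n_times = [it] * n
    return [''.join(tup) for tup in zip(*same_iterator_n_times)]


print(group_fixed('1234567890', 2))  # ['12', '34', '56', '78', '90']
print(group_fixed('123456789', 2))   # ['12', '34', '56', '78'] (the '9' is lost)
```

If the last partial chunk matters, the slicing or zip_longest approaches are probably a better fit.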

For completeness, you can do this with a regex:

>>> import re
>>> re.findall('..','1234567890')
['12', '34', '56', '78', '90']

For an odd number of characters, you can do this:

>>> import re
>>> re.findall('..?', '123456789')
['12', '34', '56', '78', '9']

You can also do the following, which keeps the regex simple for longer chunks:

>>> import re
>>> re.findall('.{1,2}', '123456789')
['12', '34', '56', '78', '9']

If the string is long, you can use re.finditer to generate it chunk by chunk.
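A rough sketch of the re.finditer variant, in case it helps. The function name `iter_chunks` is made up for illustration, and adding re.DOTALL is my own choice so that '.' also matches newlines (without it, newline characters would be skipped).

```python
import re


def iter_chunks(text, n):
    """Lazily yield successive chunks of up to n characters from text."""
    # re.DOTALL lets '.' match newlines too, so no character is skipped.
    pattern = re.compile('.{1,%d}' % n, re.DOTALL)
    for match in pattern.finditer(text):
        yield match.group(0)


for chunk in iter_chunks('123456789', 2):
    print(chunk)
```

Because this is a generator, only one chunk is held at a time rather than the whole list.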

I like this solution:

s = '1234567890'
o = []
while s:
    o.append(s[:2])
    s = s[2:]

You can use the grouper() recipe from itertools:

Python 2.x:

from itertools import izip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

Python 3.x:

from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

These functions are memory-efficient and work with any iterable.
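Note that grouper() yields tuples padded with the fill value, not strings. One way to get the list from the question (a sketch using the Python 3 version; passing fillvalue='' is my own suggestion) is to pad with an empty string and join each group:

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Padding with '' lets ''.join() produce a shorter final chunk.
chunks = [''.join(group) for group in grouper('123456789', 2, fillvalue='')]
print(chunks)  # ['12', '34', '56', '78', '9']
```

With the default fillvalue=None, ''.join() would raise a TypeError on an incomplete last group.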

As always, for those who love one-liners:

n = 2
line = "this is a line split into n characters"
line = [line[i * n:i * n+n] for i,blah in enumerate(line[::n])]

Using more-itertools from PyPI:

>>> from more_itertools import sliced
>>> list(sliced('1234567890', 2))
['12', '34', '56', '78', '90']

more_itertools.sliced has been mentioned before. Here are four more options from the more_itertools library:

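import more_itertools as mit

s = "1234567890"

# Recent more_itertools releases take the iterable first: grouper(iterable, n).
["".join(c) for c in mit.grouper(s, 2)]

["".join(c) for c in mit.chunked(s, 2)]

["".join(c) for c in mit.windowed(s, 2, step=2)]

["".join(c) for c in mit.split_after(s, lambda x: int(x) % 2 == 0)]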

Each of these four options produces the following output:

['12', '34', '56', '78', '90']

Documentation for the options discussed: grouper, chunked, windowed, split_after

Python's standard library already has a function for this.

>>> from textwrap import wrap
>>> s = '1234567890'
>>> wrap(s, 2)
['12', '34', '56', '78', '90']

This is what the docstring for wrap says:

>>> help(wrap)
'''
Help on function wrap in module textwrap:

wrap(text, width=70, **kwargs)
    Wrap a single paragraph of text, returning a list of wrapped lines.

    Reformat the single paragraph in 'text' so it fits in lines of no
    more than 'width' columns, and return a list of wrapped lines.  By
    default, tabs in 'text' are expanded with string.expandtabs(), and
    all other whitespace characters (including newline) are converted to
    space.  See TextWrapper class for available keyword args to customize
    wrapping behaviour.
'''
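One caveat worth keeping in mind, following from the docstring above: wrap() is a text-wrapping tool, so it treats whitespace specially. A small sketch (the sample string 'ab cd' is just an illustration) comparing it with plain slicing:

```python
from textwrap import wrap

s = 'ab cd'

# wrap() breaks at whitespace and drops it at line edges.
wrapped = wrap(s, 2)

# Plain slicing keeps every character, including the space.
sliced = [s[i:i + 2] for i in range(0, len(s), 2)]

print(wrapped)  # ['ab', 'cd']
print(sliced)   # ['ab', ' c', 'd']
```

So wrap() gives the same result as slicing for strings without whitespace, such as the one in the question, but the two can differ otherwise.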

Try this:

s='1234567890'
print([s[idx:idx+2] for idx,val in enumerate(s) if idx%2 == 0])

Output:

['12', '34', '56', '78', '90']

A simple recursive solution for short strings:

def split(s, n):
    if len(s) <= n:
        # Base case: keep a short trailing chunk instead of dropping it.
        return [s] if s else []
    else:
        return [s[:n]] + split(s[n:], n)


print(split('1234567890', 2))

Or in this form:

def split(s, n):
    if len(s) < n:
        # Keep a short trailing chunk instead of dropping it.
        return [s] if s else []
    elif len(s) == n:
        return [s]
    else:
        return split(s[:n], n) + split(s[n:], n)

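This form shows the typical divide-and-conquer pattern of a recursive approach more explicitly (though in practice there is no need to do it this way).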

I ran into the same problem.

This worked for me:

x = "1234567890"
n = 2
my_list = []
for i in range(0, len(x), n):
    my_list.append(x[i:i+n])
print(my_list)

Output:

['12', '34', '56', '78', '90']

This can be achieved with a simple for loop.

a = '1234567890a'
result = []


for i in range(0, len(a), 2):
    result.append(a[i : i + 2])
print(result)

The output is ['12', '34', '56', '78', '90', 'a']

A solution using groupby:

from itertools import groupby, chain, repeat, cycle


text = "wwworldggggreattecchemggpwwwzaz"
n = 3
c = cycle(chain(repeat(0, n), repeat(1, n)))
res = ["".join(g) for _, g in groupby(text, lambda x: next(c))]
print(res)

Output:

['www', 'orl', 'dgg', 'ggr', 'eat', 'tec', 'che', 'mgg', 'pww', 'wza', 'z']

These answers are all good and useful, but the syntax is so cryptic... Why not write a simple function?

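def SplitEvery(string, length):
    if len(string) <= length:
        return [string]
    # Ceiling division, so a shorter trailing chunk is not dropped
    # (plain / would also give a float in Python 3 and break range()).
    sections = -(-len(string) // length)
    lines = []
    start = 0
    for i in range(sections):
        line = string[start:start+length]
        lines.append(line)
        start += length
    return lines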


Simply call it:

text = '1234567890'
lines = SplitEvery(text, 2)
print(lines)


# output: ['12', '34', '56', '78', '90']