Most pythonic way to interleave two strings


With join() and zip():

>>> ''.join(''.join(item) for item in zip(u,l))
'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

Best answer

For me, the most pythonic* way is the following, which does pretty much the same thing but uses the + operator to concatenate the individual characters of each string:

res = "".join(i + j for i, j in zip(u, l))
print(res)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

It is also faster than using two join() calls:

In [5]: l1 = 'A' * 1000000; l2 = 'a' * 1000000


In [6]: %timeit "".join("".join(item) for item in zip(l1, l2))
1 loops, best of 3: 442 ms per loop


In [7]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 360 ms per loop

Faster approaches exist, but they often obfuscate the code.

Note: if the two input strings are not the same length, the longer one will be truncated, because zip (https://docs.python.org/3/library/functions.html#zip) stops iterating at the end of the shorter string. In that case, use zip_longest (https://docs.python.org/3/library/itertools.html#itertools.zip_longest; izip_longest in Python 2) from the itertools module instead of zip, so that both strings are fully exhausted.
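A minimal sketch of the difference described in the note, assuming Python 3 and two short example strings (the sample values are made up for illustration):

```python
from itertools import zip_longest

u = "ABCDEF"
l = "abc"

# zip() stops at the end of the shorter string, so "DEF" is dropped.
truncated = "".join(i + j for i, j in zip(u, l))

# zip_longest() keeps going; fillvalue="" pads the shorter string with
# empty strings so that the + and join() still work.
complete = "".join(i + j for i, j in zip_longest(u, l, fillvalue=""))

print(truncated)  # AaBbCc
print(complete)   # AaBbCcDEF
```

Without fillvalue="", zip_longest pads with None, and i + j would raise a TypeError once the shorter string runs out.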

*To quote the Zen of Python: Readability counts.

Pythonic = readability for me; i + j is just visually parsed more easily, at least for my eyes.


If you want the fastest way, you can combine itertools with operator.add:

In [36]: from operator import add


In [37]: from itertools import  starmap, izip


In [38]: timeit "".join([i + j for i, j in izip(l1, l2)])
1 loops, best of 3: 142 ms per loop


In [39]: timeit "".join(starmap(add, izip(l1,l2)))
1 loops, best of 3: 117 ms per loop


In [40]: timeit "".join(["".join(item) for item in zip(l1, l2)])
1 loops, best of 3: 196 ms per loop


In [41]:  "".join(starmap(add, izip(l1,l2))) ==  "".join([i + j   for i, j in izip(l1, l2)]) ==  "".join(["".join(item) for item in izip(l1, l2)])
Out[42]: True

But combining izip and chain.from_iterable is faster again:

In [2]: from itertools import  chain, izip


In [3]: timeit "".join(chain.from_iterable(izip(l1, l2)))
10 loops, best of 3: 98.7 ms per loop

There is also a substantial difference between chain(* and chain.from_iterable(...:

In [5]: timeit "".join(chain(*izip(l1, l2)))
1 loops, best of 3: 212 ms per loop
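To illustrate the two call styles being timed, here is a small sketch. It assumes Python 3, where the built-in zip is lazy like Python 2's izip, and uses short sample strings:

```python
from itertools import chain

u = "ABC"
l = "abc"

# chain(*pairs) unpacks every pair into a separate positional argument,
# so the whole zip object is consumed before chaining even starts.
unpacked = "".join(chain(*zip(u, l)))

# chain.from_iterable(pairs) pulls one pair at a time from the zip object.
lazy = "".join(chain.from_iterable(zip(u, l)))

print(unpacked)  # AaBbCc
print(lazy)      # AaBbCc
```

Both produce the same string; the difference is only in how the pairs are handed to chain.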

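There is no such thing as a generator with join. Passing one is always going to be slower, because Python first builds a list from its contents: join makes two passes over the data, one to work out the size needed and one to actually do the join, which would not be possible with a generator.

From the CPython source for join: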

 /* Here is the general case.  Do a pre-pass to figure out the total
* amount of space we'll need (sz), and see whether all arguments are
* bytes-like.
*/

Also, if you have strings of different lengths and you don't want to lose data, you can use izip_longest:

In [22]: from itertools import izip_longest
In [23]: a,b = "hlo","elworld"


In [24]:  "".join(chain.from_iterable(izip_longest(a, b,fillvalue="")))
Out[24]: 'helloworld'

In Python 3 it is called zip_longest.

But for Python 2, veedrac's suggestion is by far the fastest:

In [18]: %%timeit
res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)
....:
100 loops, best of 3: 2.68 ms per loop

Faster alternative

Another way:

res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
print(''.join(res))

Output:

'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

Speed

It looks like it is faster:

%%timeit
res = [''] * len(u) * 2
res[::2] = u
res[1::2] = l
''.join(res)


100000 loops, best of 3: 4.75 µs per loop

than the fastest solution so far:

%timeit "".join(list(chain.from_iterable(zip(u, l))))


100000 loops, best of 3: 6.52 µs per loop

Also for larger strings:

l1 = 'A' * 1000000; l2 = 'a' * 1000000


%timeit "".join(list(chain.from_iterable(zip(l1, l2))))
1 loops, best of 3: 151 ms per loop




%%timeit
res = [''] * len(l1) * 2
res[::2] = l1
res[1::2] = l2
''.join(res)


10 loops, best of 3: 92 ms per loop

Python 3.5.1.

Variation for strings with different lengths

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijkl'

Shorter one determines length (`zip()` equivalent)

min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
print(''.join(res))

Output:

AaBbCcDdEeFfGgHhIiJjKkLl

Longer one determines length (`itertools.zip_longest(fillvalue='')` equivalent)

min_len = min(len(u), len(l))
res = [''] * min_len * 2
res[::2] = u[:min_len]
res[1::2] = l[:min_len]
res += u[min_len:] + l[min_len:]
print(''.join(res))

Output:

AaBbCcDdEeFfGgHhIiJjKkLlMNOPQRSTUVWXYZ
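The "longer one determines length" variant can be wrapped in a small helper. This is only a sketch; the name interleave_slices is made up here and does not come from the answer:

```python
def interleave_slices(a, b):
    """Interleave a and b by slice assignment; leftovers are appended."""
    min_len = min(len(a), len(b))
    res = [""] * (min_len * 2)
    res[::2] = a[:min_len]    # even positions come from a
    res[1::2] = b[:min_len]   # odd positions come from b
    # At most one of the two tails below is non-empty.
    return "".join(res) + a[min_len:] + b[min_len:]


print(interleave_slices("ABCDEF", "abc"))  # AaBbCcDEF
print(interleave_slices("abc", "ABCDEF"))  # aAbBcCDEF
```

Dropping the two tail slices from the return statement gives the `zip()`-equivalent behaviour instead.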

I like using two fors; the variable names can give a hint/reminder as to what is going on:

"".join(char for pair in zip(u,l) for char in pair)


Just to add another, more basic approach:

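st = ""
# enumerate() supplies the position directly; u.index(char) would return the
# first match only and give wrong results when u contains repeated characters.
for i, char in enumerate(u):
    st = "{0}{1}{2}".format(st, char, l[i])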

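Many of the suggestions assume the strings are of equal length. Maybe that covers all reasonable use cases, but at least to me it seems you might want to accommodate strings of differing lengths too. Or am I the only one who thinks mesh should work a bit like this: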

u = "foobar"
l = "baz"
# mesh(u, l) should return "fboaozbar"

One way to do this would be the following:

def mesh(a, b):
    minlen = min(len(a), len(b))
    return "".join(["".join(x + y for x, y in zip(a, b)), a[minlen:], b[minlen:]])

In Python 2, the far faster way to do this, at about 3x the speed of list slicing for small strings and about 30x for long ones, is

res = bytearray(len(u) * 2)
res[::2] = u
res[1::2] = l
str(res)

This wouldn't work on Python 3, though. You could implement something like

res = bytearray(len(u) * 2)
res[::2] = u.encode("ascii")
res[1::2] = l.encode("ascii")
res.decode("ascii")

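but by then you have already lost the gains over list slicing for small strings (it is still 20x the speed for long strings), and this doesn't even work for non-ASCII characters yet.

FWIW, if you are doing this on massive strings, need every cycle, and for some reason have to use Python strings... here is how to do it: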

res = bytearray(len(u) * 4 * 2)


u_utf32 = u.encode("utf_32_be")
res[0::8] = u_utf32[0::4]
res[1::8] = u_utf32[1::4]
res[2::8] = u_utf32[2::4]
res[3::8] = u_utf32[3::4]


l_utf32 = l.encode("utf_32_be")
res[4::8] = l_utf32[0::4]
res[5::8] = l_utf32[1::4]
res[6::8] = l_utf32[2::4]
res[7::8] = l_utf32[3::4]


res.decode("utf_32_be")

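Special-casing the common case of smaller types would help too. FWIW, this is only 3x the speed of list slicing for long strings, and a factor of 4 to 5 slower for small strings.

Either way I prefer the join solutions, but since timings were mentioned elsewhere I thought I might as well join in.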
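The eight slice assignments in the UTF-32 version follow a regular pattern, so they can be written as a loop. The following is a sketch under the same assumptions (Python 3, equal-length inputs); the helper name interleave_utf32 is made up for illustration, and it makes no claim about performance:

```python
def interleave_utf32(a, b):
    """Interleave two equal-length strings via UTF-32 byte slicing."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    a_bytes = a.encode("utf_32_be")  # 4 bytes per code point, no BOM
    b_bytes = b.encode("utf_32_be")
    res = bytearray(len(a) * 4 * 2)
    for offset in range(4):
        # Byte `offset` of every code point in a goes to the first half
        # of each 8-byte group, the same byte of b to the second half.
        res[offset::8] = a_bytes[offset::4]
        res[4 + offset::8] = b_bytes[offset::4]
    return res.decode("utf_32_be")


result = interleave_utf32("héllo", "wörld")
print(result == "".join(i + j for i, j in zip("héllo", "wörld")))  # True
```

Because every code point occupies exactly four bytes in UTF-32, this handles non-ASCII characters, which the ASCII-only version cannot.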

You could also do this using map and operator.add:

from operator import add


u = 'AAAAA'
l = 'aaaaa'


s = "".join(map(add, u, l))

Output:

'AaAaAaAaAa'

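What map does is take each element from the first iterable u together with the corresponding element from the second iterable l, and apply the function supplied as its first argument, add. Then join just joins the results.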
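One thing worth knowing about this approach, shown here as a small sketch assuming Python 3 and made-up sample strings: map with several iterables stops at the shortest one, just like zip.

```python
from operator import add

u = "ABCDE"
l = "abc"

# map() pairs u[0] with l[0], u[1] with l[1], and so on, and stops as soon
# as the shorter string is exhausted, so "DE" is dropped.
s = "".join(map(add, u, l))
print(s)  # AaBbCc
```

In Python 2, map instead pads the shorter iterable with None, so add would raise a TypeError for strings of unequal length.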

Potentially faster and shorter than the current leading solution:

from itertools import chain

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'
res = "".join(chain(*zip(u, l)))

The strategy, speed-wise, is to do as much as possible at the C level. The same zip_longest() fix applies for uneven strings, and it comes from the same module as chain(), so you can't ding me too many points there!

Other solutions I came up with along the way:

res = "".join(u[x] + l[x] for x in range(len(u)))
res = "".join(k + l[i] for i, k in enumerate(u))


Jim's answer is great, but here's my favorite option, if you don't mind a couple of imports:

from functools import reduce
from operator import add

reduce(add, map(add, u, l))

I would use zip() to get a readable and easy way:

result = ''
for cha, chb in zip(u, l):
    result += '%s%s' % (cha, chb)

print(result)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

It feels a bit un-Pythonic not to consider the double list-comprehension answer here, which handles n strings with O(1) effort:

"".join(c for cs in itertools.zip_longest(*all_strings, fillvalue="") for c in cs)

where all_strings is a list of the strings you want to interleave. In your case, all_strings = [u, l]. A full usage example would look like this:

import itertools

a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 'abcdefghijklmnopqrstuvwxyz'
all_strings = [a, b]
# fillvalue="" keeps join() working when the strings differ in length
interleaved = "".join(c for cs in itertools.zip_longest(*all_strings, fillvalue="") for c in cs)
print(interleaved)
# 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

Like many answers: fastest? Probably not, but simple and flexible. Also, without much added complexity, this is slightly faster than the accepted answer (in general, string addition is a bit slow in Python):

In [7]: l1 = 'A' * 1000000; l2 = 'a' * 1000000

In [8]: %timeit "".join(i + j for i, j in zip(l1, l2))
1 loops, best of 3: 227 ms per loop

In [9]: %timeit "".join(c for cs in zip(*(l1, l2)) for c in cs)
1 loops, best of 3: 198 ms per loop
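To show the n-string case with inputs of different lengths, here is a sketch assuming Python 3. The function name interleave_all is hypothetical, and fillvalue="" is used so that exhausted strings contribute nothing instead of None:

```python
from itertools import zip_longest


def interleave_all(*strings):
    """Interleave any number of strings, keeping leftover characters."""
    return "".join(
        c
        for cs in zip_longest(*strings, fillvalue="")  # one tuple per position
        for c in cs                                    # one char per string
    )


print(interleave_all("ABCD", "ab", "123"))  # Aa1Bb2C3D
```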

You could use iteration_utilities.roundrobin¹

u = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = 'abcdefghijklmnopqrstuvwxyz'

from iteration_utilities import roundrobin
''.join(roundrobin(u, l))
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

or the ManyIterables class from the same package:

from iteration_utilities import ManyIterables
ManyIterables(u, l).roundrobin().as_string()
# returns 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'

¹ This is from a third-party library I have written: iteration_utilities.
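If installing a third-party package is not an option, a round-robin generator is easy to sketch with the standard library alone. The version below is an illustration written for this page, not the library's implementation, and the name roundrobin_sketch is made up; it is assumed to follow the usual round-robin semantics of carrying on with the remaining iterables once one runs out:

```python
def roundrobin_sketch(*iterables):
    """Yield one item from each iterable in turn until all are exhausted."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        still_active = []
        for it in iterators:
            try:
                yield next(it)
            except StopIteration:
                continue  # this iterator is finished; drop it
            still_active.append(it)
        iterators = still_active


print("".join(roundrobin_sketch("ABCD", "ab")))  # AaBbCD
```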
