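How do I split a list into equally-sized chunks?

How do I split a list of arbitrary length into equal-sized chunks?


See also How to iterate over a list in chunks if the result will be consumed directly by a loop and does not need to be stored.

For the same question with a string input, see Split string every nth character?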


Here is a generator that yields evenly-sized chunks:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

import pprint
pprint.pprint(list(chunks(list(range(10, 75)), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

For Python 2, use xrange instead of range:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in xrange(0, len(lst), n):
        yield lst[i:i + n]

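Below is a list comprehension one-liner. The method above is preferable, though, because a named function makes the code easier to understand. For Python 3:

[lst[i:i + n] for i in range(0, len(lst), n)]

For Python 2:

[lst[i:i + n] for i in xrange(0, len(lst), n)]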

If you know the list size:

def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]

If you don't (an iterator):

def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion

In the latter case, it can be rephrased in a prettier way if you can be sure that the sequence always contains a whole number of chunks of the given size (i.e. there is no incomplete last chunk).

Here is a generator that works on arbitrary iterables:

import itertools

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))

Example:

>>> import pprint
>>> pprint.pprint(list(split_seq(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

Heh, a one-line version (Python 2):

In [48]: chunk = lambda ulist, step: map(lambda i: ulist[i:i+step], xrange(0, len(ulist), step))

In [49]: chunk(range(1,100), 10)
Out[49]:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
 [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
 [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
 [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]

Straight from the (old) Python documentation (recipes for itertools):

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

The current version, as suggested by J. F. Sebastian:

#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

I guess Guido's time machine works, worked, will work, will have worked, was working again.

These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin over "each" iterator; because they are all the same iterator, it is advanced by each such call, so each round of the zip produces one tuple of n items.
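The shared-iterator behaviour described above can be observed directly. The following is a minimal Python 3 sketch; the names `shared`, `same_object` and `groups` are illustrative and do not come from the original answers.

```python
from itertools import zip_longest

it = iter("abcdefg")
shared = [it] * 3  # three references to ONE iterator, not three copies

# Every slot in the list is the very same object.
same_object = all(ref is it for ref in shared)

# zip_longest pulls one item from each slot in turn; since all slots are the
# same iterator, each output tuple consumes three consecutive items.
groups = list(zip_longest(*shared, fillvalue="x"))

print(same_object)  # True
print(groups)       # [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')]
```

If the list held three independent iterators instead, each tuple would repeat the same element three times, which is why the repetition of a single iterator object is the essential part of the recipe.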

def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

Usage:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for seq in split_seq(seq, 3):
    print seq

def chunk(lst):
    out = []
    for x in xrange(2, len(lst) + 1):
        if not len(lst) % x:
            factor = len(lst) / x
            break
    while lst:
        out.append([lst.pop(0) for x in xrange(factor)])
    return out

>>> def f(x, n, acc=[]): return f(x[n:], n, acc+[(x[:n])]) if x else acc
>>> f("Hallo Welt", 3)
['Hal', 'lo ', 'Wel', 't']
>>>

If you are into brackets: I picked up a book on Erlang :)

Super simple stuff:

def chunks(xs, n):
    n = max(1, n)
    return (xs[i:i+n] for i in range(0, len(xs), n))

For Python 2, use xrange() instead of range().

Without calling len(), which is good for large lists:

def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]

And this is for iterables:

from itertools import islice, repeat, takewhile
# imap (used further below) exists only in Python 2: from itertools import imap

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))

A functional flavour of the above:

def isplitter2(l, n):
    return takewhile(bool,
                     (tuple(islice(start, n))
                      for start in repeat(iter(l))))

Or (Python 2):

def chunks_gen_sentinel(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next, ())

Or (Python 2):

def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool, imap(tuple, continuous_slices))

def chunk(input, size):
    return map(None, *([iter(input)] * size))

Simple yet elegant (Python 2):

L = range(1, 1000)
print [L[x:x+10] for x in xrange(0, len(L), 10)]

Or, if you prefer:

def chunks(L, n): return [L[x: x+n] for x in xrange(0, len(L), n)]
chunks(L, 10)

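If you had a chunk size of 3, for example, you could do:

zip(*[iterable[i::3] for i in range(3)])

Source: http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/

I would use this when my chunk size is a fixed number I could type, e.g. '3', and would never change.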
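One property of the strided-slice recipe above is worth checking before relying on it: zip stops at the shortest input, so trailing items that do not fill a whole group are silently dropped. A small Python 3 sketch (the names `items`, `columns` and `triples` are illustrative only):

```python
items = list(range(7))

# The three strided slices: items[0::3], items[1::3], items[2::3]
columns = [items[i::3] for i in range(3)]

# zip stops as soon as the shortest slice is exhausted.
triples = list(zip(*columns))

print(columns)  # [[0, 3, 6], [1, 4], [2, 5]]
print(triples)  # [(0, 1, 2), (3, 4, 5)]
```

Here the final element 6 does not appear in the result, so this form only suits inputs whose length is a multiple of the chunk size, or cases where losing the remainder is acceptable.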

Consider using matplotlib.cbook pieces

For example:

import numpy as np
import matplotlib.cbook as cbook
segments = cbook.pieces(np.arange(20), 3)
for s in segments:
    print s

def chunks(iterable,n):
    """assumes n is an integer>0
    """
    iterable=iter(iterable)
    while True:
        result=[]
        for i in range(n):
            try:
                a=next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break

g1=(i*i for i in range(10))
g2=chunks(g1,3)
print g2
'<generator object chunks at 0x0337B9B8>'
print list(g2)
'[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'

I realise this question is old (I stumbled over it on Google), but surely something like the following is far simpler and clearer than any of the hugely complex suggestions, and it only uses slicing:

def chunker(iterable, chunksize):
    for i,c in enumerate(iterable[::chunksize]):
        yield iterable[i*chunksize:(i+1)*chunksize]

>>> for chunk in chunker(range(0,100), 10):
...     print list(chunk)
...
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
... etc ...

See this reference:

>>> orange = range(1, 1001)
>>> otuples = list( zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>>

Python 3

Using Python's list comprehension:

[list(range(t, t+10)) for t in range(1, 1000, 10)]

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 ....
 ....
 [981, 982, 983, 984, 985, 986, 987, 988, 989, 990],
 [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]

Visit this link to learn about list comprehensions.

I know this is kind of old, but nobody has mentioned numpy.array_split yet:

import numpy as np

lst = range(50)
np.array_split(lst, 5)

Result:

[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
 array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
 array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
  • Works with any iterable
  • Inner data is a generator object (not a list)
  • One-liner (Python 2)

In [259]: get_in_chunks = lambda itr,n: ( (v for _,v in g) for _,g in itertools.groupby(enumerate(itr),lambda (ind,_): ind/n))

In [260]: list(list(x) for x in get_in_chunks(range(30),7))
Out[260]:
[[0, 1, 2, 3, 4, 5, 6],
 [7, 8, 9, 10, 11, 12, 13],
 [14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27],
 [28, 29]]

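I like the Python doc's version proposed by tzot and J. F. Sebastian a lot, but it has two shortcomings:

  • it is not very explicit
  • I usually don't want a fill value in the last chunk

I use this one a lot in my code:

from itertools import islice

def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        chunk = tuple(islice(iterable, n))
        if not chunk:
            return  # input exhausted
        yield chunk

UPDATE: A lazy chunks version (each chunk must be consumed before the next one is requested):

from itertools import chain, islice

def chunks(n, iterable):
    iterable = iter(iterable)
    for first in iterable:
        yield chain([first], islice(iterable, n-1))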

The toolz library has the partition function for this:

from toolz.itertoolz.core import partition

list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]

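Yes, it is an old question, but I had to post this one, because it is even a little shorter than the similar ones. Yes, the result looks scrambled, but if it is only about roughly even lengths...

>>> n = 3 # number of groups
>>> biglist = range(30)
>>>
>>> [ biglist[i::n] for i in xrange(n) ]
[[0, 3, 6, 9, 12, 15, 18, 21, 24, 27],
 [1, 4, 7, 10, 13, 16, 19, 22, 25, 28],
 [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]]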

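How do you split a list into evenly sized chunks?

"Evenly sized chunks", to me, implies that they are all the same length or, barring that option, that their lengths have minimal variance. For example, 5 baskets for 21 items could have the following results:

>>> import statistics
>>> statistics.variance([5,5,5,5,1])
3.2
>>> statistics.variance([5,4,4,4,4])
0.19999999999999998

A practical reason to prefer the latter result: if you were using these functions to distribute work, the first split builds in the prospect of one worker finishing well before the others, so it would sit around doing nothing while the others keep working hard.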
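The minimal-variance split of lengths described above can be computed with divmod. The helper below is only an illustrative sketch (the name `balanced_sizes` is hypothetical and not part of the original answer); it returns the basket lengths rather than the baskets themselves.

```python
def balanced_sizes(n_items, n_baskets):
    """Return basket lengths that differ by at most one item."""
    quotient, remainder = divmod(n_items, n_baskets)
    # The first `remainder` baskets take one extra item each.
    return [quotient + 1] * remainder + [quotient] * (n_baskets - remainder)

sizes = balanced_sizes(21, 5)
print(sizes)  # [5, 4, 4, 4, 4]
```

Because no two lengths differ by more than one, no other split of the same items into the same number of baskets has a lower variance.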

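Critique of other answers here

When I originally wrote this answer, none of the other answers produced evenly sized chunks: they all leave a runt chunk at the end, so they are not well balanced and have a higher than necessary variance of lengths.

For example, the current top answer ends with:

[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]

Others, like list(grouper(3, range(7))) and chunk(range(7), 3), both return: [(0, 1, 2), (3, 4, 5), (6, None, None)]. The None values are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables.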

Why can't we divide these better?

Cycle Solution

A high-level balanced solution using itertools.cycle, which is the way I might do it today. Here is the setup:

from itertools import cycle
items = range(10, 75)
number_of_baskets = 10

Now we need the lists into which to place the elements:

baskets = [[] for _ in range(number_of_baskets)]

Finally, we zip the elements we are going to allocate together with a cycle of the baskets until we run out of elements, which, semantically, is exactly what we want:

for element, basket in zip(items, cycle(baskets)):
    basket.append(element)

Here is the result:

>>> from pprint import pprint
>>> pprint(baskets)
[[10, 20, 30, 40, 50, 60, 70],
 [11, 21, 31, 41, 51, 61, 71],
 [12, 22, 32, 42, 52, 62, 72],
 [13, 23, 33, 43, 53, 63, 73],
 [14, 24, 34, 44, 54, 64, 74],
 [15, 25, 35, 45, 55, 65],
 [16, 26, 36, 46, 56, 66],
 [17, 27, 37, 47, 57, 67],
 [18, 28, 38, 48, 58, 68],
 [19, 29, 39, 49, 59, 69]]

To productionize this solution, we write a function and provide type annotations:

from itertools import cycle
from typing import List, Any

def cycle_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    baskets = [[] for _ in range(min(maxbaskets, len(items)))]
    for item, basket in zip(items, cycle(baskets)):
        basket.append(item)
    return baskets

In the above, we take our list of items and the maximum number of baskets. We create a list of empty lists, to which we append each element in round-robin style.

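Slices

Another elegant solution is to use slices, specifically the less commonly used step argument to slices, i.e.:

start = 0
stop = None
step = number_of_baskets

first_basket = items[start:stop:step]

This is especially elegant in that slices don't care how long the data are: the result, our first basket, is only as long as it needs to be. We only need to increment the starting point for each basket.

In fact this could be a one-liner, but we go multiline for readability and to avoid an overlong line of code:

from typing import List, Any

def slice_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    n_baskets = min(maxbaskets, len(items))
    return [items[i::n_baskets] for i in range(n_baskets)]

And islice from the itertools module provides a lazily iterating approach, like the one originally asked for in the question.

I don't expect most use cases to benefit very much, as the original data is already fully materialized in a list, but for large datasets it could save nearly half the memory usage.

from itertools import islice
from typing import List, Any, Generator

def yield_islice_baskets(items: List[Any], maxbaskets: int) -> Generator[List[Any], None, None]:
    n_baskets = min(maxbaskets, len(items))
    for i in range(n_baskets):
        yield islice(items, i, None, n_baskets)

View the results with:

from pprint import pprint

items = list(range(10, 75))
pprint(cycle_baskets(items, 10))
pprint(slice_baskets(items, 10))
pprint([list(s) for s in yield_islice_baskets(items, 10)])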

Updating prior solutions

Here is another balanced solution, adapted from a function I have used in production in the past, that uses the modulo operator:

def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in range(maxbaskets)]
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return list(filter(None, baskets))  # drop empty baskets

And I created a generator that does the same if you put it into a list:

def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in range(baskets):
        yield [items[y_i] for y_i in range(x_i, item_count, baskets)]

And finally, since I see that all of the above functions return elements in a contiguous order (as they were given):

def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing a iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in range(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [next(items) for _ in range(length)]

Output

To test them out:

print(baskets_from(range(6), 8))
print(list(iter_baskets_from(range(6), 8)))
print(list(iter_baskets_contiguous(range(6), 8)))
print(baskets_from(range(22), 8))
print(list(iter_baskets_from(range(22), 8)))
print(list(iter_baskets_contiguous(range(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(range(26), 5))
print(list(iter_baskets_from(range(26), 5)))
print(list(iter_baskets_contiguous(range(26), 5)))

Which prints out:

[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]

Notice that the contiguous generator provides chunks in the same length patterns as the other two, but the items are all in order, and they are as evenly divided as one may divide a list of discrete elements.

I'm surprised nobody has thought of using iter's two-argument form:

from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]

This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:

from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

Demo:

>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

Like the izip_longest-based solutions, the above always pads. As far as I know, there is no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    if padval == _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

I believe this is the shortest chunker proposed that offers optional padding.

As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here is a final variation that works around that problem in a reasonable way:

_no_padding = object()
def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval == _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))

Demo:

>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]
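For comparison, the early-stop problem with the sentinel-based padded chunker can be reproduced in a few lines. This sketch restates chunk_pad so that it is self-contained; the names `data` and `truncated` are illustrative only.

```python
from itertools import chain, islice, repeat

def chunk_pad(it, size, padval=None):
    # Sentinel-based padded chunker: iteration stops at the first chunk
    # that consists entirely of pad values.
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

data = [1, 2, None, None, 5]
truncated = list(chunk_pad(data, 2))

print(truncated)  # [(1, 2)]
```

The genuine chunk (None, None) in the input is indistinguishable from the sentinel, so iteration ends there and the trailing 5 is never produced. The final variation above avoids this by using () as the sentinel and padding afterwards.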

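I wrote a small library expressly for this purpose, available here. The library's chunked function is particularly efficient because it is implemented as a generator, so a substantial amount of memory can be saved in certain situations. It also doesn't rely on slice notation, so any arbitrary iterator can be used.

import iterlib

print list(iterlib.chunked(xrange(1, 1000), 10))
# prints [(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), (11, 12, 13, 14, 15, 16, 17, 18, 19, 20), ...]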

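Like @AaronHall, I got here looking for roughly evenly sized chunks. There are different interpretations of that. In my case, if the desired size is N, I would like each group to be of size >= N. Thus, the orphans that most of the above create should be redistributed to other groups.

This can be done using:

def nChunks(l, n):
    """ Yield n successive chunks from l.
    Works for lists,  pandas dataframes, etc
    """
    newn = int(1.0 * len(l) / n + 0.5)
    for i in xrange(0, n-1):
        yield l[i*newn:i*newn+newn]
    yield l[n*newn-newn:]

(from Splitting a list into N parts of approximately equal length) by simply calling it as nChunks(l, l/n) or nChunks(l, floor(l/n))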

Letting r be the chunk size and L be the initial list, you can do:

chunkL = [ [i for i in L[r*k:r*(k+1)] ] for k in range(len(L)//r)]

Use list comprehensions:

l = [1,2,3,4,5,6,7,8,9,10,11,12]
k = 5 #chunk size
print [tuple(l[x:y]) for (x, y) in [(x, x+k) for x in range(0, len(l), k)]]

A more explicit version.

def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists
    that have a length equals to chunkSize.

    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3))
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList

I saw the most awesome Python-ish answer in a duplicate of this question:

from itertools import zip_longest

a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

You can create n-tuples for any n. If a = range(1, 15), then the result will be:

[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]

If the list divides evenly, you can replace zip_longest with zip; otherwise the triplet (13, 14, None) would be lost. Python 3 is used above. For Python 2, use izip_longest.
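The difference between the two functions on an input that does not divide evenly can be sketched as follows. The helper name `triples` and its `keep_tail` flag are illustrative assumptions, not part of the original answer.

```python
from itertools import zip_longest

def triples(values, keep_tail):
    """Group values in threes by passing one shared iterator three times."""
    i = iter(values)
    combine = zip_longest if keep_tail else zip
    return list(combine(i, i, i))

padded = triples(range(1, 15), keep_tail=True)
dropped = triples(range(1, 15), keep_tail=False)

print(padded[-1])   # (13, 14, None)
print(dropped[-1])  # (10, 11, 12)
```

With zip, the values 13 and 14 are consumed from the iterator and then discarded when the third slot finds it exhausted, so they are lost rather than returned as a shorter tuple.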

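The answer above (by koffein) has a small problem: the list is always split into an equal number of splits, not into an equal number of items per partition. This is my version. The "// chs + 1" takes into account that the number of items may not be exactly divisible by the partition size, so the last partition will only be partially filled.

# Given 'l' is your list

chs = 12 # Your chunksize
partitioned = [ l[i*chs:(i*chs)+chs] for i in range((len(l) // chs)+1) ]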
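One edge case of the "// chs + 1" form is that it yields a trailing empty partition whenever the length is an exact multiple of the chunk size. A possible variation, given here only as a sketch, uses ceiling division instead; the function name `partition` is hypothetical.

```python
def partition(l, chs):
    """Split l into chunks of chs items; the last chunk may be shorter."""
    n_chunks = -(-len(l) // chs)  # ceiling division without importing math
    return [l[i * chs:(i + 1) * chs] for i in range(n_chunks)]

data = list(range(24))
chs = 12

# The "// chs + 1" form from the answer above, for comparison.
original = [data[i * chs:(i * chs) + chs] for i in range((len(data) // chs) + 1)]
adjusted = partition(data, chs)

print(len(original), original[-1])  # 3 []
print(len(adjusted))                # 2
```

For lengths that are not an exact multiple, both forms give the same partitions.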

Code:

def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print split_list(a_list, 3)

Result:

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
[a[i*CHUNK:(i+1)*CHUNK] for i in xrange((len(a) + CHUNK - 1) / CHUNK )]

I came up with the following solution, which does not create temporary list objects and should work with any iterable object. Please note that this version is for Python 2.x:

def chunked(iterable, size):
    stop = []
    it = iter(iterable)
    def _next_chunk():
        try:
            for _ in xrange(size):
                yield next(it)
        except StopIteration:
            stop.append(True)
            return

    while not stop:
        yield _next_chunk()

for it in chunked(xrange(16), 4):
    print list(it)

Output:

[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15]
[]

As you can see, if len(iterable) % size == 0, we get an additional empty iterator object. But I do not think that is a big problem.

Since I had to do something like this, here is my solution given a generator and a batch size (Python 2):

def pop_n_elems_from_generator(g, n):
    elems = []
    try:
        for idx in xrange(0, n):
            elems.append(g.next())
        return elems
    except StopIteration:
        return elems

At this point, I think we need a recursive generator, just in case...

In Python 2:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

In Python 3:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)

Also, in case of a massive alien invasion, a decorated recursive generator might come in handy:

def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen

@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

At this point, I think we need the obligatory anonymous recursive function.

Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
chunks = Y(lambda f: lambda n: [n[0][:n[1]]] + f((n[0][n[1]:], n[1])) if len(n[0]) > 0 else [])

[AA[i:i+SS] for i in range(len(AA))[::SS]]

where AA is the array and SS is the chunk size. For example:

>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3

To expand the ranges in py3, do:

(py3) >>> [list(AA[i:i+SS]) for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]

As per this answer, the top-voted answer leaves a 'runt' at the end. Here is my solution to really get chunks that are about as evenly sized as possible, with no runts. It basically tries to pick exactly the fractional spot where it should split the list, but just rounds it off to the nearest integer:

from __future__ import division  # not needed in Python 3
def n_even_chunks(l, n):
    """Yield n as even chunks as possible from l."""
    last = 0
    for i in range(1, n+1):
        cur = int(round(i * (len(l) / n)))
        yield l[last:cur]
        last = cur

Demonstration:

>>> pprint.pprint(list(n_even_chunks(list(range(100)), 9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55],
 [56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66],
 [67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77],
 [78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88],
 [89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
 [9, 10, 11, 12, 13, 14, 15, 16, 17],
 [18, 19, 20, 21, 22, 23, 24, 25, 26],
 [27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44],
 [45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
 [55, 56, 57, 58, 59, 60, 61, 62, 63],
 [64, 65, 66, 67, 68, 69, 70, 71, 72],
 [73, 74, 75, 76, 77, 78, 79, 80, 81],
 [82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]

Compare to the top-voted chunks answer:

>>> pprint.pprint(list(chunks(list(range(100)), 100//9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
 [55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65],
 [66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76],
 [77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87],
 [88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98],
 [99]]
>>> pprint.pprint(list(chunks(list(range(100)), 100//11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
 [9, 10, 11, 12, 13, 14, 15, 16, 17],
 [18, 19, 20, 21, 22, 23, 24, 25, 26],
 [27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44],
 [45, 46, 47, 48, 49, 50, 51, 52, 53],
 [54, 55, 56, 57, 58, 59, 60, 61, 62],
 [63, 64, 65, 66, 67, 68, 69, 70, 71],
 [72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89],
 [90, 91, 92, 93, 94, 95, 96, 97, 98],
 [99]]

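Since everybody here is talking about iterators: boltons has a perfect method for that, called iterutils.chunked_iter.

from boltons import iterutils

list(iterutils.chunked_iter(list(range(50)), 11))

Output:

[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49]]

But if you don't need to go easy on memory, you can use the old way and store the full list in the first place with iterutils.chunked.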

You may use numpy's array_split function, e.g. np.array_split(np.array(data), 20), to split into 20 chunks of nearly equal size.

To make sure the chunks are exactly equal in size, use np.split.

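I have one solution below which does work, but more important than that solution are a few comments on other approaches. First, a good solution should not require that one loop through the sub-iterators in order. If I run

g = paged_iter(list(range(50)), 11)
i0 = next(g)
i1 = next(g)
list(i1)
list(i0)

the appropriate output for the last command is

 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

not

 []

This is not merely the usual tedious restriction that iterators be accessed in order. Imagine a consumer trying to clean up poorly entered data that reversed the appropriate order of blocks of 5, i.e. the data looks like [B5, A5, D5, C5] and should look like [A5, B5, C5, D5] (where A5 is just five elements, not a sublist). This consumer would look at the claimed behaviour of the grouping function and not hesitate to write a loop like

i = 0
out = []
for it in paged_iter(data, 5):
    if (i % 2 == 0):
        swapped = it
    else:
        out += list(it)
        out += list(swapped)
    i = i + 1

This will produce mysteriously wrong results if you sneakily assume that sub-iterators are always fully used in order. It gets even worse if you want to interleave elements from the chunks.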
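The kind of failure described above can be reproduced with a small lazy chunker whose sub-iterators all share one underlying iterator. The function `lazy_chunks` below is a hypothetical illustration written for this sketch; it is not the paged_iter function from this answer.

```python
from itertools import chain, islice

def lazy_chunks(iterable, n):
    """Yield lazy chunks that all read from one shared iterator."""
    it = iter(iterable)
    for first in it:
        yield chain([first], islice(it, n - 1))

g = lazy_chunks(range(10), 3)
c0 = next(g)
c1 = next(g)

# Consume the chunks out of order, as the hypothetical consumer would.
second_chunk = list(c1)
first_chunk = list(c0)

print(second_chunk)  # [1, 2, 3]
print(first_chunk)   # [0, 4, 5]
```

Neither result is the chunk a caller would expect ([3, 4, 5] and [0, 1, 2]), and no error is raised, which is what makes this class of bug hard to spot.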

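Second, a decent number of the suggested solutions implicitly rely on the fact that iterators have a deterministic order (they are not sets, for example), and while some of the solutions using islice may be OK, it worries me.

Third, the itertools grouper approach works, but the recipe relies on internal behaviour of the zip_longest (or zip) functions that is not part of their published behaviour. In particular, the grouper function only works because in zip_longest(i0...in) the next function is always called in the order next(i0), next(i1), ... next(in) before starting over. As grouper passes n copies of the same iterator object, it relies on this behaviour.

Finally, while the solution below could be improved if you make the assumption criticised above, that sub-iterators are accessed in order and fully consumed, without this assumption one MUST implicitly (via the call chain) or explicitly (via deques or another data structure) store the elements for each sub-iterator somewhere. So don't bother wasting time (as I did) assuming one could get around this with some clever trick.

import collections

def paged_iter(iterat, n):
    itr = iter(iterat)
    deq = None
    try:
        while(True):
            deq = collections.deque(maxlen=n)
            for q in range(n):
                deq.append(next(itr))
            yield (i for i in deq)
    except StopIteration:
        yield (i for i in deq)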

You may also use the get_chunks function of the utilspie library as:

>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]

You can install utilspie via pip:

sudo pip install utilspie

Disclaimer: I am the creator of the utilspie library.

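Here is an idea using itertools.groupby:

import itertools

def chunks(l, n):
    c = itertools.count()
    return (it for _, it in itertools.groupby(l, lambda x: next(c)//n))

This returns a generator of generators. If you want a list of lists, just replace the last line with

    return [list(it) for _, it in itertools.groupby(l, lambda x: next(c)//n)]

Example returning a list of lists:

>>> chunks('abcdefghij', 4)
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j']]

(So yes, this suffers from the "runt problem", which may or may not be a problem in a given situation.)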

Yet another solution:

def make_chunks(data, chunk_size):
    while data:
        chunk, data = data[:chunk_size], data[chunk_size:]
        yield chunk

>>> for chunk in make_chunks([1, 2, 3, 4, 5, 6, 7], 2):
...     print chunk
...
[1, 2]
[3, 4]
[5, 6]
[7]
>>>

This works in v2/v3, is inlineable, is generator-based and uses only the standard library:

import itertools
def split_groups(iter_in, group_size):
    return ((x for _, x in item) for _, item in itertools.groupby(enumerate(iter_in), key=lambda x: x[0] // group_size))

No magic, but simple and correct:

def chunks(iterable, n):
    """Yield successive n-sized chunks from iterable."""
    values = []
    for i, item in enumerate(iterable, 1):
        values.append(item)
        if i % n == 0:
            yield values
            values = []
    if values:
        yield values

I don't think I saw this option, so just to add another one :)) :

def chunks(iterable, chunk_size):
    i = 0
    while i < len(iterable):
        yield iterable[i:i+chunk_size]
        i += chunk_size

I was curious about the performance of the different approaches, and here it is:

Tested on Python 3.5.1

import time
batch_size = 7
arr_len = 298937

#---------slice-------------

print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break

    tmp = arr[0:batch_size]
    arr = arr[batch_size:-1]
print(time.time() - start)

#-----------index-----------

print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([next(batchiter)], batchiter)


print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------

from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)

Results:

slice
31.18285083770752

index
0.02184295654296875

batches 1
0.03503894805908203

batches 2
0.22681021690368652

chunks
0.019841909408569336

grouper
0.006506919860839844

I dislike the idea of splitting elements by chunk size; for example, a script can divide 101 items into 3 chunks as [50, 50, 1]. For my needs I had to split proportionally and keep the order the same. First I wrote my own script, which works fine and is very simple. But I later saw this answer, where the script is better than mine, and I recommend it. Here is my script:

def proportional_dividing(N, n):
    """
    N - length of array (bigger number)
    n - number of chunks (smaller number)
    output - arr, containing N numbers, diveded roundly to n chunks
    """
    arr = []
    if N == 0:
        return arr
    elif n == 0:
        arr.append(N)
        return arr
    r = N // n
    for i in range(n-1):
        arr.append(r)
    arr.append(N-r*(n-1))

    last_n = arr[-1]
    # last number always will be r <= last_n < 2*r
    # when last_n == r it's ok, but when last_n > r ...
    if last_n > r:
        # ... and if difference too big (bigger than 1), then
        if abs(r-last_n) > 1:
            #[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7] # N=29, n=12
            # we need to give unnecessary numbers to first elements back
            diff = last_n - r
            for k in range(diff):
                arr[k] += 1
            arr[-1] = r
            # and we receive [3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
    return arr

def split_items(items, chunks):
    arr = proportional_dividing(len(items), chunks)
    splitted = []
    for chunk_size in arr:
        splitted.append(items[:chunk_size])
        items = items[chunk_size:]
    print(splitted)
    return splitted

items = [1,2,3,4,5,6,7,8,9,10,11]
chunks = 3
split_items(items, chunks)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm'], 3)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm', 'n'], 3)
split_items(range(100), 4)
split_items(range(99), 4)
split_items(range(101), 4)

and the output:

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'g', 'k', 'l', 'm']]
[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'g'], ['k', 'l', 'm', 'n']]
[range(0, 25), range(25, 50), range(50, 75), range(75, 100)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 99)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 101)]

Don't reinvent the wheel.

UPDATE: Python 3.12 introduces itertools.batched, which solves this problem at last. See below.

Given

import itertools as it
import collections as ct

import more_itertools as mit


iterable = range(11)
n = 3

Code

itertools.batched++

list(it.batched(iterable, n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10)]

more_itertools+

list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]

list(mit.grouper(iterable, n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.windowed(iterable, len(iterable)//n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.chunked_even(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

(or DIY, if you want)

The Standard Library

list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

d = {}
for i, x in enumerate(iterable):
    d.setdefault(i//n, []).append(x)

list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
    dd[i//n].append(x)

list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

References

+ A third-party library that implements itertools recipes and more. > pip install more_itertools

++ Included in the Python standard library from 3.12 onwards. batched is similar to more_itertools.chunked, except that it yields tuples rather than lists.

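Lazy loading version

import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[range(10, 20),
 range(20, 30),
 range(30, 40),
 range(40, 50),
 range(50, 60),
 range(60, 70),
 range(70, 75)]

Compare this implementation's result with the example usage result of the accepted answer.

Many of the above functions assume that the length of the whole iterable is known up front, or at least is cheap to calculate.

For some streamed objects that would mean loading the full data into memory first (e.g. downloading the whole file) just to get the length information.

If you don't know the full size yet, you can use this code instead: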

def chunks(iterable, size):
    """
    Yield successive chunks from iterable, being `size` long.

    https://stackoverflow.com/a/55776536/3423324
    :param iterable: The object you want to split into pieces.
    :param size: The size each of the resulting pieces should have.
    """
    i = 0
    while True:
        sliced = iterable[i:i + size]
        if len(sliced) == 0:
            # to suppress stuff like `range(max, max)`.
            break
        # end if
        yield sliced
        if len(sliced) < size:
            # our slice is not the full length, so we must have passed the end of the iterator
            break
        # end if
        i += size  # so we start the next chunk at the right place.
    # end while
# end def

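This works because slicing returns fewer elements, or none at all, once you go past the end of the sliceable object:

"abc"[0:2] == 'ab'
"abc"[2:4] == 'c'
"abc"[4:6] == ''

We now use that result of the slice and compute the length of the generated chunk. If it is shorter than expected, we know we can end the iteration.

That way the underlying object is not read until a chunk is actually requested.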

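If you don't care about the order:

> from itertools import groupby
> batch_no = 3
> data = 'abcdefgh'

> [
    [x[1] for x in x[1]]
    for x in
    groupby(
        sorted(
            (x[0] % batch_no, x[1])
            for x in
            enumerate(data)
        ),
        key=lambda x: x[0]
    )
  ]

[['a', 'd', 'g'], ['b', 'e', 'h'], ['c', 'f']]

This solution does not generate sets of the same size, but distributes the values so that the batches are as large as possible while keeping the number of generated batches.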

The Python pydash package could be a good choice.

from pydash.arrays import chunk
ids = ['22', '89', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '1']
chunk_ids = chunk(ids,5)
print(chunk_ids)
# output: [['22', '89', '2', '3', '4'], ['5', '6', '7', '8', '9'], ['10', '11', '1']]

For more, check out the pydash chunk list documentation.

This question reminds me of the Raku (formerly Perl 6) .comb(n) method. It breaks up strings into n-sized chunks. (There is more to it than that, but I'll leave out the details.)

It is easy enough to implement a similar function in Python 3 as a lambda expression:

comb = lambda s,n: (s[i:i+n] for i in range(0,len(s),n))

Then you can call it like this:

some_list = list(range(0, 20))  # creates a list of 20 elements
generator = comb(some_list, 4)  # creates a generator that will generate lists of 4 elements
for sublist in generator:
    print(sublist)  # prints a sublist of four elements, as it's generated

Of course, you don't have to assign the generator to a variable; you can loop over it directly like this:

for sublist in comb(some_list, 4):
    print(sublist)  # prints a sublist of four elements, as it's generated

As a bonus, this comb() function also operates on strings:

list( comb('catdogant', 3) )  # returns ['cat', 'dog', 'ant']

An old-school approach that does not require itertools but still works with arbitrary generators:

def chunks(g, n):
    """divide a generator 'g' into small chunks
    Yields:
        a chunk that has 'n' or less items
    """
    n = max(1, n)
    buff = []
    for item in g:
        buff.append(item)
        if len(buff) == n:
            yield buff
            buff = []
    if buff:
        yield buff

With assignment expressions in Python 3.8 it becomes quite nice:

import itertools


def batch(iterable, size):
    it = iter(iterable)
    while item := list(itertools.islice(it, size)):
        yield item

This works on an arbitrary iterable, not just a list.

>>> import pprint
>>> pprint.pprint(list(batch(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

UPDATE

Starting with Python 3.12, a nearly identical implementation is available as itertools.batched (it yields tuples rather than lists).

def main():
    print(chunkify([1,2,3,4,5,6],2))


def chunkify(lst, n):
    chunks = []
    for i in range(0, len(lst), n):
        chunks.append(lst[i:i+n])
    return chunks


main()

I think it is simple and gives you the chunks of an array.

A generic chunker for any iterable, which lets the user choose how to handle a partial chunk at the end.

Tested on Python 3.

chunker.py

from enum import Enum


class PartialChunkOptions(Enum):
    INCLUDE = 0
    EXCLUDE = 1
    PAD = 2
    ERROR = 3


class PartialChunkException(Exception):
    pass


def chunker(iterable, n, on_partial=PartialChunkOptions.INCLUDE, pad=None):
    """
    A chunker yielding n-element lists from an iterable, with various options
    about what to do about a partial chunk at the end.

    on_partial=PartialChunkOptions.INCLUDE (the default):
        include the partial chunk as a short (<n) element list

    on_partial=PartialChunkOptions.EXCLUDE
        do not include the partial chunk

    on_partial=PartialChunkOptions.PAD
        pad to an n-element list
        (also pass pad=<pad_value>, default None)

    on_partial=PartialChunkOptions.ERROR
        raise a PartialChunkException if a partial chunk is encountered
    """

    on_partial = PartialChunkOptions(on_partial)

    iterator = iter(iterable)
    while True:
        vals = []
        for i in range(n):
            try:
                vals.append(next(iterator))
            except StopIteration:
                # The input ran out; vals holds a partial chunk if non-empty.
                if vals:
                    if on_partial == PartialChunkOptions.INCLUDE:
                        yield vals
                    elif on_partial == PartialChunkOptions.EXCLUDE:
                        pass
                    elif on_partial == PartialChunkOptions.PAD:
                        yield vals + [pad] * (n - len(vals))
                    elif on_partial == PartialChunkOptions.ERROR:
                        raise PartialChunkException
                    return
                return
        yield vals

test.py

import chunker

chunk_size = 3

for it in (range(100, 107),
           range(100, 109)):

    print("\nITERABLE TO CHUNK: {}".format(it))
    print("CHUNK SIZE: {}".format(chunk_size))

    for option in chunker.PartialChunkOptions.__members__.values():
        print("\noption {} used".format(option))
        try:
            for chunk in chunker.chunker(it, chunk_size, on_partial=option):
                print(chunk)
        except chunker.PartialChunkException:
            print("PartialChunkException was raised")
    print("")

Output of test.py


ITERABLE TO CHUNK: range(100, 107)
CHUNK SIZE: 3


option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106]


option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]


option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, None, None]


option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
PartialChunkException was raised




ITERABLE TO CHUNK: range(100, 109)
CHUNK SIZE: 3


option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]


option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]


option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]


option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]


An abstraction would be:

l = [1,2,3,4,5,6,7,8,9]
n = 3
outList = []
for i in range(n, len(l) + n, n):
    outList.append(l[i-n:i])


print(outList)

This will print:

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

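I have created these two fancy one-liners, which are efficient and lazy. Both input and output are iterables, and they do not depend on any module:

The first one-liner is fully lazy, meaning it returns an iterator that yields iterators (i.e. each chunk yielded is an iterator over that chunk's elements). This version suits cases where chunks are very large, or where elements are produced slowly one at a time and should be available as soon as they are produced: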


chunk_iters = lambda it, n: ((e for i, g in enumerate(((f,), cit)) for j, e in zip(range((1, n - 1)[i]), g)) for cit in (iter(it),) for f in cit)

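The second one-liner returns an iterator that yields lists. Each list is yielded as soon as the elements of a whole chunk become available from the input iterator, or when the last element of the last chunk is reached. Use this version if the input elements are produced quickly or are all available immediately; otherwise use the first, lazier one-liner.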


chunk_lists = lambda it, n: (l for l in ([],) for i, g in enumerate((it, ((),))) for e in g for l in (l[:len(l) % n] + [e][:1 - i],) if (len(l) % n == 0) != i)

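In addition, here is a multi-line version of the first chunk_iters one-liner, which returns an iterator yielding further iterators (each going over the elements of one chunk):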


def chunk_iters(it, n):
    cit = iter(it)
    def one_chunk(f):
        yield f
        for i, e in zip(range(n - 1), cit):
            yield e
    for f in cit:
        yield one_chunk(f)
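
Because every inner iterator draws from the same underlying iterator, the chunks have to be consumed in order. The sketch below repeats the multi-line function (with the parameter renamed to first for readability, which is my own change) and shows what happens in both cases.

```python
def chunk_iters(it, n):
    cit = iter(it)

    def one_chunk(first):
        yield first
        for _, e in zip(range(n - 1), cit):
            yield e

    for first in cit:
        yield one_chunk(first)


# Consuming each chunk fully before asking for the next one gives the expected result.
in_order = [list(chunk) for chunk in chunk_iters(range(7), 3)]
print(in_order)  # [[0, 1, 2], [3, 4, 5], [6]]

# Materialising the outer iterator first exhausts the shared iterator,
# so every chunk is left with only its first element.
out_of_order = [list(chunk) for chunk in list(chunk_iters(range(7), 3))]
print(out_of_order)  # [[0], [1], [2], [3], [4], [5], [6]]
```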

Despite the many answers already here, I have a very simple approach:


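x = list(range(10, 75))
# Note: this takes every 10th VALUE of x and then uses those values as slice
# positions. It works here only because x starts at 10 with a step of 1.
indices = x[0::10]
print("indices: ", indices)
xx = [x[i-10:i] for i in indices]
print("x= ", x)
print("xx= ", xx)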


The result will be:

indices:  [10, 20, 30, 40, 50, 60, 70]
x=  [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]
xx=  [[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]

A very simple solution.

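The OP asked for "equally sized chunks". I understand "equally sized" to mean "balanced": we are looking for groups of items of approximately the same size when exactly equal sizes are not possible (e.g. 23/5).

The inputs here are:

  • the list of items: input_list (for example, a list of 23 numbers)
  • the number of groups to split those items into: n_groups (for example, 5)

Input: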

input_list = list(range(23))
n_groups = 5

Groups of contiguous elements:

approx_sizes = len(input_list)/n_groups


groups_cont = [input_list[int(i*approx_sizes):int((i+1)*approx_sizes)]
               for i in range(n_groups)]
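
The contiguous split above depends on truncating floating-point products. As a hedged alternative, here is an integer-only sketch based on divmod; the name split_balanced is hypothetical and not from the original answer. It also yields group sizes that differ by at most one, but puts the larger groups first, so its output is not identical to groups_cont.

```python
def split_balanced(items, n_groups):
    """Split items into n_groups contiguous groups whose sizes differ by at most 1."""
    base, extra = divmod(len(items), n_groups)
    groups = []
    start = 0
    for i in range(n_groups):
        # The first `extra` groups get one additional element.
        size = base + (1 if i < extra else 0)
        groups.append(items[start:start + size])
        start += size
    return groups


result = split_balanced(list(range(23)), 5)
print([len(g) for g in result])  # [5, 5, 5, 4, 4]
```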

Groups of "every N-th" element:

groups_leap = [input_list[i::n_groups]
               for i in range(n_groups)]

Results:

print(len(input_list))


print('Contiguous elements lists:')
print(groups_cont)


print('Leap every "N" items lists:')
print(groups_leap)

This will output:

23


Contiguous elements lists:
[[0, 1, 2, 3], [4, 5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16, 17], [18, 19, 20, 21, 22]]


Leap every "N" items lists:
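[[0, 5, 10, 15, 20], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18], [4, 9, 14, 19]]

from itertools import islice

l = [1, 2, 3, 4, 5, 6]
chunksize = int(input("Enter chunk size: "))
m = []
obj = iter(l)
# Slice from the shared iterator obj (not the list l) so that each chunk
# continues where the previous one stopped.
while chunk := list(islice(obj, chunksize)):
    m.append(chunk)
print(m)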

This task can be done easily with the generator from the accepted answer. I am adding a class implementation that also implements the length method, which may be useful to someone. I needed to show progress (with tqdm), so the iterable had to report the number of chunks.

class ChunksIterator(object):
    def __init__(self, data, n):
        self._data = data
        self._l = len(data)
        self._n = n

    def __iter__(self):
        for i in range(0, self._l, self._n):
            yield self._data[i:i + self._n]

    def __len__(self):
        rem = 1 if self._l % self._n != 0 else 0
        return self._l // self._n + rem

Usage:

it = ChunksIterator([1,2,3,4,5,6,7,8,9], 2)
print(len(it))
for i in it:
    print(i)
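
The __len__ above is a ceiling division. A minimal sketch of the same count written with integers only, compared against math.ceil; count_chunks is a hypothetical helper name introduced here for illustration.

```python
import math


def count_chunks(length, n):
    """Number of chunks of size n needed to cover `length` items."""
    return -(-length // n)  # ceiling division using integers only


counts = [count_chunks(length, 2) for length in range(6)]
print(counts)  # [0, 1, 1, 2, 2, 3]

# Same values as math.ceil on the true quotient.
ceil_counts = [math.ceil(length / 2) for length in range(6)]
print(counts == ceil_counts)  # True
```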

A one-liner version of senderle's answer:

from itertools import islice
from functools import partial


seq = [1,2,3,4,5,6,7]
size = 3
result = list(iter(partial(lambda it: tuple(islice(it, size)), iter(seq)), ()))
assert result == [(1, 2, 3), (4, 5, 6), (7,)]

Assume the list is lst:

import math

lst = list(range(10))  # example data
ln = len(lst)          # length of the list
size = 3               # size of a chunk (example value)

for num in range(math.ceil(ln / size)):
    start, end = num * size, min((num + 1) * size, ln)
    print(lst[start:end])

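User @tzot's solution zip_longest(*[iter(lst)]*n, fillvalue=padvalue) is very elegant, but if the length of lst is not divisible by n, it pads the last sublist so that its length matches the others. If that is not desirable, simply use zip() to produce a similar round-robin zip, and append the remaining elements of lst (those that cannot form a "complete" sublist) to the output.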

list(map(list, zip(*[iter(lst)]*n))) + ([rest] if (rest:=lst[len(lst)//n*n : ]) else [])
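
To see why the one-liner works, the sketch below breaks it into steps on a small example list (the values are illustrative only).

```python
lst = [1, 2, 3, 4, 5, 6, 7]
n = 3

it = iter(lst)
# [it] * n is a list holding the SAME iterator n times, so zip pulls
# n consecutive items from it to build each tuple.
full_chunks = list(zip(*[it] * n))
print(full_chunks)  # [(1, 2, 3), (4, 5, 6)]

# zip stops at the first exhausted argument, so the trailing 7 is dropped;
# the tail is recovered by slicing from the last full multiple of n.
rest = lst[len(lst) // n * n:]
print(rest)  # [7]
```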

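The one-liner above is probably more readable as a function. Unlike other functions here, it returns a list rather than a generator. Depending on the use case, that may or may not be desirable.

def chunkify(lst, chunk_size):
    nested = list(map(list, zip(*[iter(lst)]*chunk_size)))
    rest = lst[len(lst)//chunk_size*chunk_size: ]
    if rest:
        nested.append(rest)
    return nested

It is faster than some of the most popular answers here, while producing the same output.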

my_list, n = list(range(1_000_000)), 12


%timeit list(chunks(my_list, n))                                         # @Ned_Batchelder
# 36.4 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


%timeit [my_list[i:i+n] for i in range(0, len(my_list), n)]              # @Ned_Batchelder
# 34.6 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


%timeit it = iter(my_list); list(iter(lambda: list(islice(it, n)), []))  # @senderle
# 60.6 ms ± 5.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


%timeit list(mit.chunked(my_list, n))                                    # @pylang
# 59.4 ms ± 4.92 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


%timeit chunkify(my_list, n)
# 25.8 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

You can use more_itertools.chunked_even together with math.ceil. Probably the easiest to reason about?

from math import ceil
import more_itertools as mit
from pprint import pprint


pprint([*mit.chunked_even(range(19), ceil(19 / 5))])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18]]


pprint([*mit.chunked_even(range(20), ceil(20 / 5))])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19]]


pprint([*mit.chunked_even(range(21), ceil(21 / 5))])
# [[0, 1, 2, 3, 4],
# [5, 6, 7, 8],
# [9, 10, 11, 12],
# [13, 14, 15, 16],
# [17, 18, 19, 20]]


pprint([*mit.chunked_even(range(3), ceil(3 / 5))])
# [[0], [1], [2]]




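The recipes in the itertools module documentation provide two ways to do this, depending on how you want to handle a final odd-sized batch (keep it, pad it with a fill value, ignore it, or raise an exception):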

from itertools import islice, zip_longest


def batched(iterable, n):
    "Batch data into lists of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)  # strict= requires Python 3.10+
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')

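You could use more_itertools:

import more_itertools

a = [1, 2, 3, 4]
result = 0
for i, k in more_itertools.pairwise(a):
    result += compute(i, k)  # compute is a placeholder for your own function

This calls compute on each pair of consecutive elements in the list, i.e. on the overlapping pairs (1, 2), (2, 3), (3, 4).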
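
Since pairwise produces overlapping pairs, it is not the same as splitting a list into chunks of two. A small standard-library sketch of the difference; zip(a, a[1:]) is used here as a stand-in that yields the same pairs as pairwise for a list.

```python
a = [1, 2, 3, 4]

# Overlapping consecutive pairs, the behaviour of pairwise().
overlapping = list(zip(a, a[1:]))
print(overlapping)  # [(1, 2), (2, 3), (3, 4)]

# Non-overlapping chunks of two, which is what the question asks for.
chunks_of_two = [a[i:i + 2] for i in range(0, len(a), 2)]
print(chunks_of_two)  # [[1, 2], [3, 4]]
```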