在Python中扁平化一个浅列表

是否有一种简单的方法可以用列表理解来扁平化一个可迭代对象列表,或者如果没有,你们都认为什么是扁平化这样一个浅列表的最好方法,平衡性能和可读性?

我尝试用一个嵌套的列表理解来扁平化这样一个列表,就像这样:

[image for image in menuitem for menuitem in list_of_menuitems]

但我在那里遇到了NameError类型的麻烦,因为name 'menuitem' is not defined。在google和Stack Overflow上查找之后,我用reduce语句得到了想要的结果:

reduce(list.__add__, map(lambda x: list(x), list_of_menuitems))

但这个方法是相当不可读的,因为我需要list(x)调用,因为x是Django的QuerySet对象。

结论:

感谢每个为这个问题做出贡献的人。以下是我所学到的一份总结。我也把它变成了一个社区维基,以防其他人想要添加或纠正这些观察。

我原来的reduce语句是多余的,最好这样写:

>>> reduce(list.__add__, (list(mi) for mi in list_of_menuitems))

这是嵌套列表推导式的正确语法(Brilliant summary dF!):

>>> [image for mi in list_of_menuitems for image in mi]

但是这两个方法都没有使用itertools.chain有效:

>>> from itertools import chain
>>> list(chain(*list_of_menuitems))

正如@cdleary指出的那样,使用chain.from_iterable来避免*操作符魔法可能是更好的风格:

>>> chain = itertools.chain.from_iterable([[1,2],[3],[5,89],[],[6]])
>>> print(list(chain))
>>> [1, 2, 3, 5, 89, 6]
210524 次浏览

在我的脑海中,你可以消去lambda

reduce(list.__add__, map(list, [mi.image_set.all() for mi in list_of_menuitems]))

或者甚至删除地图,因为你已经有了一个列表:

reduce(list.__add__, [list(mi.image_set.all()) for mi in list_of_menuitems])

你也可以将它表示为列表的和:

sum([list(mi.image_set.all()) for mi in list_of_menuitems], [])

是什么:

from operator import add
reduce(add, map(lambda x: list(x.image_set.all()), [mi for mi in list_of_menuitems]))

但是,Guido建议不要在一行代码中执行太多操作,因为这会降低可读性。在单行中执行与在多行中执行您想要的操作相比,性能收益最小(如果有的话)。

如果你只是想迭代一个扁平版本的数据结构,不需要一个可索引序列,考虑itertools。连锁店和公司

>>> list_of_menuitems = [['image00', 'image01'], ['image10'], []]
>>> import itertools
>>> chain = itertools.chain(*list_of_menuitems)
>>> print(list(chain))
['image00', 'image01', 'image10']

它将适用于任何可迭代的对象,其中应该包括Django的可迭代对象QuerySets,它似乎是你在问题中使用的对象。

编辑:无论如何,这可能和reduce一样好,因为reduce将有相同的开销将项复制到正在扩展的列表中。如果你在最后运行list(chain)chain只会引起这个(相同的)开销。

Meta-Edit:实际上,它的开销比问题的建议解决方案要少,因为当你用临时扩展原始列表时,你扔掉了你创建的临时列表。

编辑:由于J.F.塞巴斯蒂安说 itertools.chain.from_iterable避免了解包,你应该使用它来避免*魔法,但是timeit应用显示出可以忽略不计的性能差异。

下面是使用列表推导式的正确解决方案(它们在这个问题中是落后的):

>>> join = lambda it: (y for x in it for y in x)
>>> list(join([[1,2],[3,4,5],[]]))
[1, 2, 3, 4, 5]

对你来说就是这样

[image for menuitem in list_of_menuitems for image in menuitem.image_set.all()]

或者你可以使用join并说

join(menuitem.image_set.all() for menuitem in list_of_menuitems)

在这两种情况下,问题在于for循环的嵌套。

你就快成功了!做嵌套列表推导的方法是将for语句放入与常规嵌套for语句相同的顺序。

因此,这

for inner_list in outer_list:
for item in inner_list:
...

对应于

[... for inner_list in outer_list for item in inner_list]

所以你想

[image for menuitem in list_of_menuitems for image in menuitem]

性能结果。修改。

import itertools
def itertools_flatten( aList ):
return list( itertools.chain(*aList) )


from operator import add
def reduce_flatten1( aList ):
return reduce(add, map(lambda x: list(x), [mi for mi in aList]))


def reduce_flatten2( aList ):
return reduce(list.__add__, map(list, aList))


def comprehension_flatten( aList ):
return list(y for x in aList for y in x)

我将一个包含30个道具的2级列表平铺了1000次

itertools_flatten     0.00554
comprehension_flatten 0.00815
reduce_flatten2       0.01103
reduce_flatten1       0.01404

减少总是一个糟糕的选择。

这个解决方案适用于任意的嵌套深度——而不仅仅是一些(所有?)其他解决方案所限制的“列表的列表”深度:

def flatten(x):
result = []
for el in x:
if hasattr(el, "__iter__") and not isinstance(el, basestring):
result.extend(flatten(el))
else:
result.append(el)
return result

它是递归,允许任意深度嵌套-直到你达到最大递归深度,当然…

你启发了我写一个timeit应用程序。

我认为它也会根据分区的数量(容器列表中的迭代器的数量)而变化——你的评论没有提到这30个项目中有多少个分区。这个图在每次运行中平摊1000个项目,使用不同数量的分区。这些物品均匀地分布在各个分区中。

扁平化对比

代码(Python 2.6):

#!/usr/bin/env python2.6


"""Usage: %prog item_count"""


from __future__ import print_function


import collections
import itertools
import operator
from timeit import Timer
import sys


import matplotlib.pyplot as pyplot


def itertools_flatten(iter_lst):
return list(itertools.chain(*iter_lst))


def itertools_iterable_flatten(iter_iter):
return list(itertools.chain.from_iterable(iter_iter))


def reduce_flatten(iter_lst):
return reduce(operator.add, map(list, iter_lst))


def reduce_lambda_flatten(iter_lst):
return reduce(operator.add, map(lambda x: list(x), [i for i in iter_lst]))


def comprehension_flatten(iter_lst):
return list(item for iter_ in iter_lst for item in iter_)


METHODS = ['itertools', 'itertools_iterable', 'reduce', 'reduce_lambda',
'comprehension']


def _time_test_assert(iter_lst):
"""Make sure all methods produce an equivalent value.
:raise AssertionError: On any non-equivalent value."""
callables = (globals()[method + '_flatten'] for method in METHODS)
results = [callable(iter_lst) for callable in callables]
if not all(result == results[0] for result in results[1:]):
raise AssertionError


def time_test(partition_count, item_count_per_partition, test_count=10000):
"""Run flatten methods on a list of :param:`partition_count` iterables.
Normalize results over :param:`test_count` runs.
:return: Mapping from method to (normalized) microseconds per pass.
"""
iter_lst = [[dict()] * item_count_per_partition] * partition_count
print('Partition count:    ', partition_count)
print('Items per partition:', item_count_per_partition)
_time_test_assert(iter_lst)
test_str = 'flatten(%r)' % iter_lst
result_by_method = {}
for method in METHODS:
setup_str = 'from test import %s_flatten as flatten' % method
t = Timer(test_str, setup_str)
per_pass = test_count * t.timeit(number=test_count) / test_count
print('%20s: %.2f usec/pass' % (method, per_pass))
result_by_method[method] = per_pass
return result_by_method


if __name__ == '__main__':
if len(sys.argv) != 2:
raise ValueError('Need a number of items to flatten')
item_count = int(sys.argv[1])
partition_counts = []
pass_times_by_method = collections.defaultdict(list)
for partition_count in xrange(1, item_count):
if item_count % partition_count != 0:
continue
items_per_partition = item_count / partition_count
result_by_method = time_test(partition_count, items_per_partition)
partition_counts.append(partition_count)
for method, result in result_by_method.iteritems():
pass_times_by_method[method].append(result)
for method, pass_times in pass_times_by_method.iteritems():
pyplot.plot(partition_counts, pass_times, label=method)
pyplot.legend()
pyplot.title('Flattening Comparison for %d Items' % item_count)
pyplot.xlabel('Number of Partitions')
pyplot.ylabel('Microseconds')
pyplot.show()

编辑:决定让它成为社区维基。

注意: METHODS可能应该用装饰器来累积,但我认为这样更容易让人们阅读。

在Python 2.6中,使用chain.from_iterable():

>>> from itertools import chain
>>> list(chain.from_iterable(mi.image_set.all() for mi in h.get_image_menu()))

避免了中间列表的创建。

Python 3.4中,你将能够做到:

[*innerlist for innerlist in outer_list]

你试过flatten吗? 从matplotlib.cbook。平(seq, scalarp =) ?

l=[[1,2,3],[4,5,6], [7], [8,9]]*33


run("list(flatten(l))")
3732 function calls (3303 primitive calls) in 0.007 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.000    0.000    0.007    0.007 <string>:1(<module>)
429    0.001    0.000    0.001    0.000 cbook.py:475(iterable)
429    0.002    0.000    0.003    0.000 cbook.py:484(is_string_like)
429    0.002    0.000    0.006    0.000 cbook.py:565(is_scalar_or_string)
727/298    0.001    0.000    0.007    0.000 cbook.py:605(flatten)
429    0.000    0.000    0.001    0.000 core.py:5641(isMaskedArray)
858    0.001    0.000    0.001    0.000 {isinstance}
429    0.000    0.000    0.000    0.000 {iter}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}






l=[[1,2,3],[4,5,6], [7], [8,9]]*66


run("list(flatten(l))")
7461 function calls (6603 primitive calls) in 0.007 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.000    0.000    0.007    0.007 <string>:1(<module>)
858    0.001    0.000    0.001    0.000 cbook.py:475(iterable)
858    0.002    0.000    0.003    0.000 cbook.py:484(is_string_like)
858    0.002    0.000    0.006    0.000 cbook.py:565(is_scalar_or_string)
1453/595    0.001    0.000    0.007    0.000 cbook.py:605(flatten)
858    0.000    0.000    0.001    0.000 core.py:5641(isMaskedArray)
1716    0.001    0.000    0.001    0.000 {isinstance}
858    0.000    0.000    0.000    0.000 {iter}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}






l=[[1,2,3],[4,5,6], [7], [8,9]]*99


run("list(flatten(l))")
11190 function calls (9903 primitive calls) in 0.010 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.000    0.000    0.010    0.010 <string>:1(<module>)
1287    0.002    0.000    0.002    0.000 cbook.py:475(iterable)
1287    0.003    0.000    0.004    0.000 cbook.py:484(is_string_like)
1287    0.002    0.000    0.009    0.000 cbook.py:565(is_scalar_or_string)
2179/892    0.001    0.000    0.010    0.000 cbook.py:605(flatten)
1287    0.001    0.000    0.001    0.000 core.py:5641(isMaskedArray)
2574    0.001    0.000    0.001    0.000 {isinstance}
1287    0.000    0.000    0.000    0.000 {iter}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}






l=[[1,2,3],[4,5,6], [7], [8,9]]*132


run("list(flatten(l))")
14919 function calls (13203 primitive calls) in 0.013 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.000    0.000    0.013    0.013 <string>:1(<module>)
1716    0.002    0.000    0.002    0.000 cbook.py:475(iterable)
1716    0.004    0.000    0.006    0.000 cbook.py:484(is_string_like)
1716    0.003    0.000    0.011    0.000 cbook.py:565(is_scalar_or_string)
2905/1189    0.002    0.000    0.013    0.000 cbook.py:605(flatten)
1716    0.001    0.000    0.001    0.000 core.py:5641(isMaskedArray)
3432    0.001    0.000    0.001    0.000 {isinstance}
1716    0.001    0.000    0.001    0.000 {iter}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler'

<强>更新 这给了我另一个想法:

l=[[1,2,3],[4,5,6], [7], [8,9]]*33


run("flattenlist(l)")
564 function calls (432 primitive calls) in 0.000 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
133/1    0.000    0.000    0.000    0.000 <ipython-input-55-39b139bad497>:4(flattenlist)
1    0.000    0.000    0.000    0.000 <string>:1(<module>)
429    0.000    0.000    0.000    0.000 {isinstance}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}






l=[[1,2,3],[4,5,6], [7], [8,9]]*66


run("flattenlist(l)")
1125 function calls (861 primitive calls) in 0.001 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
265/1    0.001    0.000    0.001    0.001 <ipython-input-55-39b139bad497>:4(flattenlist)
1    0.000    0.000    0.001    0.001 <string>:1(<module>)
858    0.000    0.000    0.000    0.000 {isinstance}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}






l=[[1,2,3],[4,5,6], [7], [8,9]]*99


run("flattenlist(l)")
1686 function calls (1290 primitive calls) in 0.001 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
397/1    0.001    0.000    0.001    0.001 <ipython-input-55-39b139bad497>:4(flattenlist)
1    0.000    0.000    0.001    0.001 <string>:1(<module>)
1287    0.000    0.000    0.000    0.000 {isinstance}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}






l=[[1,2,3],[4,5,6], [7], [8,9]]*132


run("flattenlist(l)")
2247 function calls (1719 primitive calls) in 0.002 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
529/1    0.001    0.000    0.002    0.002 <ipython-input-55-39b139bad497>:4(flattenlist)
1    0.000    0.000    0.002    0.002 <string>:1(<module>)
1716    0.001    0.000    0.001    0.000 {isinstance}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}






l=[[1,2,3],[4,5,6], [7], [8,9]]*1320


run("flattenlist(l)")
22443 function calls (17163 primitive calls) in 0.016 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
5281/1    0.011    0.000    0.016    0.016 <ipython-input-55-39b139bad497>:4(flattenlist)
1    0.000    0.000    0.016    0.016 <string>:1(<module>)
17160    0.005    0.000    0.005    0.000 {isinstance}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

因此,为了测试当递归深入时它的有效性:深入多少?

l=[[1,2,3],[4,5,6], [7], [8,9]]*1320


new=[l]*33


run("flattenlist(new)")
740589 function calls (566316 primitive calls) in 0.418 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
174274/1    0.281    0.000    0.417    0.417 <ipython-input-55-39b139bad497>:4(flattenlist)
1    0.001    0.001    0.418    0.418 <string>:1(<module>)
566313    0.136    0.000    0.136    0.000 {isinstance}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}






new=[l]*66


run("flattenlist(new)")
1481175 function calls (1132629 primitive calls) in 0.809 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
348547/1    0.542    0.000    0.807    0.807 <ipython-input-55-39b139bad497>:4(flattenlist)
1    0.002    0.002    0.809    0.809 <string>:1(<module>)
1132626    0.266    0.000    0.266    0.000 {isinstance}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}






new=[l]*99


run("flattenlist(new)")
2221761 function calls (1698942 primitive calls) in 1.211 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
522820/1    0.815    0.000    1.208    1.208 <ipython-input-55-39b139bad497>:4(flattenlist)
1    0.002    0.002    1.211    1.211 <string>:1(<module>)
1698939    0.393    0.000    0.393    0.000 {isinstance}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}






new=[l]*132


run("flattenlist(new)")
2962347 function calls (2265255 primitive calls) in 1.630 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
697093/1    1.091    0.000    1.627    1.627 <ipython-input-55-39b139bad497>:4(flattenlist)
1    0.003    0.003    1.630    1.630 <string>:1(<module>)
2265252    0.536    0.000    0.536    0.000 {isinstance}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}






new=[l]*1320


run("flattenlist(new)")
29623443 function calls (22652523 primitive calls) in 16.103 seconds


Ordered by: standard name


ncalls  tottime  percall  cumtime  percall filename:lineno(function)
6970921/1   10.842    0.000   16.069   16.069 <ipython-input-55-39b139bad497>:4(flattenlist)
1    0.034    0.034   16.103   16.103 <string>:1(<module>)
22652520    5.227    0.000    5.227    0.000 {isinstance}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

我打赌“扁平化”我将使用这个而不是matplobib很长很长一段时间,除非我想要一个yield生成器和快速的结果“扁平化”使用matplobib .cbook

这个,是快的。

  • 这是代码

typ=(list,tuple)




def flattenlist(d):
thelist = []
for x in d:
if not isinstance(x,typ):
thelist += [x]
else:
thelist += flattenlist(x)
return thelist

这个版本是一个生成器。如果你想要一个列表,可以调整一下。

def list_or_tuple(l):
return isinstance(l,(list,tuple))
## predicate will select the container  to be flattened
## write your own as required
## this one flattens every list/tuple




def flatten(seq,predicate=list_or_tuple):
## recursive generator
for i in seq:
if predicate(seq):
for j in flatten(i):
yield j
else:
yield i

如果想将满足条件的谓词平展,可以添加谓词

摘自python食谱

根据我的经验,最有效的将列表的列表扁平化的方法是:

flat_list = []
map(flat_list.extend, list_of_list)

有时与其他提出的方法进行比较:

list_of_list = [range(10)]*1000
%timeit flat_list=[]; map(flat_list.extend, list_of_list)
#10000 loops, best of 3: 119 µs per loop
%timeit flat_list=list(itertools.chain.from_iterable(list_of_list))
#1000 loops, best of 3: 210 µs per loop
%timeit flat_list=[i for sublist in list_of_list for i in sublist]
#1000 loops, best of 3: 525 µs per loop
%timeit flat_list=reduce(list.__add__,list_of_list)
#100 loops, best of 3: 18.1 ms per loop

现在,当处理更长的子列表时,效率增益会更好:

list_of_list = [range(1000)]*10
%timeit flat_list=[]; map(flat_list.extend, list_of_list)
#10000 loops, best of 3: 60.7 µs per loop
%timeit flat_list=list(itertools.chain.from_iterable(list_of_list))
#10000 loops, best of 3: 176 µs per loop

这个方法也适用于任何迭代对象:

class SquaredRange(object):
def __init__(self, n):
self.range = range(n)
def __iter__(self):
for i in self.range:
yield i**2


list_of_list = [SquaredRange(5)]*3
flat_list = []
map(flat_list.extend, list_of_list)
print flat_list
#[0, 1, 4, 9, 16, 0, 1, 4, 9, 16, 0, 1, 4, 9, 16]

pylab提供扁平化: 链接到numpy flatten < / p >

sum(list_of_lists, [])会使它变平。

l = [['image00', 'image01'], ['image10'], []]
print sum(l,[]) # prints ['image00', 'image01', 'image10']

似乎与operator.add混淆!当你将两个列表相加时,正确的术语是concat,而不是add。你需要使用的是operator.concat

如果你考虑的是功能性的,它就像这样简单:

>>> from functools import reduce
>>> import operator
>>> list2d = ((1,2,3),(4,5,6), (7,), (8,9))
>>> reduce(operator.concat, list2d)
(1, 2, 3, 4, 5, 6, 7, 8, 9)

你看,reduce尊重序列类型,所以当你提供一个元组时,你得到一个元组。让我们尝试一个列表::

>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(operator.concat, list2d)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

啊哈,你会得到一个列表。

性能如何?

>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> %timeit list(itertools.chain.from_iterable(list2d))
1000000 loops, best of 3: 1.36 µs per loop

From_iterable非常快!但是reduce和concat并不是一个可比性。

>>> list2d = ((1,2,3),(4,5,6), (7,), (8,9))
>>> %timeit reduce(operator.concat, list2d)
1000000 loops, best of 3: 492 ns per loop

如果列表中的每一项都是字符串(并且这些字符串中的任何字符串都使用" "而不是' '),则可以使用正则表达式(re模块)

>>> flattener = re.compile("\'.*?\'")
>>> flattener
<_sre.SRE_Pattern object at 0x10d439ca8>
>>> stred = str(in_list)
>>> outed = flattener.findall(stred)

上面的代码将in_list转换为字符串,使用regex找到引号内的所有子字符串(即列表中的每一项),并将它们作为列表输出。

下面是一个使用collectons.Iterable用于多个级别列表的版本:

import collections


def flatten(o, flatten_condition=lambda i: isinstance(i,
collections.Iterable) and not isinstance(i, str)):
result = []
for i in o:
if flatten_condition(i):
result.extend(flatten(i, flatten_condition))
else:
result.append(i)
return result

如果你要扁平化一个更复杂的列表与不可迭代元素或深度大于2,你可以使用以下函数:

def flat_list(list_to_flat):
if not isinstance(list_to_flat, list):
yield list_to_flat
else:
for item in list_to_flat:
yield from flat_list(item)

它将返回一个生成器对象,您可以将其转换为带有list()函数的列表。注意,yield from语法从python3.3开始可用,但你可以使用显式迭代代替 例子:< / p >

>>> a = [1, [2, 3], [1, [2, 3, [1, [2, 3]]]]]
>>> print(list(flat_list(a)))
[1, 2, 3, 1, 2, 3, 1, 2, 3]

如果你正在寻找一个内置的,简单的,一行程序,你可以使用:

a = [[1, 2, 3], [4, 5, 6]
b = [i[x] for i in a for x in range(len(i))]
print b

返回

[1, 2, 3, 4, 5, 6]

一个简单的替代方法是使用numpy的连接,但它将内容转换为浮点数:

import numpy as np
print np.concatenate([[1,2],[3],[5,89],[],[6]])
# array([  1.,   2.,   3.,   5.,  89.,   6.])
print list(np.concatenate([[1,2],[3],[5,89],[],[6]]))
# [  1.,   2.,   3.,   5.,  89.,   6.]

在Python 2或3中实现这一点的最简单方法是使用pip install morph来使用变形库。

代码是:

import morph


list = [[1,2],[3],[5,89],[],[6]]
flattened_list = morph.flatten(list)  # returns [1, 2, 3, 5, 89, 6]
def is_iterable(item):
return isinstance(item, list) or isinstance(item, tuple)




def flatten(items):
for i in items:
if is_iterable(item):
for m in flatten(i):
yield m
else:
yield i

测试:

print list(flatten2([1.0, 2, 'a', (4,), ((6,), (8,)), (((8,),(9,)), ((12,),(10)))]))