Python: filter a list of dictionaries based on key value

I have a list of dictionaries, and each dictionary has a key (let's say 'type') whose value can be 'type1', 'type2', etc. My goal is to filter this list down to a list of the same dictionaries, but only the ones of a certain "type". I think I'm just really struggling with list/dictionary comprehensions.

So an example list would look like:

exampleSet = [{'type':'type1'},{'type':'type2'},{'type':'type2'}, {'type':'type3'}]

I have a list of key values, for example:

keyValList = ['type2','type3']

where the expected resulting list would look like:

expectedResult = [{'type':'type2'},{'type':'type2'},{'type':'type3'}]

I know I could do this with a set of for loops, but there has to be a simpler way. I found a lot of different flavors of this question, but none that really fit the bill and answered it. I would post my attempts at an answer, but they weren't that impressive, so it is probably best to leave this open-ended. Any assistance would be greatly appreciated.


You can try a list comprehension:

>>> exampleSet = [{'type':'type1'},{'type':'type2'},{'type':'type2'}, {'type':'type3'}]
>>> keyValList = ['type2','type3']
>>> expectedResult = [d for d in exampleSet if d['type'] in keyValList]
>>> expectedResult
[{'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]

Another way is to use filter:

>>> list(filter(lambda d: d['type'] in keyValList, exampleSet))
[{'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
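
A minimal sketch of a slightly more defensive variant, assuming some dictionaries might lack the 'type' key and that the list of allowed values could grow large. The names `allowed` and `filtered`, and the extra dictionary without a 'type' key, are illustrative and not from the question. It uses `dict.get` so a missing key does not raise `KeyError`, and a set so each membership test does not scan the whole list (this assumes the values are hashable, as strings are):

```python
exampleSet = [{'type': 'type1'}, {'type': 'type2'}, {'type': 'type2'},
              {'type': 'type3'}, {'name': 'no type key'}]  # last entry added for illustration
keyValList = ['type2', 'type3']

# Convert once to a set; `in` on a set is a hash lookup rather than a linear scan.
allowed = set(keyValList)

# d.get('type') returns None when the key is absent, so that dict is simply skipped.
filtered = [d for d in exampleSet if d.get('type') in allowed]
print(filtered)
```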

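Use filter, or if the number of dictionaries in exampleSet is too high, use ifilter from the itertools module (Python 2). It returns an iterator instead of filling up your system's memory with the entire list at once: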

# Python 2 only: itertools.ifilter and the print statement do not exist in Python 3
from itertools import ifilter
for elem in ifilter(lambda x: x['type'] in keyValList, exampleSet):
    print elem
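
In Python 3, `itertools.ifilter` was removed because the built-in `filter` already returns a lazy iterator, and a generator expression gives the same behaviour. A rough sketch of both, assuming Python 3 and the data from the question; the helper name `iter_matches` is hypothetical:

```python
def iter_matches(dicts, allowed_values, key='type'):
    """Lazily yield the dicts whose `key` value is in `allowed_values`."""
    # Generator expression: nothing is evaluated until iteration starts.
    return (d for d in dicts if d[key] in allowed_values)


exampleSet = [{'type': 'type1'}, {'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
keyValList = ['type2', 'type3']

# Built-in filter() is lazy in Python 3, so it stands in for ifilter directly.
for elem in filter(lambda d: d['type'] in keyValList, exampleSet):
    print(elem)

# Equivalent iteration through the generator-based helper.
for elem in iter_matches(exampleSet, keyValList):
    print(elem)
```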

I tried a few of the answers from this post and tested the performance of each one.

As I initially guessed, the list comprehension is the fastest by a wide margin, the filter-plus-list method comes second, and pandas comes third.

Defined variables:

import pandas as pd

exampleSet = [{'type': 'type' + str(number)} for number in range(0, 1_000_000)]
keyValList = ['type21', 'type950000']

1-list comprehension

%%timeit
expectedResult = [d for d in exampleSet if d['type'] in keyValList]

60.7 ms ± 188 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

2-filter and list

%%timeit
expectedResult = list(filter(lambda d: d['type'] in keyValList, exampleSet))

94 ms ± 328 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

3-pandas

%%timeit
df = pd.DataFrame(exampleSet)
expectedResult = df[df['type'].isin(keyValList)].to_dict('records')

336 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


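As a side note, using pandas to process dicts is not a great idea: a pandas.DataFrame is basically a more memory-hungry dict, and if you are not going to use the DataFrame in the end, it is simply inefficient.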
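
`%%timeit` is an IPython/Jupyter cell magic, so the measurements above cannot be pasted into a plain script as they are. A rough sketch of the same comparison using the standard-library `timeit` module might look like this. The smaller list size, the function names `by_comprehension` and `by_filter`, and the repetition count are assumptions chosen to keep the run short; absolute timings will vary with machine and Python version.

```python
import timeit

# Smaller data set than in the benchmark above, purely to keep the run short.
exampleSet = [{'type': 'type' + str(number)} for number in range(100_000)]
keyValList = ['type21', 'type95000']


def by_comprehension():
    return [d for d in exampleSet if d['type'] in keyValList]


def by_filter():
    return list(filter(lambda d: d['type'] in keyValList, exampleSet))


# Both approaches should agree before their timings are compared.
assert by_comprehension() == by_filter()

loops = 10
for func in (by_comprehension, by_filter):
    seconds = timeit.timeit(func, number=loops)  # total time for all loops
    print(f"{func.__name__}: {seconds / loops * 1000:.2f} ms per loop")
```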

A generic way to filter a list of dictionaries based on key-value pairs:

def get_dic_filter_func(**kwargs):
    """Return a predicate for use with map/filter.

    The returned function takes the entries of a dict whose keys appear in
    kwargs and compares the resulting dict with kwargs.
    """
    def func(dic):
        dic_to_compare = {k: v for k, v in dic.items() if k in kwargs}
        return dic_to_compare == kwargs
    return func


def filter_list_of_dicts(list_of_dicts, **kwargs):
    """Filter a list of dicts by key/value pairs.

    Only dicts that contain every key/value pair given in kwargs are kept.
    """
    filter_func = get_dic_filter_func(**kwargs)
    return list(filter(filter_func, list_of_dicts))

Test case / how to use:

    # Method of a unittest.TestCase subclass
    def test_filter_list_of_dicts(self):
        dic1 = {'a': '1', 'b': 2}
        dic2 = {'a': 1, 'b': 3}
        dic3 = {'a': 2, 'b': 3}
        the_list = [dic1, dic2, dic3]

        self.assertEqual([], filter_list_of_dicts(the_list, x=1))
        self.assertEqual([dic1], filter_list_of_dicts(the_list, a='1'))
        self.assertEqual([dic2], filter_list_of_dicts(the_list, a=1))
        self.assertEqual([dic2, dic3], filter_list_of_dicts(the_list, b=3))
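
The keyword-argument approach above matches exactly one value per key. The original question allows a key to take any of several values, so one possible extension is sketched here. The function name `filter_by_allowed_values` and its signature are hypothetical and not part of the answer above; each keyword is assumed to map to a collection of acceptable values.

```python
def filter_by_allowed_values(list_of_dicts, **allowed):
    """Keep dicts where, for every keyword, dic[key] is one of the allowed values.

    Dicts missing any of the given keys are dropped.
    """
    def matches(dic):
        return all(key in dic and dic[key] in values
                   for key, values in allowed.items())
    return [dic for dic in list_of_dicts if matches(dic)]


exampleSet = [{'type': 'type1'}, {'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]

# Same filter as in the question, expressed through the generic helper.
print(filter_by_allowed_values(exampleSet, type=['type2', 'type3']))
```

Passing several keywords requires all of them to match, which mirrors how the exact-match version treats multiple key/value pairs.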