Python: search a list of dictionaries

Suppose I have this:

[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]

By searching for "Pam" as the name, I want to retrieve the related dictionary: {name: "Pam", age: 7}

How can I achieve this?

people = [
{'name': "Tom", 'age': 10},
{'name': "Mark", 'age': 5},
{'name': "Pam", 'age': 7}
]


def search(name):
    for p in people:
        if p['name'] == name:
            return p


search("Pam")

You can use a generator expression:

>>> dicts = [
...     { "name": "Tom", "age": 10 },
...     { "name": "Mark", "age": 5 },
...     { "name": "Pam", "age": 7 },
...     { "name": "Dick", "age": 12 }
... ]


>>> next(item for item in dicts if item["name"] == "Pam")
{'age': 7, 'name': 'Pam'}

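If you need to handle the item not being there, you can do what user Matt suggested in his comment and provide a default using a slightly different API: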

next((item for item in dicts if item["name"] == "Pam"), None)

To find the index of the item rather than the item itself, you can enumerate() the list:

next((i for i, item in enumerate(dicts) if item["name"] == "Pam"), None)

You can use a list comprehension:

def search(name, people):
    return [element for element in people if element['name'] == name]

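My first thought would be that you might want to consider creating a dictionary of these dictionaries, for example if you were going to search it more than a few times.

However, that might be a premature optimization. What would be wrong with: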

def get_records(key, store=()):
    '''Return a list of all records containing name==key from our store
    '''
    # store is expected to be a list of dicts; an empty tuple is a safe default
    assert key is not None
    return [d for d in store if d['name'] == key]
names = [{'name':'Tom', 'age': 10}, {'name': 'Mark', 'age': 5}, {'name': 'Pam', 'age': 7}]
resultlist = [d    for d in names     if d.get('name', '') == 'Pam']
first_result = resultlist[0]

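This is one way...

You have to go through all elements of the list. There is no shortcut!

The exception is if you keep, somewhere else, a dictionary of names pointing to the items of the list. In that case you have to take care of the consequences of popping an element from the list.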
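To make the caveat about popping concrete, here is a minimal sketch. The side index `position_by_name` is a hypothetical name introduced only for this illustration; it maps each name to that item's position in the list.

```python
people = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5},
    {"name": "Pam", "age": 7},
]

# Hypothetical side index: name -> position of that dict in the list.
position_by_name = {p["name"]: i for i, p in enumerate(people)}

# Popping an element shifts every later position, so the index goes stale.
people.pop(0)
stale = position_by_name["Pam"]  # still 2, but the list now has only 2 items
index_is_stale = stale >= len(people) or people[stale]["name"] != "Pam"
print(index_is_stale)

# One way out: rebuild the index after any structural change to the list.
position_by_name = {p["name"]: i for i, p in enumerate(people)}
print(people[position_by_name["Pam"]])
```

Storing references to the dicts themselves rather than positions avoids the shifting problem, but a popped item would then still be reachable through the index until it is removed there as well.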

dicts=[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]


from collections import defaultdict
dicts_by_name = defaultdict(list)
for d in dicts:
    # plain assignment: a later dict with the same name replaces the earlier one
    dicts_by_name[d['name']] = d


print(dicts_by_name['Tom'])


#output
#>>>
#{'age': 10, 'name': 'Tom'}

This is a general way of searching for a value in a list of dictionaries:

def search_dictionaries(key, value, list_of_dictionaries):
    return [element for element in list_of_dictionaries if element[key] == value]

This looks to me like the most Pythonic way:

people = [
{'name': "Tom", 'age': 10},
{'name': "Mark", 'age': 5},
{'name': "Pam", 'age': 7}
]


filter(lambda person: person['name'] == 'Pam', people)

Result (returned as a list in Python 2):

[{'age': 7, 'name': 'Pam'}]

Note: in Python 3, a filter object is returned. So the Python 3 solution would be:

list(filter(lambda person: person['name'] == 'Pam', people))

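@Frédéric Hamidi's answer is great. In Python 3.x the syntax for .next() changed slightly: the built-in next() function is used instead. Thus a slight modification: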

>>> dicts = [
{ "name": "Tom", "age": 10 },
{ "name": "Mark", "age": 5 },
{ "name": "Pam", "age": 7 },
{ "name": "Dick", "age": 12 }
]
>>> next(item for item in dicts if item["name"] == "Pam")
{'age': 7, 'name': 'Pam'}

As mentioned in the comments by @Matt, you can add a default value like this:

>>> next((item for item in dicts if item["name"] == "Pam"), False)
{'name': 'Pam', 'age': 7}
>>> next((item for item in dicts if item["name"] == "Sam"), False)
False
>>>

Adding a tiny bit to @FrédéricHamidi's answer.

In case you are not sure that a key is present in every dict in the list, something like this would help:

next((item for item in dicts if item.get("name") and item["name"] == "Pam"), None)
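As a small sketch of the same idea: because `dict.get` returns `None` for a missing key, the comparison can be made directly on its result. The second entry below has no "name" key and is an assumption added for illustration.

```python
dicts = [
    {"name": "Tom", "age": 10},
    {"age": 5},                  # no "name" key at all (illustrative)
    {"name": "Pam", "age": 7},
]

# item.get("name") is None for the second dict, so the comparison is simply
# False for it and no KeyError is raised.
match = next((item for item in dicts if item.get("name") == "Pam"), None)
missing = next((item for item in dicts if item.get("name") == "Sam"), None)

print(match)    # {'name': 'Pam', 'age': 7}
print(missing)  # None
```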

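Here is a comparison of three approaches: iterating through the list, using filter + lambda, and refactoring your code (if that is needed or valid for your case) to use a dict of dicts rather than a list of dicts.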

import time


# Build list of dicts
list_of_dicts = list()
for i in range(100000):
    list_of_dicts.append({'id': i, 'name': 'Tom'})


# Build dict of dicts
dict_of_dicts = dict()
for i in range(100000):
    dict_of_dicts[i] = {'name': 'Tom'}


# Find the one with ID of 99999


# 1. iterate through the list
lod_ts = time.time()
for elem in list_of_dicts:
    if elem['id'] == 99999:
        break
lod_tf = time.time()
lod_td = lod_tf - lod_ts


# 2. Use filter
f_ts = time.time()
# list() forces evaluation; in Python 3, filter() alone is lazy
x = list(filter(lambda k: k['id'] == 99999, list_of_dicts))
f_tf = time.time()
f_td = f_tf - f_ts


# 3. find it in dict of dicts
dod_ts = time.time()
x = dict_of_dicts[99999]
dod_tf = time.time()
dod_td = dod_tf - dod_ts


print('List of Dictionaries took: %s' % lod_td)
print('Using filter took: %s' % f_td)
print('Dict of Dicts took: %s' % dod_td)

The output is like this:

List of Dictionaries took: 0.0099310874939
Using filter took: 0.0121960639954
Dict of Dicts took: 4.05311584473e-06

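Conclusion: clearly, a dictionary of dicts is the most efficient way to search in cases where you know you will only be searching by id. Using filter is the slowest solution.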
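If the data arrives as a list of dicts, the lookup table can be built in one pass. This is only a sketch: the name `by_id` is introduced here for illustration, and it assumes every 'id' is unique and hashable.

```python
list_of_dicts = [{"id": i, "name": "Tom"} for i in range(1000)]

# One pass over the list to build the index.
by_id = {d["id"]: d for d in list_of_dicts}

# Each lookup afterwards is a hash lookup rather than a scan of the list.
found = by_id.get(999)      # the original dict object, not a copy
not_found = by_id.get(-1)   # None, because no such id exists

print(found)
print(not_found)
```

Building the index costs one full scan, so it pays off only when more than a handful of lookups follow.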

Have you ever tried the pandas package? It is well suited to this kind of search task and is optimized too.

import pandas as pd


listOfDicts = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]


# Create a data frame, keys are used as column headers.
# Dict items with the same key are entered into the same respective column.
df = pd.DataFrame(listOfDicts)


# The pandas dataframe allows you to pick out specific values like so:


df2 = df[ (df['name'] == 'Pam') & (df['age'] == 7) ]


# Alternate syntax, same thing


df2 = df[ (df.name == 'Pam') & (df.age == 7) ]

I've added a little benchmarking below to illustrate pandas' faster runtimes at a larger scale, i.e. 100k+ entries:

setup_large = 'dicts = [];\
[dicts.extend(({ "name": "Tom", "age": 10 },{ "name": "Mark", "age": 5 },\
{ "name": "Pam", "age": 7 },{ "name": "Dick", "age": 12 })) for _ in range(25000)];\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(dicts);'


setup_small = 'dicts = [];\
dicts.extend(({ "name": "Tom", "age": 10 },{ "name": "Mark", "age": 5 },\
{ "name": "Pam", "age": 7 },{ "name": "Dick", "age": 12 }));\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(dicts);'


method1 = '[item for item in dicts if item["name"] == "Pam"]'
method2 = 'df[df["name"] == "Pam"]'


import timeit
t = timeit.Timer(method1, setup_small)
print('Small Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_small)
print('Small Method Pandas: ' + str(t.timeit(100)))


t = timeit.Timer(method1, setup_large)
print('Large Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_large)
print('Large Method Pandas: ' + str(t.timeit(100)))


#Small Method LC: 0.000191926956177
#Small Method Pandas: 0.044392824173
#Large Method LC: 1.98827004433
#Large Method Pandas: 0.324505090714

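I found this thread when I was searching for an answer to the same question. While I realize that it's a late answer, I thought I'd contribute it in case it's useful to anyone else: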

def find_dict_in_list(dicts, default=None, **kwargs):
    """Find first matching :obj:`dict` in :obj:`list`.

    :param list dicts: List of dictionaries.
    :param dict default: Optional. Default dictionary to return.
        Defaults to `None`.
    :param **kwargs: `key=value` pairs to match in :obj:`dict`.

    :returns: First matching :obj:`dict` from `dicts`.
    :rtype: dict

    """

    rval = default
    for d in dicts:
        is_found = False

        # Search for keys in dict.
        for k, v in kwargs.items():
            if d.get(k, None) == v:
                is_found = True

            else:
                is_found = False
                break

        if is_found:
            rval = d
            break

    return rval


if __name__ == '__main__':
    # Tests
    dicts = []
    keys = 'spam eggs shrubbery knight'.split()

    start = 0
    for _ in range(4):
        dct = {k: v for k, v in zip(keys, range(start, start+4))}
        dicts.append(dct)
        start += 4

    # Find each dict based on 'spam' key only.
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam) == dicts[x]

    # Find each dict based on 'spam' and 'shrubbery' keys.
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam, shrubbery=spam+2) == dicts[x]

    # Search for one correct key, one incorrect key:
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam, shrubbery=spam+1) is None

    # Search for non-existent dict.
    for x in range(len(dicts)):
        spam = x+100
        assert find_dict_in_list(dicts, spam=spam) is None

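I tested various methods to go through a list of dictionaries and return the dictionaries where key x has a certain value.

Results:

  • Speed: list comprehension > generator expression >> normal list iteration >>> filter.
  • All scale linearly with the number of dicts in the list (10x list size -> 10x time).
  • The number of keys per dictionary does not affect speed significantly, even for large numbers (thousands) of keys. Please see this graph I calculated: https://imgur.com/a/quQzv (method names see below).

All tests were done with Python 3.6.4, W7x64.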

from random import randint
from timeit import timeit




list_dicts = []
for _ in range(1000):     # number of dicts in the list
    dict_tmp = {}
    for i in range(10):   # number of keys for each dict
        dict_tmp[f"key{i}"] = randint(0,50)
    list_dicts.append( dict_tmp )


def a():
    # normal iteration over all elements
    for dict_ in list_dicts:
        if dict_["key3"] == 20:
            pass


def b():
    # use 'generator'
    for dict_ in (x for x in list_dicts if x["key3"] == 20):
        pass


def c():
    # use 'list'
    for dict_ in [x for x in list_dicts if x["key3"] == 20]:
        pass


def d():
    # use 'filter'
    for dict_ in filter(lambda x: x['key3'] == 20, list_dicts):
        pass

Results:

1.7303 # normal list iteration
1.3849 # generator expression
1.3158 # list comprehension
7.7848 # filter
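The post imports timeit but does not show the timing calls themselves. One way they might look is sketched below for two of the methods; the function names, the data size and `number=1000` are assumptions chosen for this illustration, not the settings behind the figures above.

```python
from random import randint
from timeit import timeit

# 1000 dicts with 10 integer-valued keys each, as in the setup above.
list_dicts = [{f"key{i}": randint(0, 50) for i in range(10)} for _ in range(1000)]


def loop():
    # normal iteration over all elements
    for dict_ in list_dicts:
        if dict_["key3"] == 20:
            pass


def comprehension():
    # build the filtered list first, then iterate over it
    for dict_ in [x for x in list_dicts if x["key3"] == 20]:
        pass


# timeit accepts a zero-argument callable and returns the total time in seconds.
timings = {fn.__name__: timeit(fn, number=1000) for fn in (loop, comprehension)}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Absolute numbers depend on the machine and the Python version, so only the relative ordering is worth comparing.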

Simply use a list comprehension:

[i for i in dct if i['name'] == 'Pam'][0]

Example code:

dct = [
{'name': 'Tom', 'age': 10},
{'name': 'Mark', 'age': 5},
{'name': 'Pam', 'age': 7}
]


print([i for i in dct if i['name'] == 'Pam'][0])


> {'age': 7, 'name': 'Pam'}

You can try this:

''' lst: list of dictionaries '''
lst = [{"name": "Tom", "age": 10}, {"name": "Mark", "age": 5}, {"name": "Pam", "age": 7}]


search = input("What name: ")  # Input name that needs to be searched (say 'Pam')


print([lst[i] for i in range(len(lst)) if lst[i]["name"] == search][0])  # Output
# {'age': 7, 'name': 'Pam'}

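You can achieve this with the filter and next functions in Python.

filter filters the given sequence and returns an iterator. next accepts an iterator and returns its next element.

So you can find the element with: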

my_dict = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]


next(filter(lambda obj: obj.get('name') == 'Pam', my_dict), None)

and the output is:

{'name': 'Pam', 'age': 7}

Note: the above code will return None in case the name we are searching for is not found.
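For contrast, a short sketch of what happens when the second argument to next is left out and nothing matches: the exhausted iterator raises StopIteration instead of yielding a value. The variable names `outcome` and `with_default` are introduced only for this example.

```python
my_dict = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5},
    {"name": "Pam", "age": 7},
]

# Without a default, an empty result raises StopIteration.
try:
    next(filter(lambda obj: obj.get("name") == "Sam", my_dict))
    outcome = "found"
except StopIteration:
    outcome = "not found"

# With a default, the same search quietly returns that default.
with_default = next(filter(lambda obj: obj.get("name") == "Sam", my_dict), None)

print(outcome)
print(with_default)
```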

One simple way using list comprehensions is, if l is the list

l = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]

then

[d['age'] for d in l if d['name']=='Tom']

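Most (if not all) implementations proposed here have two flaws:

  • They assume only one key is passed for searching, while it may be useful to pass more for complex dicts.
  • They assume all keys passed for searching exist in the dicts, so they do not deal correctly with the KeyError raised when one does not.

An updated proposition: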

def find_first_in_list(objects, **kwargs):
    return next((obj for obj in objects if
                 len(set(obj.keys()).intersection(kwargs.keys())) > 0 and
                 all([obj[k] == v for k, v in kwargs.items() if k in obj.keys()])),
                None)

Maybe not the most Pythonic, but at least a bit more failsafe.

Usage:

>>> obj1 = find_first_in_list(list_of_dict, name='Pam', age=7)
>>> obj2 = find_first_in_list(list_of_dict, name='Pam', age=27)
>>> obj3 = find_first_in_list(list_of_dict, name='Pam', address='nowhere')
>>>
>>> print(obj1, obj2, obj3)
{'name': 'Pam', 'age': 7} None {'name': 'Pam', 'age': 7}

The gist:

def dsearch(lod, **kw):
    return filter(lambda i: all((i[k] == v for (k, v) in kw.items())), lod)


lod=[{'a':33, 'b':'test2', 'c':'a.ing333'},
{'a':22, 'b':'ihaha', 'c':'fbgval'},
{'a':33, 'b':'TEst1', 'c':'s.ing123'},
{'a':22, 'b':'ihaha', 'c':'dfdvbfjkv'}]






list(dsearch(lod, a=22))


[{'a': 22, 'b': 'ihaha', 'c': 'fbgval'},
{'a': 22, 'b': 'ihaha', 'c': 'dfdvbfjkv'}]






list(dsearch(lod, a=22, b='ihaha'))


[{'a': 22, 'b': 'ihaha', 'c': 'fbgval'},
{'a': 22, 'b': 'ihaha', 'c': 'dfdvbfjkv'}]




list(dsearch(lod, a=22, c='fbgval'))


[{'a': 22, 'b': 'ihaha', 'c': 'fbgval'}]

I would create a dict of dicts like so:

names = ["Tom", "Mark", "Pam"]
ages = [10, 5, 7]
my_d = {}


for i, j in zip(names, ages):
    my_d[i] = {"name": i, "age": j}

Or, using exactly the same info as in the posted question:

info_list = [{"name": "Tom", "age": 10}, {"name": "Mark", "age": 5}, {"name": "Pam", "age": 7}]
my_d = {}


for d in info_list:
    my_d[d["name"]] = d

Then you could do my_d["Pam"] and get {"name": "Pam", "age": 7}
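One thing to watch with a name-keyed dict is duplicate names. The sketch below adds a hypothetical second "Pam" entry (not in the original question) to show the difference between letting later entries win and keeping the first one, which is what a first-match scan of the list would return.

```python
info_list = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5},
    {"name": "Pam", "age": 7},
    {"name": "Pam", "age": 30},  # hypothetical duplicate, for illustration only
]

# Plain assignment: a later duplicate silently replaces the earlier entry.
last_by_name = {d["name"]: d for d in info_list}

# setdefault only stores a value when the key is not present yet,
# so the first occurrence is kept.
first_by_name = {}
for d in info_list:
    first_by_name.setdefault(d["name"], d)

print(last_by_name["Pam"]["age"])   # 30
print(first_by_name["Pam"]["age"])  # 7
print(first_by_name.get("Sam"))     # None
```

Using .get for the lookup avoids a KeyError when the name is not present.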

Put the accepted answer in a function for easy re-use:

def get_item(collection, key, target):
    return next((item for item in collection if item[key] == target), None)

Or also as a lambda:

get_item_lambda = lambda collection, key, target : next((item for item in collection if item[key] == target), None)

Result

key = "name"
target = "Pam"
print(get_item(target_list, key, target))
print(get_item_lambda(target_list, key, target))


#{'name': 'Pam', 'age': 7}
#{'name': 'Pam', 'age': 7}

In case the key may not be in the target dictionary, use dict.get and avoid the KeyError:

def get_item(collection, key, target):
    return next((item for item in collection if item.get(key, None) == target), None)


get_item_lambda = lambda collection, key, target : next((item for item in collection if item.get(key, None) == target), None)

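ducks will be much faster than a list comprehension or filter. It builds an index on your objects, so lookups don't need to scan every item.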

pip install ducks

from ducks import Dex


dicts = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]


# Build the index
dex = Dex(dicts, {'name': str, 'age': int})


# Find matching objects
dex[{'name': 'Pam', 'age': 7}]


Result: [{'name': 'Pam', 'age': 7}]