Calculate difference in keys contained in two Python dictionaries

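Suppose I have two Python dictionaries, dictA and dictB. I need to find out whether any keys are present in dictB but not in dictA. What is the fastest way to go about it?

Should I convert the dictionary keys into a set and then go about it?

Interested in knowing your thoughts.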


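Thanks for your responses.

Apologies for not stating my question properly. My scenario is like this: I have a dictA which can be the same as dictB, or may have some keys missing compared to dictB, or the value of some keys might be different and has to be set to that of dictA key's value.

The problem is that the dictionary has no standard structure: a value can be a plain value or a dict of dicts.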

dictA={'key1':a, 'key2':b, 'key3':{'key11':cc, 'key12':dd}, 'key4':{'key111':{....}}}
dictB={'key1':a, 'key2':newb, 'key3':{'key11':cc, 'key12':newdd, 'key13':ee}.......

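So the 'key2' value has to be reset to the new value and 'key13' has to be added inside the dict. The key value does not have a fixed format. It can be a simple value, a dict, or a dict of dicts.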


Not sure whether it's "fast" or not, but normally one can do this:

dicta = {"a":1,"b":2,"c":3,"d":4}
dictb = {"a":1,"d":2}
for key in dicta.keys():
    if key not in dictb:
        print(key)

As Alex Martelli wrote, if you simply want to check whether any key in B is not in A, any(True for k in dictB if k not in dictA) would be the way to go.

To find the keys that are missing:

diff = set(dictB)-set(dictA) #sets


C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA = dict(zip(range(1000),range(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=set(dictB)-set(dictA)"
10000 loops, best of 3: 107 usec per loop


diff = [ k for k in dictB if k not in dictA ] #lc


C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA = dict(zip(range(1000),range(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=[ k for k in dictB if k not in dictA ]"
10000 loops, best of 3: 95.9 usec per loop

So both solutions run at pretty much the same speed.

You can use set operations on the keys:
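The same comparison can be reproduced from a script instead of the command line. The sketch below is only an illustration: it assumes Python 3 and the standard timeit module, and the helper names diff_with_sets and diff_with_listcomp are made up for this example. Absolute timings depend on the machine and interpreter, so none are quoted here.

```python
import timeit

# Same test data as in the command-line runs above.
dictA = dict(zip(range(1000), range(1000)))
dictB = dict(zip(range(0, 2000, 2), range(1000)))


def diff_with_sets():
    # Build two sets, then subtract one from the other.
    return set(dictB) - set(dictA)


def diff_with_listcomp():
    # One pass over dictB with a hash lookup in dictA per key.
    return [k for k in dictB if k not in dictA]


if __name__ == "__main__":
    loops = 1000
    for func in (diff_with_sets, diff_with_listcomp):
        seconds = timeit.timeit(func, number=loops)
        print("%s: %.1f usec per loop" % (func.__name__, seconds / loops * 1e6))
```

Both helpers find the same keys; only the container type of the result differs (a set versus a list).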

diff = set(dictb.keys()) - set(dicta.keys())

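Here is a class that finds all the possibilities: what was added, what was removed, which key-value pairs are the same, and which key-value pairs have changed.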

class DictDiffer(object):
    """
    Calculate the difference between two dictionaries as:
    (1) items added
    (2) items removed
    (3) keys same in both but changed values
    (4) keys same in both and unchanged values
    """
    def __init__(self, current_dict, past_dict):
        self.current_dict, self.past_dict = current_dict, past_dict
        self.set_current, self.set_past = set(current_dict.keys()), set(past_dict.keys())
        self.intersect = self.set_current.intersection(self.set_past)
    def added(self):
        return self.set_current - self.intersect
    def removed(self):
        return self.set_past - self.intersect
    def changed(self):
        return set(o for o in self.intersect if self.past_dict[o] != self.current_dict[o])
    def unchanged(self):
        return set(o for o in self.intersect if self.past_dict[o] == self.current_dict[o])

Here is some sample output:

>>> a = {'a': 1, 'b': 1, 'c': 0}
>>> b = {'a': 1, 'b': 2, 'd': 0}
>>> d = DictDiffer(b, a)
>>> print "Added:", d.added()
Added: set(['d'])
>>> print "Removed:", d.removed()
Removed: set(['c'])
>>> print "Changed:", d.changed()
Changed: set(['b'])
>>> print "Unchanged:", d.unchanged()
Unchanged: set(['a'])

Available as a GitHub repo: https://github.com/hughdbrown/dictdiffer
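On Python 3 the same four categories can be computed without a class, because dict.keys() returns a set-like view. The function below is a minimal sketch under that assumption; the name dict_compare is hypothetical and is not part of the dictdiffer repo.

```python
def dict_compare(current, past):
    """Return (added, removed, changed, unchanged) as sets of keys."""
    shared = current.keys() & past.keys()      # keys present in both dicts
    added = current.keys() - past.keys()       # only in current
    removed = past.keys() - current.keys()     # only in past
    changed = {k for k in shared if current[k] != past[k]}
    unchanged = shared - changed
    return added, removed, changed, unchanged


a = {'a': 1, 'b': 1, 'c': 0}
b = {'a': 1, 'b': 2, 'd': 0}
added, removed, changed, unchanged = dict_compare(b, a)
print(added, removed, changed, unchanged)  # {'d'} {'c'} {'b'} {'a'}
```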

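If you really mean exactly what you say (that you only need to find out IF "there are any keys" in B and not in A, not WHICH ones those might be, if any), the fastest way should be: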

if any(True for k in dictB if k not in dictA): ...

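If you actually need to find out WHICH keys, if any, are in B and not in A, and not just "IF" there are such keys, then the existing answers are quite appropriate (but I do suggest more precision in future questions if that's indeed what you mean ;-).

Here's a way that will work, allows for keys that evaluate to False, and still uses a generator expression to fall out early if possible. It's not exceptionally pretty though.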

any(map(lambda x: True, (k for k in b if k not in a)))

EDIT:

THC4k posted a reply to my comment on another answer. Here's a better, prettier way to do the above:

any(True for k in b if k not in a)

Not sure how that never crossed my mind...
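To illustrate the two properties mentioned above (falsy keys are handled, and the scan stops at the first hit), here is a small sketch. It assumes Python 3.7 or later, where dicts keep insertion order; the counted helper is invented for the demonstration.

```python
def counted(iterable, counter):
    """Yield items from iterable, bumping counter[0] for each item consumed."""
    for item in iterable:
        counter[0] += 1
        yield item


a = {"x": 1}

# 1) Falsy keys: the only key missing from a is 0.
b = {0: "zero", "x": 1}
naive = any(k for k in b if k not in a)        # tests the key's truthiness
correct = any(True for k in b if k not in a)   # tests only for existence
print(naive, correct)  # False True

# 2) Early exit: the first key of c is already missing from a.
c = {"new": 0, "x": 1, "y": 2, "z": 3}
seen = [0]
has_extra = any(True for k in counted(c, seen) if k not in a)
print(has_extra, seen[0])  # True 1
```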

If on Python ≥2.7:

# update different values in dictB
# I would assume only dictA should be updated,
# but the question specifies otherwise

for k in dictA.viewkeys() & dictB.viewkeys():
    if dictA[k] != dictB[k]:
        dictB[k] = dictA[k]

# add missing keys to dictA

dictA.update((k, dictB[k]) for k in dictB.viewkeys() - dictA.viewkeys())
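On Python 3 the viewkeys() method no longer exists, since keys() itself returns a set-like view. A sketch of the same two steps under that assumption, with sample data made up for illustration:

```python
dictA = {"key1": "a", "key2": "b"}
dictB = {"key1": "a", "key2": "newb", "key3": "c"}

# Overwrite values in dictB that differ from dictA (mirrors the loop above).
for k in dictA.keys() & dictB.keys():
    if dictA[k] != dictB[k]:
        dictB[k] = dictA[k]

# Add keys that exist only in dictB to dictA.
dictA.update((k, dictB[k]) for k in dictB.keys() - dictA.keys())

print(dictA == dictB)  # True
```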

How about the standard approach (comparing the FULL objects)?

PyDev-> new PyDev Module-> Module: unittest

import unittest


class Test(unittest.TestCase):

    def testName(self):
        obj1 = {1:1, 2:2}
        obj2 = {1:1, 2:2}
        self.maxDiff = None  # sometimes useful
        self.assertDictEqual(obj1, obj2)


if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()

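Not sure if it's still relevant, but I came across this problem. In my situation I just needed to return a dictionary of the changes for all nested dictionaries, etc. I could not find a good solution out there, but I did end up writing a simple function to do this. Hope this helps.

There is another Stack Overflow question about this topic, and I have to admit that there is a simple solution explained there: the datadiff library for Python helps print the difference between two dictionaries.

This is an old question and asks a little less than what I needed, so this answer actually solves more than the question asks. The answers to this question helped me solve the following:

  1. (asked) Record the differences between two dictionaries
  2. Merge the differences from #1 into the base dictionary
  3. Merge the differences between two dictionaries (treat the second dictionary as if it were a diff dictionary)
  4. Try to detect item movements as well as changes
  5. Do all of this recursively

All this, combined with JSON, makes for pretty powerful configuration storage support.

The solution (also on GitHub):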

from collections import OrderedDict
from pprint import pprint


class izipDestinationMatching(object):
    __slots__ = ("attr", "value", "index")

    def __init__(self, attr, value, index):
        self.attr, self.value, self.index = attr, value, index

    def __repr__(self):
        return "izip_destination_matching: found match by '%s' = '%s' @ %d" % (self.attr, self.value, self.index)




def izip_destination(a, b, attrs, addMarker=True):
    """
    Returns zipped lists, but final size is equal to b with (if shorter) a padded with nulls
    Additionally also tries to find item reallocations by searching child dicts (if they are dicts) for attribute, listed in attrs)
    When addMarker == False (patching), final size will be the longer of a, b
    """
    for idx, item in enumerate(b):
        try:
            attr = next((x for x in attrs if x in item), None)  # See if the item has any of the ID attributes
            match, matchIdx = next(((orgItm, idx) for idx, orgItm in enumerate(a) if attr in orgItm and orgItm[attr] == item[attr]), (None, None)) if attr else (None, None)
            if match and matchIdx != idx and addMarker: item[izipDestinationMatching] = izipDestinationMatching(attr, item[attr], matchIdx)
        except:
            match = None
        yield (match if match else a[idx] if len(a) > idx else None), item
    if not addMarker and len(a) > len(b):
        for item in a[len(b) - len(a):]:
            yield item, item




def dictdiff(a, b, searchAttrs=[]):
    """
    returns a dictionary which represents difference from a to b
    the return dict is as short as possible:
      equal items are removed
      added / changed items are listed
      removed items are listed with value=None
    Also processes list values where the resulting list size will match that of b.
    It can also search said list items (that are dicts) for identity values to detect changed positions.
      In case such identity value is found, it is kept so that it can be re-found during the merge phase
    @param a: original dict
    @param b: new dict
    @param searchAttrs: list of strings (keys to search for in sub-dicts)
    @return: dict / list / whatever input is
    """
    if not (isinstance(a, dict) and isinstance(b, dict)):
        if isinstance(a, list) and isinstance(b, list):
            return [dictdiff(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs)]
        return b
    res = OrderedDict()
    if izipDestinationMatching in b:
        keepKey = b[izipDestinationMatching].attr
        del b[izipDestinationMatching]
    else:
        keepKey = izipDestinationMatching
    # set(a) | set(b) also works on Python 3, where keys() views cannot be added with +
    for key in sorted(set(a) | set(b)):
        v1 = a.get(key, None)
        v2 = b.get(key, None)
        if keepKey == key or v1 != v2: res[key] = dictdiff(v1, v2, searchAttrs)
    if len(res) <= 1: res = dict(res)  # This is only here for pretty print (OrderedDict doesn't pprint nicely)
    return res




def dictmerge(a, b, searchAttrs=[]):
    """
    Returns a dictionary which merges differences recorded in b to base dictionary a
    Also processes list values where the resulting list size will match that of a
    It can also search said list items (that are dicts) for identity values to detect changed positions
    @param a: original dict
    @param b: diff dict to patch into a
    @param searchAttrs: list of strings (keys to search for in sub-dicts)
    @return: dict / list / whatever input is
    """
    if not (isinstance(a, dict) and isinstance(b, dict)):
        if isinstance(a, list) and isinstance(b, list):
            return [dictmerge(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs, False)]
        return b
    res = OrderedDict()
    # set(a) | set(b) also works on Python 3, where keys() views cannot be added with +
    for key in sorted(set(a) | set(b)):
        v1 = a.get(key, None)
        v2 = b.get(key, None)
        #print "processing", key, v1, v2, key not in b, dictmerge(v1, v2)
        if v2 is not None: res[key] = dictmerge(v1, v2, searchAttrs)
        elif key not in b: res[key] = v1
    if len(res) <= 1: res = dict(res)  # This is only here for pretty print (OrderedDict doesn't pprint nicely)
    return res

If you want the difference recursively, I have written a package for Python: https://github.com/seperman/deepdiff

Installation

Install from PyPI:

pip install deepdiff

Example usage

Importing

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2

Same object returns empty

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}

Type of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}

Value of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

Item added and/or removed

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

String difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}

String difference 2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}


>>>
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
--- 
+++ 
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End

Type change

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}

List difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

List difference 2:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

List difference ignoring order or duplicates: (with the same dictionaries as above)

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}

List that contains a dictionary:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

Sets:

>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

Named tuples:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

Custom objects:

>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
...
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>>
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Object attribute added:

>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Here's a solution for deep comparing the keys of two dictionaries:

def compareDictKeys(dict1, dict2):
    if type(dict1) != dict or type(dict2) != dict:
        return False

    keys1, keys2 = dict1.keys(), dict2.keys()
    diff = set(keys1) - set(keys2) or set(keys2) - set(keys1)

    if not diff:
        for key in keys1:
            if (type(dict1[key]) == dict or type(dict2[key]) == dict) and not compareDictKeys(dict1[key], dict2[key]):
                diff = True
                break

    return not diff

If you want a built-in solution for a full comparison of arbitrary dict structures, @Maxx's answer is a good start.

import unittest


test = unittest.TestCase()
test.assertEqual(dictA, dictB)

Here's a solution that can compare more than two dicts:

def diff_dict(dicts, default=None):
    diff_dict = {}
    # union of all keys across the dicts; works on both Python 2 and Python 3
    for k in set().union(*dicts):
        # we can just use "values = [d.get(k, default) ..." below if
        # we don't care that d1[k]=default and d2[k]=missing will
        # be treated as equal
        if any(k not in d for d in dicts):
            diff_dict[k] = [d.get(k, default) for d in dicts]
        else:
            values = [d[k] for d in dicts]
            if any(v != values[0] for v in values):
                diff_dict[k] = values
    return diff_dict

Usage example:

import matplotlib.pyplot as plt
diff_dict([plt.rcParams, plt.rcParamsDefault, plt.matplotlib.rcParamsOrig])

Based on ghostdog74's answer,

dicta = {"a":1,"d":2}
dictb = {"a":5,"d":2}

for value in dicta.values():
    if value not in dictb.values():
        print(value)

will print the values of dicta that differ.

My version of the symmetric difference:

def find_dict_diffs(dict1, dict2):
    unequal_keys = []
    unequal_keys.extend(set(dict1.keys()).symmetric_difference(set(dict2.keys())))
    for k in dict1.keys():
        if dict1.get(k, 'N\\A') != dict2.get(k, 'N\\A'):
            unequal_keys.append(k)
    if unequal_keys:
        print('param dict1\t dict2')
        for k in set(unequal_keys):
            print(str(k)+'\t'+dict1.get(k, 'N\\A')+'\t '+dict2.get(k, 'N\\A'))
    else:
        print('Dicts are equal')


dict1 = {1:'a', 2:'b', 3:'c', 4:'d', 5:'e'}
dict2 = {1:'b', 2:'a', 3:'c', 4:'d', 6:'f'}

find_dict_diffs(dict1, dict2)

The result is:

param   dict1   dict2
1       a       b
2       b       a
5       e       N\A
6       N\A     f

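As mentioned in other answers, unittest produces some nice output for comparing dicts, but in this example we don't want to have to build a whole test first.

Scraping the unittest source, it looks like you can get a fair solution with just this: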

import difflib
import pprint


def diff_dicts(a, b):
    if a == b:
        return ''
    return '\n'.join(
        difflib.ndiff(pprint.pformat(a, width=30).splitlines(),
                      pprint.pformat(b, width=30).splitlines())
    )

so

dictA = dict(zip(range(7), map(ord, 'python')))
dictB = {0: 112, 1: 'spam', 2: [1,2,3], 3: 104, 4: 111}
print(diff_dicts(dictA, dictB))

Results in:

  {0: 112,
-  1: 121,
-  2: 116,
+  1: 'spam',
+  2: [1, 2, 3],
   3: 104,
-  4: 111,
?        ^

+  4: 111}
?        ^

-  5: 110}

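Where:

  • '-' indicates key/values in the first but not the second dict
  • '+' indicates key/values in the second but not the first dict

Like in unittest, the only caveat is that the final mapping can be reported as a difference, due to the trailing comma/bracket.

Try this to find the intersection, that is, the keys that are in both dictionaries. If you want the keys not found in the second dictionary, just use not in...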

intersect = filter(lambda x, dictB=dictB.keys(): x in dictB, dictA.keys())

Using set():

set(dictA.keys()).intersection(dictB.keys())

@Maxx has an excellent answer: use the unittest tools provided by Python:

import unittest


class Test(unittest.TestCase):
    def runTest(self):
        pass

    def testDict(self, d1, d2, maxDiff=None):
        self.maxDiff = maxDiff
        self.assertDictEqual(d1, d2)

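Then, anywhere in your code you can call:

try:
    Test().testDict(dict1, dict2)
except Exception as e:
    print(e)

The resulting output looks like the output from diff, pretty-printing the dictionaries with + or - prepended to each line that is different.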
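If the diff is wanted as a string rather than printed, the same idea can be wrapped in a small helper. This is a sketch assuming Python 3, where unittest.TestCase() can be instantiated without defining runTest; the function name dict_diff_message is hypothetical.

```python
import unittest


def dict_diff_message(d1, d2):
    """Return unittest's diff text for two dicts, or '' when they are equal."""
    case = unittest.TestCase()
    case.maxDiff = None  # do not truncate long diffs
    try:
        case.assertDictEqual(d1, d2)
    except AssertionError as exc:
        return str(exc)
    return ""


print(dict_diff_message({"a": 1, "b": 2}, {"a": 1, "b": 3}))
```

Catching AssertionError specifically, instead of Exception, keeps unrelated errors (for example passing a non-dict) visible.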

The top answer by hughdbrown suggests using set difference, which is definitely the best approach:

diff = set(dictb.keys()) - set(dicta.keys())

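The problem with this code is that it builds two lists just to create two sets, so it wastes 4N time and 2N space. It is also a bit more complicated than it needs to be.

Usually, this is not a big deal, but if it is: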

diff = dictb.keys() - dicta

Python 2

In Python 2, keys() returns a list of the keys, not a KeysView. So you have to ask for viewkeys() directly.

diff = dictb.viewkeys() - dicta

For dual-version 2.7/3.x code, you're hopefully using six or something similar, so you can use six.viewkeys(dictb):

diff = six.viewkeys(dictb) - dicta

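In 2.4-2.6, there is no KeysView. But you can at least cut the cost from 4N to N by building your left set directly out of an iterator, instead of building a list first. Note that the - operator on a set requires another set on the right, so use the difference() method, which accepts any iterable:

diff = set(dictb).difference(dicta)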

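Items

I have a dictA which can be the same as dictB or may have some keys missing as compared to dictB or else the value of some keys might be different

So you really don't need to compare the keys, but the items. An ItemsView is only a Set if the values are hashable, like strings. If they are, it's easy: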

diff = dictb.items() - dicta.items()
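A short sketch of both cases, assuming Python 3 and sample data invented for illustration. With unhashable values such as nested dicts, the set operation on items views raises TypeError, so the second half falls back to a per-key comparison.

```python
dicta = {"key1": "a", "key2": "b"}
dictb = {"key1": "a", "key2": "newb", "key3": "c"}

# Hashable values: items views behave like sets of (key, value) pairs.
diff = dictb.items() - dicta.items()
print(sorted(diff))  # [('key2', 'newb'), ('key3', 'c')]

# Unhashable values: compare key by key instead.
nested_a = {"key3": {"key11": "cc"}}
nested_b = {"key3": {"key11": "cc", "key13": "ee"}}
_missing = object()  # sentinel so a stored None is not mistaken for "absent"
diff_nested = {k: v for k, v in nested_b.items()
               if nested_a.get(k, _missing) != v}
print(diff_nested)  # {'key3': {'key11': 'cc', 'key13': 'ee'}}
```

The fallback reports the whole changed value, not which nested key changed; that needs a recursive diff.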

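Recursive diff

Although the question isn't directly asking for a recursive diff, some of the example values are dicts, and it appears the expected output does recursively diff them. There are already multiple answers here showing how to do that.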
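For completeness, a minimal recursive diff might look like the sketch below. It assumes Python 3, handles only nested dicts (not lists), and marks removed keys with None, which is ambiguous if None is also a legitimate value. The name recursive_diff and the sample values are illustrative, loosely following the question's example.

```python
def recursive_diff(old, new):
    """Return the parts of `new` that differ from `old`; removed keys map to None."""
    if not (isinstance(old, dict) and isinstance(new, dict)):
        return new
    result = {}
    for key in old.keys() | new.keys():
        if key not in new:
            result[key] = None        # key was removed
        elif key not in old:
            result[key] = new[key]    # key was added
        elif old[key] != new[key]:
            result[key] = recursive_diff(old[key], new[key])  # value changed
    return result


dictA = {'key1': 'a', 'key2': 'b', 'key3': {'key11': 'cc', 'key12': 'dd'}}
dictB = {'key1': 'a', 'key2': 'newb',
         'key3': {'key11': 'cc', 'key12': 'newdd', 'key13': 'ee'}}

expected = {'key2': 'newb', 'key3': {'key12': 'newdd', 'key13': 'ee'}}
print(recursive_diff(dictA, dictB) == expected)  # True
```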