How do I completely traverse a complex dictionary of unknown depth?


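If you only need to walk the dictionary, I'd suggest a recursive walk function that takes a dictionary and then recursively walks through its elements. Something like this: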

def walk(node):
    for key, item in node.items():
        if isinstance(item, dict):
            # item is a nested dictionary: recurse into it
            walk(item)
        else:
            # item is a leaf: do your thing
            print(key, item)

If you also want to search for elements, or query several elements that satisfy certain criteria, have a look at the jsonpath module.
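For simple searches, the same recursive walk can be turned into a generator that yields only the leaves matching a condition. The following is a minimal sketch using the standard library only; it is not the jsonpath API, and the names `find_leaves` and `predicate` are made up for this illustration.

```python
def find_leaves(node, predicate, path=()):
    """Yield (path, value) for every leaf whose value satisfies predicate."""
    if isinstance(node, dict):
        for key, item in node.items():
            yield from find_leaves(item, predicate, path + (key,))
    elif isinstance(node, (list, tuple)):
        # List items are addressed by their numeric index.
        for index, item in enumerate(node):
            yield from find_leaves(item, predicate, path + (index,))
    elif predicate(node):
        yield path, node


data = {"a": {"b": 1, "c": [2, {"d": 30}]}, "e": 40}
matches = list(find_leaves(data, lambda v: isinstance(v, int) and v > 10))
print(matches)  # [(('a', 'c', 1, 'd'), 30), (('e',), 40)]
```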


Accepted answer

You can use a recursive generator to convert your dictionary into flat lists.

def dict_generator(indict, pre=None):
    pre = pre[:] if pre else []
    if isinstance(indict, dict):
        for key, value in indict.items():
            if isinstance(value, dict):
                for d in dict_generator(value, pre + [key]):
                    yield d
            elif isinstance(value, list) or isinstance(value, tuple):
                for v in value:
                    for d in dict_generator(v, pre + [key]):
                        yield d
            else:
                yield pre + [key, value]
    else:
        yield pre + [indict]

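It returns the following for the sample data. (The keys in each path below are listed innermost-first, so this output appears to come from a variant that prepends each key; the code above, as written, yields them outermost-first.)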

[u'body', u'kind', u'var']
[u'init', u'declarations', u'body', u'type', u'Literal']
[u'init', u'declarations', u'body', u'value', 2]
[u'declarations', u'body', u'type', u'VariableDeclarator']
[u'id', u'declarations', u'body', u'type', u'Identifier']
[u'id', u'declarations', u'body', u'name', u'i']
[u'body', u'type', u'VariableDeclaration']
[u'body', u'kind', u'var']
[u'init', u'declarations', u'body', u'type', u'Literal']
[u'init', u'declarations', u'body', u'value', 4]
[u'declarations', u'body', u'type', u'VariableDeclarator']
[u'id', u'declarations', u'body', u'type', u'Identifier']
[u'id', u'declarations', u'body', u'name', u'j']
[u'body', u'type', u'VariableDeclaration']
[u'body', u'kind', u'var']
[u'init', u'declarations', u'body', u'operator', u'*']
[u'right', u'init', u'declarations', u'body', u'type', u'Identifier']
[u'right', u'init', u'declarations', u'body', u'name', u'j']
[u'init', u'declarations', u'body', u'type', u'BinaryExpression']
[u'left', u'init', u'declarations', u'body', u'type', u'Identifier']
[u'left', u'init', u'declarations', u'body', u'name', u'i']
[u'declarations', u'body', u'type', u'VariableDeclarator']
[u'id', u'declarations', u'body', u'type', u'Identifier']
[u'id', u'declarations', u'body', u'name', u'answer']
[u'body', u'type', u'VariableDeclaration']
[u'type', u'Program']


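If you know the meaning of the data, you might want to create a parse function that turns the nested containers into a tree of objects of custom types. You would then use the methods of those custom objects to do whatever you need to do with the data.

For your example data structure, you could create Program, VariableDeclaration, VariableDeclarator, Identifier, Literal and BinaryExpression classes, then use something like this for your parser: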

def parse(d):
    # Dispatch on the node's "type" field and build the matching object.
    t = d[u"type"]

    if t == u"Program":
        body = [parse(block) for block in d[u"body"]]
        return Program(body)

    elif t == u"VariableDeclaration":
        kind = d[u"kind"]
        declarations = [parse(declaration) for declaration in d[u"declarations"]]
        return VariableDeclaration(kind, declarations)

    elif t == u"VariableDeclarator":
        id = parse(d[u"id"])
        init = parse(d[u"init"])
        return VariableDeclarator(id, init)

    elif t == u"Identifier":
        return Identifier(d[u"name"])

    elif t == u"Literal":
        return Literal(d[u"value"])

    elif t == u"BinaryExpression":
        operator = d[u"operator"]
        left = parse(d[u"left"])
        right = parse(d[u"right"])
        return BinaryExpression(operator, left, right)

    else:
        raise ValueError("Invalid data structure.")
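To make the idea of "methods on the custom objects" concrete, here is a small runnable sketch covering only the expression node types. The class bodies, the `evaluate` method, the `OPERATORS` table and the name `parse_expression` are all assumptions made for illustration; the original answer does not define them.

```python
import operator as op
from dataclasses import dataclass
from typing import Any

# Hypothetical operator table; only the operators listed here are supported.
OPERATORS = {"+": op.add, "-": op.sub, "*": op.mul}


@dataclass
class Literal:
    value: Any

    def evaluate(self, env):
        return self.value


@dataclass
class Identifier:
    name: str

    def evaluate(self, env):
        # Look the variable up in a plain dict of name -> value.
        return env[self.name]


@dataclass
class BinaryExpression:
    operator: str
    left: Any
    right: Any

    def evaluate(self, env):
        func = OPERATORS[self.operator]
        return func(self.left.evaluate(env), self.right.evaluate(env))


def parse_expression(d):
    """Convert a nested dict describing an expression into an object tree."""
    t = d["type"]
    if t == "Literal":
        return Literal(d["value"])
    elif t == "Identifier":
        return Identifier(d["name"])
    elif t == "BinaryExpression":
        return BinaryExpression(
            d["operator"], parse_expression(d["left"]), parse_expression(d["right"])
        )
    raise ValueError("Invalid data structure.")


raw = {
    "type": "BinaryExpression",
    "operator": "*",
    "left": {"type": "Identifier", "name": "i"},
    "right": {"type": "Identifier", "name": "j"},
}
tree = parse_expression(raw)
result = tree.evaluate({"i": 2, "j": 4})
print(result)  # 8
```

Once the data is an object tree, each node type carries its own behaviour, so the processing code no longer needs to inspect "type" strings.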


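Instead of writing your own parser, depending on the task, you could extend the encoders and decoders from the standard library json module.

I recommend this especially if you need to encode objects belonging to custom classes into JSON. If you have to do some operation that could also be done on a string representation of the JSON, consider also iterating over JSONEncoder().iterencode.

For both, the reference is http://docs.python.org/2/library/json.html#encoders-and-decoders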
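As a rough illustration of that approach, the sketch below subclasses `json.JSONEncoder`, decodes with an `object_hook`, and iterates over `iterencode`. The `Point` class and the `"__point__"` marker key are hypothetical and chosen only for this example.

```python
import json


class Point:
    """Hypothetical custom class used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointEncoder(json.JSONEncoder):
    def default(self, o):
        # Called only for objects the base encoder cannot serialize.
        if isinstance(o, Point):
            return {"__point__": True, "x": o.x, "y": o.y}
        return super().default(o)


def decode_point(d):
    # object_hook receives every decoded JSON object (as a dict).
    if d.get("__point__"):
        return Point(d["x"], d["y"])
    return d


text = json.dumps({"origin": Point(0, 0), "tip": Point(3, 4)}, cls=PointEncoder)
restored = json.loads(text, object_hook=decode_point)
print(restored["tip"].x, restored["tip"].y)  # 3 4

# iterencode yields the JSON text piece by piece instead of as one string.
chunks = list(PointEncoder().iterencode({"tip": Point(3, 4)}))
```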


Maybe this can help:

def walk(d):
    # path is a module-level list holding the keys from the root to the current node
    global path
    for k, v in d.items():
        if isinstance(v, str) or isinstance(v, int) or isinstance(v, float):
            path.append(k)
            print("{}={}".format(".".join(path), v))
            path.pop()
        elif v is None:
            path.append(k)
            ## do something special
            path.pop()
        elif isinstance(v, dict):
            path.append(k)
            walk(v)
            path.pop()
        else:
            print("###Type {} not recognized: {}.{}={}".format(type(v), ".".join(path), k, v))


mydict = {'Other': {'Stuff': {'Here': {'Key': 'Value'}}}, 'root1': {'address': {'country': 'Brazil', 'city': 'Sao', 'x': 'Pinheiros'}, 'surname': 'Fabiano', 'name': 'Silos', 'height': 1.9}, 'root2': {'address': {'country': 'Brazil', 'detail': {'neighbourhood': 'Central'}, 'city': 'Recife'}, 'surname': 'My', 'name': 'Friend', 'height': 1.78}}


path = []
walk(mydict)

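This will produce output like the following (the order of the lines depends on the dictionary's iteration order):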

Other.Stuff.Here.Key=Value
root1.height=1.9
root1.surname=Fabiano
root1.name=Silos
root1.address.country=Brazil
root1.address.x=Pinheiros
root1.address.city=Sao
root2.height=1.78
root2.surname=My
root2.name=Friend
root2.address.country=Brazil
root2.address.detail.neighbourhood=Central
root2.address.city=Recife
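The same dotted-path output can be produced without a global variable by passing the path down as an argument and yielding results instead of printing them. This is a sketch of that variation, not the answer's own code; the name `iter_leaves` is made up here, and it assumes every key is a string.

```python
def iter_leaves(d, path=()):
    """Yield (dotted_path, value) for every non-dict value in a nested dict."""
    for k, v in d.items():
        if isinstance(v, dict):
            yield from iter_leaves(v, path + (k,))
        else:
            # Assumes string keys, since they are joined with ".".
            yield ".".join(path + (k,)), v


sample = {"root1": {"name": "Silos", "address": {"city": "Sao"}}, "height": 1.9}
flat = dict(iter_leaves(sample))
for dotted, value in flat.items():
    print("{}={}".format(dotted, value))
```

Because the path is an immutable tuple rebuilt at each level, there is nothing to pop on the way back up and the function can be called repeatedly without resetting any state.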


Some additions to the solution above (to handle JSON that includes lists):

#!/usr/bin/env python

import json


def walk(d):
    # path is a module-level list holding the keys from the root to the current node
    global path
    for k, v in d.items():
        if isinstance(v, str) or isinstance(v, int) or isinstance(v, float):
            path.append(k)
            print("{}={}".format(".".join(path), v))
            path.pop()
        elif v is None:
            path.append(k)
            # do something special
            path.pop()
        elif isinstance(v, list):
            path.append(k)
            for v_int in v:
                if isinstance(v_int, dict):
                    walk(v_int)
                else:
                    # non-dict list item: walk() would fail on it, so print it directly
                    print("{}={}".format(".".join(path), v_int))
            path.pop()
        elif isinstance(v, dict):
            path.append(k)
            walk(v)
            path.pop()
        else:
            print("###Type {} not recognized: {}.{}={}".format(type(v), ".".join(path), k, v))


with open('abc.json') as f:
    myjson = json.load(f)


path = []
walk(myjson)


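If the accepted answer works for you, but you would also like a full, ordered path that includes the numeric indices of nested arrays, this slight variation will work: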

def dict_generator(indict, pre=None):
    pre = pre[:] if pre else []
    if isinstance(indict, dict):
        for key, value in indict.items():
            if isinstance(value, dict):
                for d in dict_generator(value, pre + [key]):
                    yield d
            elif isinstance(value, list) or isinstance(value, tuple):
                # include each item's numeric index in the path
                for k, v in enumerate(value):
                    for d in dict_generator(v, pre + [key] + [k]):
                        yield d
            else:
                yield pre + [key, value]
    else:
        # keep the path for non-dict list items as well
        yield pre + [indict]
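One practical use of paths that contain both keys and list indices is that they can be followed back into the original structure. The helper below is a hypothetical addition (the name `get_by_path` is not from the answer) that does this with `functools.reduce`; the example path has the key/index/value shape that the generator yields for a dictionary leaf.

```python
from functools import reduce
from operator import getitem


def get_by_path(data, path):
    """Follow a sequence of dict keys and list indices down into data."""
    return reduce(getitem, path, data)


data = {"body": [{"kind": "var"}, {"kind": "let", "ids": ["i", "j"]}]}

# Keys and indices first, leaf value last.
indexed_path = ["body", 1, "kind", "let"]
*keys, leaf = indexed_path
found = get_by_path(data, keys)
print(found == leaf)  # True
```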


My more flexible version is as follows:

def walktree(tree, at=lambda node: not isinstance(node, dict), prefix=(),
             flattennode=lambda node: isinstance(node, (list, tuple, set))):
    """
    Traverse a tree and return an iterator over the paths from the root node to the leaf nodes.
    tree: like {'a': {'b': 1, 'c': 2}}
    at: a boolean function, or an int giving the number of levels to go down.
        walktree(tree, at=1) is equivalent to tree.items()
    flattennode: a boolean function that decides whether to iterate over a node's value
    """
    if isinstance(at, int):
        isleaf_ = at == 0
        isleaf = lambda v: isleaf_
        at = at - 1
    else:
        isleaf = at
    if isleaf(tree):
        if not flattennode(tree):
            yield (*prefix, tree)
        else:
            for v in tree:
                yield from walktree(v, at, prefix, flattennode=flattennode)
    else:
        for k, v in tree.items():
            yield from walktree(v, at, (*prefix, k), flattennode=flattennode)

Usage:

> list(walktree({'a':{'b':[0,1],'c':2}, 'd':3}))
[('a', 'b', 0), ('a', 'b', 1), ('a', 'c', 2), ('d', 3)]
> list(walktree({'a':{'b':[0,1],'c':2}, 'd':3}, flattennode=lambda e:False))
[('a', 'b', [0, 1]), ('a', 'c', 2), ('d', 3)]
> list(walktree({'a':{'b':[0,1],'c':2}, 'd':3}, at=1))
[('a', {'b': [0, 1], 'c': 2}), ('d', 3)]
> list(walktree({'a':{'b':[0,{1:9,9:1}],'c':2}, 'd':3}))
[('a', 'b', 0), ('a', 'b', 1, 9), ('a', 'b', 9, 1), ('a', 'c', 2), ('d', 3)]
> list(walktree([1,2,3,[4,5]]))
[(1,), (2,), (3,), (4,), (5,)]


To traverse (or map over) an entire JSON structure, you can use the following code:

def walk(node, key):
    if type(node) is dict:
        return {k: walk(v, k) for k, v in node.items()}
    elif type(node) is list:
        return [walk(x, key) for x in node]
    else:
        return YourFunction(node, key)


def YourFunction(node, key):
    if key == "yourTargetField":   # for example, you want to modify yourTargetField
        return "Modified Value"
    return node # return existing value

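This traverses the entire JSON structure and runs every leaf (endpoint key-value pair) through your function. Your function returns the modified value. The walk function as a whole gives you a new, processed object with the original ordering preserved.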
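A small end-to-end run may make the behaviour clearer. In this sketch the leaf function is passed in as a third argument rather than hard-coded, which is a variation on the code above; `mask_password` and the sample data are hypothetical. The top-level call has no enclosing key, so `None` is passed for `key`.

```python
def walk(node, key, func):
    if isinstance(node, dict):
        return {k: walk(v, k, func) for k, v in node.items()}
    elif isinstance(node, list):
        # List items inherit the key of the list that contains them.
        return [walk(x, key, func) for x in node]
    else:
        return func(node, key)


def mask_password(value, key):
    # Hypothetical leaf function: hide anything stored under the key "password".
    return "***" if key == "password" else value


original = {"user": {"name": "ann", "password": "s3cret"}, "tokens": [{"password": "x"}, 5]}
masked = walk(original, None, mask_password)
print(masked)
```

The input is not modified; `walk` builds new dictionaries and lists on the way back up.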


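A slightly improved version of the accepted solution. If you are using Python 3 (which I hope you are), you can use the yield from syntax that was introduced in Python 3.3. Also, isinstance can accept a tuple as its second argument: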

def dict_generator(adict, pre=None):
    pre = pre[:] if pre else []
    if isinstance(adict, dict):
        for key, value in adict.items():
            if isinstance(value, dict):
                yield from dict_generator(value, pre + [key])
            elif isinstance(value, (list, tuple)):
                for v in value:
                    yield from dict_generator(v, pre + [key])
            else:
                yield pre + [key, value]
    else:
        yield pre + [adict]

An iterative version (if you use a generator, the two are very similar):

def flatten(adict):
    stack = [[adict, []]]
    while stack:
        d, pre = stack.pop()
        if isinstance(d, dict):
            for key, value in d.items():
                if isinstance(value, dict):
                    stack.append([value, pre + [key]])
                elif isinstance(value, (list, tuple)):
                    for v in value:
                        stack.append([v, pre + [key]])
                else:
                    print(pre + [key, value])
        else:
            print(pre + [d])
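Because the stack is last-in, first-out and leaves are printed immediately, the iterative version above visits entries in a different order from the recursive generator. If the order matters, one option is to push every child (leaves included) in reverse and yield instead of printing. The sketch below is one way to do that; the name `flatten_iter` is made up for this example.

```python
def flatten_iter(adict):
    """Iteratively yield path lists in the same order as the recursive generator."""
    stack = [(adict, [])]
    while stack:
        node, pre = stack.pop()
        if isinstance(node, dict):
            children = []
            for key, value in node.items():
                if isinstance(value, (list, tuple)):
                    # Each list item is visited under the key of the list.
                    children.extend((v, pre + [key]) for v in value)
                else:
                    children.append((value, pre + [key]))
            # Reverse so that the first child is popped (visited) first.
            stack.extend(reversed(children))
        else:
            yield pre + [node]


sample = {"a": {"b": [0, {"c": 1}], "d": 2}, "e": 3}
paths = list(flatten_iter(sample))
print(paths)  # [['a', 'b', 0], ['a', 'b', 'c', 1], ['a', 'd', 2], ['e', 3]]
```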