How can I parse a YAML file in Python?

How can I parse a YAML file in Python?


The easiest and purest method that does not rely on C headers is PyYAML (documentation), which can be installed via pip install pyyaml:

#!/usr/bin/env python
import yaml

with open("example.yaml", "r") as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)

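And that's it. A plain yaml.load() function also exists, but yaml.safe_load() should always be preferred to avoid introducing the possibility of arbitrary code execution. So unless you explicitly need arbitrary object serialization/deserialization, use safe_load.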
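To make the difference concrete, here is a minimal sketch (the payload string and the try_safe_load helper are made up for illustration, not taken from the answer). It hands yaml.safe_load() a document carrying a Python-specific tag; the safe loader should refuse to construct it instead of calling the named function:

```python
import yaml

# Hypothetical untrusted input: a Python-specific tag that an unsafe loader
# could turn into a function call (os.getcwd is harmless, but it could be anything).
untrusted = "!!python/object/apply:os.getcwd []"


def try_safe_load(text):
    """Return the parsed data, or the name of the YAML error that was raised."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        return type(exc).__name__


result = try_safe_load(untrusted)
print(result)  # prints: ConstructorError
```

Ordinary documents such as "a: 1" still load normally through the same helper.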

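Note that the PyYAML project supports versions up through the YAML 1.1 specification. If YAML 1.2 specification support is needed, see ruamel.yaml as noted in this answer.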
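The version difference shows up in everyday values. The sketch below (the sample document is invented for illustration) shows how PyYAML applies YAML 1.1 rules: yes/no/on/off resolve to booleans and a leading zero marks an octal integer. A YAML 1.2 parser would typically read country as the string "NO" and mode as the decimal 10.

```python
import yaml

# Sample document (hypothetical) exercising YAML 1.1 scalar resolution.
doc = """
enabled: yes
country: NO
mode: 010
quoted: "NO"
"""

data = yaml.safe_load(doc)
print(data)
# {'enabled': True, 'country': False, 'mode': 8, 'quoted': 'NO'}
```

Quoting a value, as in the last key, is the simplest way to force a string under either version.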

Additionally, you could use a drop-in replacement for PyYAML called oyaml, which keeps your YAML file ordered the way you had it.
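As a side note that is not part of the answer: on Python 3.7+ plain dicts keep insertion order, and PyYAML 5.1+ accepts sort_keys=False when dumping, so ordering can often be preserved without an extra package. A small sketch under those assumptions:

```python
import yaml

data = {"zebra": 1, "apple": 2, "mango": 3}

sorted_out = yaml.safe_dump(data)                    # default: keys sorted alphabetically
ordered_out = yaml.safe_dump(data, sort_keys=False)  # insertion order kept

print(sorted_out)
print(ordered_out)

# Loading keeps the order in which keys appear in the document.
loaded = yaml.safe_load(ordered_out)
print(list(loaded))
```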

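If you have YAML that conforms to the YAML 1.2 specification (released 2009), then you should use ruamel.yaml (disclaimer: I am the author of that package). It is essentially a superset of PyYAML, which supports most of YAML 1.1 (from 2005).

If you want to be able to preserve your comments when round-tripping, you certainly should use ruamel.yaml.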

Upgrading @Jon's example is easy:

import ruamel.yaml as yaml

with open("example.yaml") as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)

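Use safe_load() unless you really have full control over the input, need it (seldom), and know what you are doing.

If you are using pathlib Path for manipulating files, you are better off using the new API that ruamel.yaml provides: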

from ruamel.yaml import YAML
from pathlib import Path

path = Path('example.yaml')
yaml = YAML(typ='safe')
data = yaml.load(path)

Read and write YAML files with Python 2+3 (and Unicode)

# -*- coding: utf-8 -*-
import yaml
import io

# Define data
data = {
    'a list': [1, 42, 3.141, 1337, 'help', u'€'],
    'a string': 'bla',
    'another dict': {'foo': 'bar', 'key': 'value', 'the answer': 42},
}

# Write YAML file
with io.open('data.yaml', 'w', encoding='utf8') as outfile:
    yaml.dump(data, outfile, default_flow_style=False, allow_unicode=True)

# Read YAML file (same encoding as when writing, so '€' survives on any platform)
with io.open('data.yaml', 'r', encoding='utf8') as stream:
    data_loaded = yaml.safe_load(stream)

print(data == data_loaded)

Created YAML file

a list:
- 1
- 42
- 3.141
- 1337
- help
- €
a string: bla
another dict:
  foo: bar
  key: value
  the answer: 42
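The allow_unicode=True argument is what keeps the € readable in the file above. A minimal sketch of the difference, using a made-up one-key dictionary:

```python
import yaml

data = {"currency": "€"}

escaped = yaml.safe_dump(data)                       # non-ASCII is escaped by default
readable = yaml.safe_dump(data, allow_unicode=True)  # character written as-is

print(escaped)
print(readable)

# Both forms load back to the same Python value.
same = yaml.safe_load(escaped) == yaml.safe_load(readable) == data
print(same)
```

Either form is valid YAML; the flag only affects how the text looks on disk.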

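Common file endings

.yml, .yaml

Alternatives

For your application, the following might be important:

  • Support by other programming languages
  • Reading / writing performance
  • Compactness (file size)

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python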

#!/usr/bin/env python
import sys
import yaml


def main(argv):
    with open(argv[0]) as stream:
        try:
            data = yaml.safe_load(stream)  # parse the file so syntax errors surface
            # print(data)
            return 0
        except yaml.YAMLError as exc:
            print(exc)
            return 1


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))

First install pyyaml using pip3.

Then import the yaml module and load the file into a dictionary called my_dict:

import yaml

with open('filename.yaml') as f:
    my_dict = yaml.safe_load(f)

That's all you need. Now the entire YAML file is in the my_dict dictionary.
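One caveat worth a quick sketch: the result is a dict only when the top level of the file is a mapping. The load_mapping helper below is hypothetical, shown as one way to guard against the other cases:

```python
import yaml

print(type(yaml.safe_load("name: demo\nport: 8080")))  # mapping -> dict
print(type(yaml.safe_load("- a\n- b")))                # sequence -> list
print(yaml.safe_load(""))                              # empty document -> None


def load_mapping(text):
    """Parse text and insist on a mapping at the top level (hypothetical helper)."""
    data = yaml.safe_load(text)
    if data is None:
        return {}
    if not isinstance(data, dict):
        raise TypeError(f"expected a mapping, got {type(data).__name__}")
    return data


print(load_mapping("name: demo"))
```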

I use ruamel.yaml. Details and debate here.

from ruamel import yaml

with open(filename, 'r') as fp:
    read_data = yaml.load(fp)

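Usage of ruamel.yaml is compatible (with some simple solvable problems) with old usages of PyYAML, and as stated in the link I provided, use

from ruamel import yaml

instead of

import yaml

and it will fix most of your problems.

EDIT: PyYAML is not dead, as it turns out; it's just maintained in a different place.

Example: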


defaults.yaml

url: https://www.google.com

environment.py

from ruamel import yaml

data = yaml.safe_load(open('defaults.yaml'))
data['url']

To access any element of a list in a YAML file like this:

global:
  registry:
    url: dtr-:5000/
    repoPath:
  dbConnectionString: jdbc:oracle:thin:@x.x.x.x:1521:abcd

You can use the following Python script:

import yaml

with open("/some/path/to/yaml.file", 'r') as f:
    valuesYaml = yaml.load(f, Loader=yaml.FullLoader)

print(valuesYaml['global']['dbConnectionString'])
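The same lookups can be tried without a file. In this sketch the document is parsed from a string with safe_load; the servers list is a hypothetical addition, included only to show indexing into a sequence:

```python
import yaml

values_text = """
global:
  registry:
    url: dtr-:5000/
    repoPath:
  dbConnectionString: jdbc:oracle:thin:@x.x.x.x:1521:abcd
servers:        # hypothetical list, added only to show indexing
  - name: alpha
  - name: beta
"""

values = yaml.safe_load(values_text)

print(values["global"]["dbConnectionString"])
print(values["global"]["registry"]["repoPath"])  # key with no value -> None
print(values["servers"][1]["name"])              # second list element

# .get() avoids a KeyError when a key may be absent.
print(values.get("global", {}).get("missing", "fallback"))
```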

The read_yaml_file function returns all the data as a dictionary.

import yaml


def read_yaml_file(full_path=None, relative_path=None):
    # ProjectPaths is a helper from the author's own project (not shown here)
    if relative_path is not None:
        resource_file_location_local = ProjectPaths.get_project_root_path() + relative_path
    else:
        resource_file_location_local = full_path

    with open(resource_file_location_local, 'r') as stream:
        try:
            file_artifacts = yaml.safe_load(stream)
        except yaml.YAMLError as exc:
            print(exc)
            raise  # without this, the return below would fail on an unset variable
    return dict(file_artifacts.items())

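Suggestion: use yq (available via pip)

I'm not sure why it hasn't been suggested before, but I would highly recommend using yq, which is a jq wrapper for YAML.

yq uses jq-like syntax but works with YAML files as well as JSON. Note that two tools share this name: the pip package (kislyuk/yq) wraps jq, while the yq e commands below follow the syntax of mikefarah/yq. Both are linked under the references.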


Examples:

1) Read a value:

yq e '.a.b[0].c' file.yaml

2) Pipe from STDIN:

cat file.yaml | yq e '.a.b[0].c' -

3) Update a YAML file, in place:

yq e -i '.a.b[0].c = "cool"' file.yaml

4) Update using environment variables:

NAME=mike yq e -i '.a.b[0].c = strenv(NAME)' file.yaml

5) Merge multiple files:

yq ea '. as $item ireduce ({}; . * $item )' path/to/*.yml

6) Multiple updates to a YAML file:

yq e -i '.a.b[0].c = "cool" | .x.y.z = "foobar" | .person.name = strenv(NAME)' file.yaml

(*) Read more on how to parse fields from YAML based on jq filters.
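Since the question is about Python, here is a rough Python counterpart to examples 1 and 3, assuming PyYAML. The read_value and update_value names are made up for this sketch, and unlike yq, rewriting a file this way drops comments and may change formatting:

```python
import os
import tempfile

import yaml


def read_value(path):
    """Rough equivalent of: yq e '.a.b[0].c' file.yaml"""
    with open(path) as f:
        data = yaml.safe_load(f)
    return data["a"]["b"][0]["c"]


def update_value(path, new_value):
    """Rough equivalent of: yq e -i '.a.b[0].c = "cool"' file.yaml"""
    with open(path) as f:
        data = yaml.safe_load(f)
    data["a"]["b"][0]["c"] = new_value
    with open(path, "w") as f:
        yaml.safe_dump(data, f)


# Demo on a throwaway file so nothing real is modified.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.yaml")
    with open(path, "w") as f:
        f.write("a:\n  b:\n    - c: old\n")
    before = read_value(path)
    update_value(path, "cool")
    after = read_value(path)

print(before, after)  # prints: old cool
```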


Other references:

https://github.com/mikefarah/yq/#install

https://github.com/kislyuk/yq

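I made my own script for this. Feel free to use it, as long as you keep the attribution. The script can parse YAML from a file (function load), parse YAML from a string (function loads), and convert a dictionary into YAML (function dumps). It respects all variable types.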

# © didlly AGPL-3.0 License - github.com/didlly

def is_float(string: str) -> bool:
    try:
        float(string)
        return True
    except ValueError:
        return False


def is_integer(string: str) -> bool:
    try:
        int(string)
        return True
    except ValueError:
        return False

def load(path: str) -> dict:
    with open(path, "r") as yaml:
        levels = []
        data = {}
        indentation_str = ""

        for line in yaml.readlines():
            if line.replace(line.lstrip(), "") != "" and indentation_str == "":
                indentation_str = line.replace(line.lstrip(), "").rstrip("\n")
            if line.strip() == "":
                continue
            elif line.rstrip()[-1] == ":":
                key = line.strip()[:-1]
                quoteless = (
                    is_float(key)
                    or is_integer(key)
                    or key == "True"
                    or key == "False"
                    or ("[" in key and "]" in key)
                )

                if len(line.replace(line.strip(), "")) // 2 < len(levels):
                    if quoteless:
                        levels[len(line.replace(line.strip(), "")) // 2] = f"[{key}]"
                    else:
                        levels[len(line.replace(line.strip(), "")) // 2] = f"['{key}']"
                else:
                    if quoteless:
                        levels.append(f"[{line.strip()[:-1]}]")
                    else:
                        levels.append(f"['{line.strip()[:-1]}']")
                if quoteless:
                    exec(
                        f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}]"
                        + " = {}"
                    )
                else:
                    exec(
                        f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}']"
                        + " = {}"
                    )

                continue

            key = line.split(":")[0].strip()
            value = ":".join(line.split(":")[1:]).strip()

            if (
                is_float(value)
                or is_integer(value)
                or value == "True"
                or value == "False"
                or ("[" in value and "]" in value)
            ):
                if (
                    is_float(key)
                    or is_integer(key)
                    or key == "True"
                    or key == "False"
                    or ("[" in key and "]" in key)
                ):
                    exec(
                        f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = {value}"
                    )
                else:
                    exec(
                        f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = {value}"
                    )
            else:
                if (
                    is_float(key)
                    or is_integer(key)
                    or key == "True"
                    or key == "False"
                    or ("[" in key and "]" in key)
                ):
                    exec(
                        f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = '{value}'"
                    )
                else:
                    exec(
                        f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = '{value}'"
                    )

    return data

def loads(yaml: str) -> dict:
    levels = []
    data = {}
    indentation_str = ""

    for line in yaml.split("\n"):
        if line.replace(line.lstrip(), "") != "" and indentation_str == "":
            indentation_str = line.replace(line.lstrip(), "")
        if line.strip() == "":
            continue
        elif line.rstrip()[-1] == ":":
            key = line.strip()[:-1]
            quoteless = (
                is_float(key)
                or is_integer(key)
                or key == "True"
                or key == "False"
                or ("[" in key and "]" in key)
            )

            if len(line.replace(line.strip(), "")) // 2 < len(levels):
                if quoteless:
                    levels[len(line.replace(line.strip(), "")) // 2] = f"[{key}]"
                else:
                    levels[len(line.replace(line.strip(), "")) // 2] = f"['{key}']"
            else:
                if quoteless:
                    levels.append(f"[{line.strip()[:-1]}]")
                else:
                    levels.append(f"['{line.strip()[:-1]}']")
            if quoteless:
                exec(
                    f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}]"
                    + " = {}"
                )
            else:
                exec(
                    f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}']"
                    + " = {}"
                )

            continue

        key = line.split(":")[0].strip()
        value = ":".join(line.split(":")[1:]).strip()

        if (
            is_float(value)
            or is_integer(value)
            or value == "True"
            or value == "False"
            or ("[" in value and "]" in value)
        ):
            if (
                is_float(key)
                or is_integer(key)
                or key == "True"
                or key == "False"
                or ("[" in key and "]" in key)
            ):
                exec(
                    f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = {value}"
                )
            else:
                exec(
                    f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = {value}"
                )
        else:
            if (
                is_float(key)
                or is_integer(key)
                or key == "True"
                or key == "False"
                or ("[" in key and "]" in key)
            ):
                exec(
                    f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = '{value}'"
                )
            else:
                exec(
                    f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = '{value}'"
                )

    return data

def dumps(yaml: dict, indent="") -> str:
    """A procedure which converts the dictionary passed to the procedure into its yaml equivalent.

    Args:
        yaml (dict): The dictionary to be converted.

    Returns:
        data (str): The dictionary in yaml form.
    """

    data = ""

    for key in yaml.keys():
        if type(yaml[key]) == dict:
            data += f"\n{indent}{key}:\n"
            data += dumps(yaml[key], f"{indent}  ")
        else:
            data += f"{indent}{key}: {yaml[key]}\n"

    return data

print(load("config.yml"))

Example

config.yml

level 0 value: 0

level 1:
  level 1 value: 1
  level 2:
    level 2 value: 2

level 1 2:
  level 1 2 value: 1 2
  level 2 2:
    level 2 2 value: 2 2

Output

{'level 0 value': 0, 'level 1': {'level 1 value': 1, 'level 2': {'level 2 value': 2}}, 'level 1 2': {'level 1 2 value': '1 2', 'level 2 2': {'level 2 2 value': '2 2'}}}
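For comparison, and assuming PyYAML is installed, the same document can be passed to yaml.safe_load(), which should produce the same nested dictionary. Because the script above builds Python source from the file contents and runs it with exec(), it is best kept to files you trust; the sketch below avoids that:

```python
import yaml

# The example config from above, inlined as a string.
config = """\
level 0 value: 0

level 1:
  level 1 value: 1
  level 2:
    level 2 value: 2

level 1 2:
  level 1 2 value: 1 2
  level 2 2:
    level 2 2 value: 2 2
"""

parsed = yaml.safe_load(config)
print(parsed)
```

Values such as "1 2" are not valid numbers, so they come back as strings, matching the output shown above.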