How to read HDF5 files in Python

I am trying to read data from an HDF5 file in Python. I can open the file using h5py, but I cannot figure out how to access the data within the file.

My code:

import h5py
import numpy as np
f1 = h5py.File(file_name,'r+')

This works and the file is read. But how can I access the data inside the file object f1?

You can use pandas.

import pandas as pd
pd.read_hdf(filename,key)

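What you need to do is create a dataset. If you take a look at the quickstart guide, it shows that you need to use the file object in order to create a dataset: call f.create_dataset, and then you can read the data. This is explained in the docs.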
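As a minimal sketch of that round trip (the file name, dataset name, and temporary directory below are hypothetical, chosen only for illustration), a dataset can be written with create_dataset and then read back by indexing the file object with the dataset name:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
original = np.arange(12, dtype=np.float64).reshape(3, 4)

# Create the file and store one dataset in it.
with h5py.File(path, "w") as f:
    f.create_dataset("my_dataset", data=original)

# Reopen the file read-only and load the dataset into memory.
with h5py.File(path, "r") as f:
    names = list(f.keys())        # names of the root level objects
    loaded = f["my_dataset"][()]  # [()] copies the whole dataset into a numpy array

print(names)
print(loaded.shape)
```

Creating a dataset is only needed when the file does not contain one yet; an existing file can be opened in "r" mode and indexed directly.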

Read HDF5

import h5py
filename = "file.hdf5"


with h5py.File(filename, "r") as f:
    # Print all root level object names (aka keys)
    # these can be group or dataset names
    print("Keys: %s" % f.keys())
    # get first object name/key; may or may NOT be a group
    a_group_key = list(f.keys())[0]

    # get the object type for a_group_key: usually group or dataset
    print(type(f[a_group_key]))

    # If a_group_key is a group name,
    # this gets the object names in the group and returns as a list
    data = list(f[a_group_key])

    # If a_group_key is a dataset name,
    # this gets the dataset values and returns as a list
    data = list(f[a_group_key])
    # preferred methods to get dataset values:
    ds_obj = f[a_group_key]      # returns as a h5py dataset object
    ds_arr = f[a_group_key][()]  # returns as a numpy array
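The difference between the dataset object and the numpy array matters for large files: the dataset object can be sliced, and only the requested part is read from disk. A small sketch, assuming a hypothetical dataset named "measurements" written to a temporary file:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "slices.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("measurements", data=np.arange(1000).reshape(100, 10))

with h5py.File(path, "r") as f:
    ds = f["measurements"]   # dataset object; the values are not loaded yet
    shape = ds.shape         # metadata is available without reading the values
    first_rows = ds[:5]      # reads only the first five rows
    one_column = ds[:, 3]    # reads only column 3

print(shape, first_rows.shape, one_column.shape)
```

Both slices are ordinary numpy arrays, so they remain usable after the file is closed.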

Write HDF5

import h5py
import numpy as np

# Create random data
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write data to HDF5
with h5py.File("file.hdf5", "w") as data_file:
    data_file.create_dataset("dataset_name", data=data_matrix)

See the h5py docs for more information.

Alternatives

For your application, the following might be important:

  • Support by other programming languages
  • Reading/writing performance
  • Compactness (file size)

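See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python.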

Reading the file

import h5py


f = h5py.File(file_name, mode)

Study the structure of the file by printing what the HDF5 groups contain

for key in f.keys():
    print(key)  # Names of the root level objects in the HDF5 file - can be groups or datasets.
    print(type(f[key]))  # get the object type: usually group or dataset
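The loop above only lists the root level. For nested files, one possible way to see every level at once is h5py's visititems, which calls a function for each object below the starting group. The sketch below builds a small hypothetical file (the names "meta", "run1", "x", and "y" are assumptions made for the example) and records what it finds:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file layout, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "nested.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("meta", data=np.zeros(2))
    run = f.create_group("run1")
    run.create_dataset("x", data=np.arange(3))
    run.create_dataset("y", data=np.arange(3) * 2)

found = []

def describe(name, obj):
    # name is the path relative to the group on which visititems was called
    kind = "dataset" if isinstance(obj, h5py.Dataset) else "group"
    found.append((name, kind))

with h5py.File(path, "r") as f:
    f.visititems(describe)  # returning None from the callback continues the walk

for name, kind in sorted(found):
    print(name, kind)
```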

Extracting the data

#Get the HDF5 group; key needs to be a group name from above
group = f[key]


#Checkout what keys are inside that group.
for key in group.keys():
    print(key)


# This assumes group[some_key_inside_the_group] is a dataset,
# and returns a np.array:
data = group[some_key_inside_the_group][()]
#Do whatever you want with data


#After you are done
f.close()

Use the code below to read the data and convert it into numpy arrays

import h5py
import numpy as np

f1 = h5py.File('data_1.h5', 'r')
list(f1.keys())
X1 = f1['x']
y1 = f1['y']
# Dataset.value was removed in h5py 3.0; [()] reads the full dataset instead
df1 = np.array(X1[()])
dfy1 = np.array(y1[()])
print(df1.shape)
print(dfy1.shape)

Preferred method to read dataset values into a numpy array:

import h5py
# use Python file context manager:
with h5py.File('data_1.h5', 'r') as f1:
    print(list(f1.keys()))  # print list of root level objects
    # following assumes 'x' and 'y' are dataset objects
    ds_x1 = f1['x']  # returns h5py dataset object for 'x'
    ds_y1 = f1['y']  # returns h5py dataset object for 'y'
    arr_x1 = f1['x'][()]  # returns np.array for 'x'
    arr_y1 = f1['y'][()]  # returns np.array for 'y'
    arr_x1 = ds_x1[()]  # uses dataset object to get np.array for 'x'
    arr_y1 = ds_y1[()]  # uses dataset object to get np.array for 'y'
    print(arr_x1.shape)
    print(arr_y1.shape)
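One practical reason to prefer the arrays: a dataset object is tied to the open file, while the numpy array is an independent copy. The following sketch, assuming a hypothetical temporary file containing a dataset 'x', shows that the array survives the with block and the dataset object does not:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "data_1.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("x", data=np.ones((4, 2)))

with h5py.File(path, "r") as f1:
    ds_x1 = f1["x"]      # dataset object, valid only while the file is open
    arr_x1 = ds_x1[()]   # numpy array, an in-memory copy

# The file is closed here. The array is still usable:
total = arr_x1.sum()

# Reading from the dataset object now fails.
try:
    ds_x1[()]
    dataset_usable_after_close = True
except Exception:  # broad on purpose: the exception type may differ between h5py versions
    dataset_usable_after_close = False

print(total, dataset_usable_after_close)
```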

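To read the content of an .hdf5 file as an array, you can do the following. Note that np.fromfile interprets the raw bytes of the file, HDF5 metadata included, so the result is not the stored dataset values; use h5py to get those.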

> import numpy as np
> myarray = np.fromfile('file.hdf5', dtype=float)
> print(myarray)
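A small comparison makes the difference between raw bytes and stored values concrete. The sketch below assumes a hypothetical file with a single 10x3 dataset named "dataset_name" and reads it both ways:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "file.hdf5")
data = np.random.uniform(-1, 1, size=(10, 3))
with h5py.File(path, "w") as f:
    f.create_dataset("dataset_name", data=data)

# np.fromfile reinterprets every 8 bytes of the file as a float,
# including the HDF5 header and internal bookkeeping.
raw = np.fromfile(path, dtype=float)

# h5py parses the file format and returns the stored values.
with h5py.File(path, "r") as f:
    values = f["dataset_name"][()]

print(raw.shape, values.shape)
```

The raw array is one-dimensional and longer than the 30 stored values, because it also covers the file's metadata.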

Here's a simple function I just wrote which reads a .hdf5 file generated by the save_weights function in keras and returns a dict with layer names and weights:

import h5py

def read_hdf5(path):
    weights = {}
    keys = []
    with h5py.File(path, 'r') as f: # open file
        f.visit(keys.append) # append all keys to list
        for key in keys:
            if ':' in key: # contains data if ':' in key
                print(f[key].name)
                # [()] replaces .value, which was removed in h5py 3.0
                weights[f[key].name] = f[key][()]
    return weights

https://gist.github.com/attila94/fb917e03b04035f3737cc8860d9e9f9b

I haven't tested it thoroughly, but it does the job for me.

from keras.models import load_model


h= load_model('FILE_NAME.h5')

Using bits of the answers to this question and the latest docs, I was able to read my data using

import h5py
with h5py.File(filename, 'r') as h5f:
    h5x = h5f[list(h5f.keys())[0]]['x'][()]

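In my case, 'x' is simply the X coordinate.

If you have named datasets in the HDF file, you can use the following code to read these datasets and convert them to numpy arrays: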

import h5py
import numpy as np

file = h5py.File('filename.h5', 'r')

xdata = file.get('xdata')
xdata = np.array(xdata)

If your file is in a different directory, you can add the path in front of 'filename.h5'.
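One way to build such a path without hard-coding separators is pathlib. This is only a sketch: the directory name "results" is hypothetical, and a temporary directory stands in for wherever the file actually lives.

```python
import tempfile
from pathlib import Path

import h5py
import numpy as np

# Hypothetical directory; replace with the folder that holds your file.
data_dir = Path(tempfile.mkdtemp()) / "results"
data_dir.mkdir()
file_path = data_dir / "filename.h5"

# Write a small file so the example is self-contained.
with h5py.File(str(file_path), "w") as f:
    f.create_dataset("xdata", data=np.linspace(0, 1, 5))

# Open it again through the full path and read the named dataset.
with h5py.File(str(file_path), "r") as f:
    xdata = f["xdata"][()]

print(file_path.name, xdata.shape)
```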

Use this; it works fine for me


import h5py

def read_hdf5():
    weights = {}
    keys = []
    with h5py.File("path.h5", 'r') as f:
        f.visit(keys.append)
        for key in keys:
            if ':' in key:
                print(f[key].name)
                weights[f[key].name] = f[key][()]
    return weights

print(read_hdf5())

If you are using h5py <= 2.9.0, then you can use


import h5py

def read_hdf5():
    weights = {}
    keys = []
    with h5py.File("path.h5", 'r') as f:
        f.visit(keys.append)
        for key in keys:
            if ':' in key:
                print(f[key].name)
                # .value only exists in older h5py; it was removed in 3.0
                weights[f[key].name] = f[key].value
    return weights

print(read_hdf5())