Reading v7.3 mat files in Python

I'm trying to read a MATLAB file with the following code:

import scipy.io
mat = scipy.io.loadmat('test.mat')

It gives me the following error:

raise NotImplementedError('Please use HDF reader for matlab v7.3 files')
NotImplementedError: Please use HDF reader for matlab v7.3 files

Has anyone had the same problem? Could someone please share some example code?

Thanks


Try using the h5py module:

import h5py
with h5py.File('test.mat', 'r') as f:
    print(list(f.keys()))  # names of the variables stored in the file

According to the SciPy Cookbook (http://wiki.scipy.org/Cookbook/Reading_mat_files):

Beginning at release 7.3 of Matlab, mat files are actually saved using the HDF5 format by default (except if you use the -vX flag at save time, see help save in Matlab). These files can be read in Python using, for instance, the PyTables or h5py package. Reading Matlab structures in mat files does not seem supported at this point.

Perhaps you could use Octave to re-save using the -vX flag.
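Because a v7.3 file is an HDF5 file with a 512-byte MATLAB text header in front of it, you can check which reader to use before calling scipy.io.loadmat. Below is a minimal sketch, assuming only that the HDF5 superblock signature appears at one of the offsets the HDF5 format allows (0, 512, 1024, 2048, and so on, doubling); is_hdf5_matfile is a hypothetical helper name, not part of SciPy or h5py.

```python
# The 8-byte signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def is_hdf5_matfile(path, max_offset=1 << 20):
    """Return True if `path` contains an HDF5 superblock signature.

    HDF5 allows the signature at offset 0 or at 512, 1024, 2048, ...
    MATLAB v7.3 files put a 512-byte text header first, so for them
    the signature is normally found at offset 512.
    """
    with open(path, "rb") as fh:
        offset = 0
        while offset <= max_offset:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Next candidate offset: 0 -> 512 -> 1024 -> 2048 -> ...
            offset = 512 if offset == 0 else offset * 2
    return False
```

If it returns True, open the file with h5py.File(path, 'r'); otherwise scipy.io.loadmat should be able to read it.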

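import h5py
import numpy as np

filepath = '/path/to/data.mat'
arrays = {}
with h5py.File(filepath, 'r') as f:  # open read-only; the with block closes the file
    for k, v in f.items():
        arrays[k] = np.array(v)  # read each top-level dataset into memory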

You should end up with your data in the arrays dict, unless (I suspect) you have MATLAB structures. Hope it helps!
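One thing to watch with the arrays read this way: MATLAB stores data in column-major (Fortran) order, and h5py reports the dimensions reversed, so a 2×3 MATLAB matrix comes back from np.array(v) with shape (3, 2). A minimal sketch of undoing that with a plain transpose; from_matlab_order is a hypothetical helper name.

```python
import numpy as np


def from_matlab_order(raw):
    """Reverse all axes so indices match MATLAB's A(i, j, ...) order.

    `raw` is an array as read through h5py from a v7.3 file; its shape is
    the MATLAB size reversed because MATLAB writes column-major data.
    """
    raw = np.asarray(raw)
    return raw.T  # .T reverses every axis, e.g. (p, c, r) -> (r, c, p)


# MATLAB: A = [1 2 3; 4 5 6] is laid out in memory as 1 4 2 5 3 6,
# which reads back in C order as a 3x2 array.
raw = np.array([1, 4, 2, 5, 3, 6]).reshape(3, 2)
print(from_matlab_order(raw))
# [[1 2 3]
#  [4 5 6]]
```

In the loop above this would become arrays[k] = from_matlab_order(v).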

I had a look at this issue: https://github.com/h5py/h5py/issues/726. If you saved your mat file with -v7.3 option, you should generate the list of keys with (under Python 3.x):

import h5py
with h5py.File('test.mat', 'r') as file:
    print(list(file.keys()))

To access the variable a, for instance, you have to use the same trick:

with h5py.File('test.mat', 'r') as file:
    a = list(file['a'])
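If the file contains cell arrays or structs, the key list printed by the first snippet will usually also include MATLAB's internal #refs# group (and sometimes #subsystem#), which holds referenced data rather than a workspace variable. MATLAB variable names must start with a letter, so a small filter on a leading # is enough to hide these. A sketch; workspace_variables is a hypothetical helper name.

```python
def workspace_variables(keys):
    """Return the top-level names that are real MATLAB variables.

    MATLAB variable names must start with a letter, so anything beginning
    with '#' (such as '#refs#') is internal bookkeeping, not user data.
    """
    return [k for k in keys if not k.startswith("#")]


print(workspace_variables(["#refs#", "a", "b"]))  # ['a', 'b']
```

With an open file this would be called as workspace_variables(file.keys()).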

Per Magu_'s answer on a related thread, check out the package hdf5storage which has convenience functions to read v7.3 matlab mat files; it is as simple as

import hdf5storage
mat = hdf5storage.loadmat('test.mat')

Despite hours of searching I've not found how to access Matlab v7.3 structures either. Hopefully this partial answer will help someone, and I'd be very happy to see extra pointers.

So, starting with this (I think the [0][0] arises from Matlab giving everything at least two dimensions):

f = h5py.File('filename', 'r')
f['varname'][0][0]

gives: < HDF5 object reference >

Pass this reference to f again:

f[f['varname'][0][0]]

which gives an array: convert this to a numpy array and extract the value (or, recursively, another < HDF5 object reference >):

np.array(f[f['varname'][0][0]])[0][0]
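The same dereferencing can be done recursively for nested cell arrays. A sketch, assuming f is an open h5py.File and the dataset holds object references; resolve is a hypothetical helper. It only needs f[ref] lookups and np.asarray, so any mapping from reference to array works, which also makes it easy to try out without a real file.

```python
import numpy as np


def resolve(f, value):
    """Recursively replace HDF5 object references with the data they point to.

    `f` is an open h5py.File (or anything indexable by reference);
    `value` is a dataset or array. Reference arrays come back as nested
    lists in the order of value.flat; numeric data comes back as arrays.
    """
    arr = np.asarray(value)  # reads an h5py Dataset into memory
    if arr.dtype == object:  # object dtype means an array of references
        return [resolve(f, f[ref]) for ref in arr.flat]
    return arr
```

Usage with a real file: cells = resolve(f, f['varname']).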

If accessing the disk is slow, loading the whole file into memory may help; h5py can do this with h5py.File('filename', 'r', driver='core').


Further edit: after much futile searching, my final workaround (I really hope someone else has a better solution!) was calling MATLAB from Python through the MATLAB Engine API for Python, which is pretty easy and fast:

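import matlab.engine

eng = matlab.engine.start_matlab()  # first fire up a Matlab instance
eng.quit()  # shut it down again
eng = matlab.engine.connect_matlab()  # or connect to an existing shared session (run matlab.engine.shareEngine in MATLAB first)
eng.sqrt(4.0)  # call a MATLAB function directly; returns 2.0
x = 4.0
eng.workspace['y'] = x  # copy a Python value into the MATLAB workspace
a = eng.eval('sqrt(y)')  # evaluate a MATLAB expression and return its result
print(a)
x = eng.eval('parameterised_function_in_Matlab(1, 1)', nargout=1)
a = eng.eval('Structured_variable{1}{2}.object_name')  # (nested cell, cell, object)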

This function reads Matlab-produced HDF5 .mat files, and returns a structure of nested dicts of Numpy arrays. Matlab writes matrices in Fortran order, so this also transposes matrices and higher-dimensional arrays into conventional Numpy order arr[..., page, row, col].

import h5py


def read_matlab(filename):
    def conv(path=''):
        p = path or '/'
        paths[p] = ret = {}
        for k, v in f[p].items():
            if isinstance(v, h5py.Group):
                ret[k] = conv(f'{path}/{k}')  # Nested struct
                continue
            v = v[()]  # It's a Numpy array now
            if v.dtype == 'object':
                # HDF5ObjectReferences are converted into a list of actual pointers
                ret[k] = [r and paths.get(f[r].name, f[r].name) for r in v.flat]
            else:
                # Matrices and other numeric arrays
                ret[k] = v if v.ndim < 2 else v.swapaxes(-1, -2)
        return ret

    paths = {}
    with h5py.File(filename, 'r') as f:
        return conv()
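One gap in read_matlab: MATLAB char variables come back as arrays of uint16 numbers, because v7.3 files store text as UTF-16 code units (such datasets carry a MATLAB_class attribute set to char). A minimal sketch for turning such an array back into a Python string, assuming it holds a single row of text; matlab_char_to_str is a hypothetical helper name.

```python
import numpy as np


def matlab_char_to_str(arr):
    """Decode a MATLAB char row vector read from a v7.3 file.

    Assumes `arr` holds UTF-16 code units (uint16), as MATLAB writes them.
    Multi-row char matrices would need to be split into rows first.
    """
    units = np.asarray(arr, dtype="<u2").ravel()  # little-endian uint16
    return units.tobytes().decode("utf-16-le")


print(matlab_char_to_str(np.array([[104, 101, 108, 108, 111]], dtype=np.uint16)))
# hello
```

Because ravel flattens a 1×N or N×1 array to the same sequence, this works whether or not the axes were swapped first.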

I've created a small library to load MATLAB 7.3 files:

pip install mat73

To load a .mat 7.3 into Python as a dictionary:

import mat73
data_dict = mat73.loadmat('data.mat')

Simple as that!

If you are only reading in basic arrays and structs, see vikrantt's answer on a similar post. However, if you are working with a Matlab table, then IMHO the best solution is to avoid the save option altogether.

I've created a simple helper function to convert a Matlab table to a standard hdf5 file, and another helper function in Python to extract the data into a Pandas DataFrame.

Matlab Helper Function

function table_to_hdf5(T, path, group)
%TABLE_TO_HDF5 Save a Matlab table in an hdf5 file format
%
%    TABLE_TO_HDF5(T) Saves the table T to the HDF5 file inputname.h5 at the root ('/')
%    group, where inputname is the name of the input argument for T
%
%    TABLE_TO_HDF5(T, path) Saves the table T to the HDF5 file specified by path at the
%    root ('/') group.
%
%    TABLE_TO_HDF5(T, path, group) Saves the table T to the HDF5 file specified by path
%    at the group specified by group.
%
%%%

if nargin < 2
    path = [inputname(1), '.h5'];  % default file name to input argument
end
if nargin < 3
    group = '';  % We will prepend '/' later, so this is effectively root
end

for field = T.Properties.VariableNames
    % Prepare to write
    field = field{:};
    dataset_name = [group '/' field];
    data = T.(field);
    % Text columns (char, string, or cell arrays of char) cannot be written by h5write
    if ischar(data) || isstring(data) || iscell(data)
        warning('String columns not supported. Skipping...')
        continue
    end
    % Write the data
    h5create(path, dataset_name, size(data))
    h5write(path, dataset_name, data)
end

end

Python Helper Function

import pandas as pd
import h5py


def h5_to_df(path, group='/'):
    """
    Load an hdf5 file into a pandas DataFrame
    """
    df = pd.DataFrame()
    with h5py.File(path, 'r') as f:
        data = f[group]
        for k, v in data.items():
            # MATLAB writes an N-by-k column as a (k, N) dataset here,
            # so each row of v is one column of the original table
            if v.shape[0] > 1:  # Multiple column field
                for i in range(v.shape[0]):
                    k_new = f'{k}_{i}'
                    df[k_new] = v[i]
            else:
                df[k] = v[0]
    return df

Important Notes

  • This will only work on numerical data. If you know how to add string data, please comment.
  • This will create the file if it does not already exist.
  • This will crash if the data already exists in the file. You'll want to include logic to handle those cases as you deem appropriate.