Python pickling after changing a module's directory

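I recently changed my program's directory layout: previously, I had all my modules inside the "main" folder. Now I've moved them into a directory named after the program and placed an __init__.py there to make it a package.

Now I have a single .py file in my main directory that is used to launch my program, which is much neater.

Anyway, trying to load pickled files from earlier versions of my program is failing. I'm getting "ImportError: No module named tools", which I guess is because my module used to live in the main folder and is now whyteboard.tools rather than plain tools. However, the code doing the importing in the tools module lives in the same directory as it, so I doubt there's a need to specify a package.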

My program directory looks something like this:

whyteboard-0.39.4
-->whyteboard.py
-->README.txt
-->CHANGELOG.txt
---->whyteboard/
---->whyteboard/__init__.py
---->whyteboard/gui.py
---->whyteboard/tools.py

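whyteboard.py launches a block of code from whyteboard/gui.py that fires up the GUI. This pickling problem definitely wasn't happening before the directory reorganization.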


This is the normal behavior of pickle: unpickled objects need to have their defining module importable.

You should be able to change the module path (i.e. from tools to whyteboard.tools) by editing the pickled files, as they are simple text files when written with the old ASCII protocol (protocol 0).

As pickle's docs say, in order to save and restore a class instance (actually a function, too), you must respect certain constraints:

pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored

whyteboard.tools is not "the same module as" tools (even though other modules in the same package can import it with import tools, it ends up in sys.modules as sys.modules['whyteboard.tools']: this is absolutely crucial, otherwise the same module imported by one in the same package vs one in another package would end up with multiple and possibly conflicting entries!).
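To make the "same module" constraint concrete, here is a minimal sketch showing that a pickle records the module and class names rather than the class body. The module name demo_tools and the class Pen are hypothetical stand-ins (they are not from the question), and the module is created in memory purely so the example is self-contained.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the old top-level module "tools".
old_module = types.ModuleType("demo_tools")


class Pen:
    """Hypothetical class standing in for something defined in tools.py."""

    def __init__(self, colour):
        self.colour = colour


# Make the class look as if it had been defined inside demo_tools.
Pen.__module__ = "demo_tools"
old_module.Pen = Pen
sys.modules["demo_tools"] = old_module

payload = pickle.dumps(Pen("red"))

# The pickle stores the module and class *names*, not the class definition.
module_name_recorded = b"demo_tools" in payload
class_name_recorded = b"Pen" in payload

# Once the old name can no longer be imported, loading fails.
del sys.modules["demo_tools"]
try:
    pickle.loads(payload)
    load_error = None
except ImportError as exc:
    load_error = exc

print(module_name_recorded, class_name_recorded, type(load_error).__name__)
```

The failure at the end is the same kind of error as the one in the question: the name stored in the pickle no longer resolves to an importable module.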

If your pickle files are in a good/advanced format (as opposed to the old ascii format that's the default only for compatibility reasons), migrating them once you perform such changes may in fact not be quite as trivial as "editing the file" (which is binary &c...!), despite what another answer suggests. I suggest that, instead, you make a little "pickle-migrating script": let it patch sys.modules like this...:

import sys
from whyteboard import tools


sys.modules['tools'] = tools

and then cPickle.load each file, del sys.modules['tools'], and cPickle.dump each loaded object back to file: that temporary extra entry in sys.modules should let the pickles load successfully, then dumping them again should be using the right module-name for the instances' classes (removing that extra entry should make sure of that).
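The load, delete, re-dump sequence can be checked end to end in memory. The following is only a sketch under stated assumptions: it uses Python 3's pickle rather than cPickle, and the names demo_old_tools, demo_pkg_tools and Pen are hypothetical stand-ins for tools, whyteboard.tools and one of your classes.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the relocated module (whyteboard.tools in the
# question). A flat name keeps the demo independent of any real package.
new_module = types.ModuleType("demo_pkg_tools")


class Pen:
    def __init__(self, colour):
        self.colour = colour


Pen.__module__ = "demo_pkg_tools"
new_module.Pen = Pen
sys.modules["demo_pkg_tools"] = new_module

# Fabricate an "old" pickle by briefly pretending the class still lives
# in the old module name, then undo the pretence.
Pen.__module__ = "demo_old_tools"
sys.modules["demo_old_tools"] = new_module
old_payload = pickle.dumps(Pen("red"))
del sys.modules["demo_old_tools"]
Pen.__module__ = "demo_pkg_tools"

# Migration: alias the old name, load, remove the alias, dump again.
sys.modules["demo_old_tools"] = new_module
try:
    obj = pickle.loads(old_payload)
finally:
    del sys.modules["demo_old_tools"]
new_payload = pickle.dumps(obj)

# The re-dumped pickle refers to the new module name only.
print(b"demo_old_tools" in new_payload, b"demo_pkg_tools" in new_payload)
```

Removing the alias before dumping is not strictly what changes the recorded name (that comes from the class's own __module__), but it keeps the old name from lingering in sys.modules after the migration.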

pickle serializes classes by reference, so if you change where the class lives, it will not unpickle because the class will not be found. If you use dill instead of pickle, then you can serialize classes by reference or directly (by directly serializing the class instead of its import path). You can simulate this pretty easily by just changing the class definition after a dump and before a load.

Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>>
>>> class Foo(object):
...   def bar(self):
...     return 5
...
>>> f = Foo()
>>>
>>> _f = dill.dumps(f)
>>>
>>> class Foo(object):
...   def bar(self, x):
...     return x
...
>>> g = Foo()
>>> f_ = dill.loads(_f)
>>> f_.bar()
5
>>> g.bar(4)
4

Happened to me, solved it by adding the new location of the module to sys.path before loading pickle:

import pickle
import sys

# Make the directory that now contains the module importable again.
sys.path.append('path/to/whiteboard')

with open("pickled_file", "rb") as f:
    obj = pickle.load(f)

This can be done with a custom "unpickler" that uses find_class():

import io
import pickle


class RenameUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Redirect lookups for the old module name to its new location.
        renamed_module = module
        if module == "tools":
            renamed_module = "whyteboard.tools"

        return super(RenameUnpickler, self).find_class(renamed_module, name)


def renamed_load(file_obj):
    return RenameUnpickler(file_obj).load()


def renamed_loads(pickled_bytes):
    file_obj = io.BytesIO(pickled_bytes)
    return renamed_load(file_obj)

Then you'd need to use renamed_load() instead of pickle.load() and renamed_loads() instead of pickle.loads().
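If more than one module moved, the same idea extends to a lookup table. This is a hedged variation rather than part of the answer above: MODULE_RENAMES, MappingUnpickler and mapping_loads are names made up for this sketch, and the in-memory module demo_new_tools stands in for whyteboard.tools so the round trip can run on its own.

```python
import io
import pickle
import sys
import types

# Hypothetical table of old module names and their new locations.
MODULE_RENAMES = {"demo_old_tools": "demo_new_tools"}


class MappingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Fall back to the recorded module name when it was not renamed.
        return super().find_class(MODULE_RENAMES.get(module, module), name)


def mapping_loads(pickled_bytes):
    return MappingUnpickler(io.BytesIO(pickled_bytes)).load()


# Demo setup: an in-memory module standing in for the relocated one.
new_module = types.ModuleType("demo_new_tools")


class Pen:
    def __init__(self, colour):
        self.colour = colour


Pen.__module__ = "demo_new_tools"
new_module.Pen = Pen
sys.modules["demo_new_tools"] = new_module

# Fabricate a pickle that still refers to the old module name.
Pen.__module__ = "demo_old_tools"
sys.modules["demo_old_tools"] = new_module
old_payload = pickle.dumps(Pen("blue"))
del sys.modules["demo_old_tools"]
Pen.__module__ = "demo_new_tools"

# The plain loader fails, the renaming loader succeeds.
try:
    pickle.loads(old_payload)
    plain_load_failed = False
except ImportError:
    plain_load_failed = True

pen = mapping_loads(old_payload)
print(plain_load_failed, type(pen) is Pen, pen.colour)
```

Unlike the sys.modules approach, this leaves the files on disk untouched, so the renaming loader has to be used every time an old pickle is read.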

When you load a pickle file that contains a class reference, the module structure must be the same as it was when the pickle was saved. If you want to use the pickle somewhere else, you have to tell Python where the class or other object lives; doing the following can save the day:

import sys
sys.path.append('path/to/folder containing the python module')

For people like me needing to update lots of pickle dumps, here's a function implementing @Alex Martelli's excellent advice:

import sys
from types import ModuleType
import pickle

# import torch


def update_module_path_in_pickled_object(
    pickle_path: str, old_module_path: str, new_module: ModuleType
) -> None:
    """Update a Python module's dotted path in a pickle dump if the
    corresponding file was renamed.

    Implements the advice in https://stackoverflow.com/a/2121918.

    Args:
        pickle_path (str): Path to the pickled object.
        old_module_path (str): The old.dotted.path.to.renamed.module.
        new_module (ModuleType): from new.location import module.
    """
    # Temporarily alias the old module name to the relocated module.
    sys.modules[old_module_path] = new_module

    with open(pickle_path, "rb") as f:
        dic = pickle.load(f)
    # dic = torch.load(pickle_path, map_location="cpu")

    # Drop the alias so the old name does not linger in sys.modules.
    del sys.modules[old_module_path]

    with open(pickle_path, "wb") as f:
        pickle.dump(dic, f)
    # torch.save(dic, pickle_path)

In my case, the dumps were PyTorch model checkpoints. Hence the commented-out torch.load/save().

Example

from new.location import new_module


for pickle_path in ('foo.pkl', 'bar.pkl'):
    update_module_path_in_pickled_object(
        pickle_path, "old.module.dotted.path", new_module
    )