Python pickle after changing a module's directory

小开

This is the normal behavior of pickle: unpickled objects need their defining module to be importable.

You should be able to change the module's path (i.e. from tools to whyteboard.tools) by editing the pickled files, provided they were written with the old ASCII protocol 0, which produces simple text files (newer protocols are binary).
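As a minimal sketch of that kind of edit, the snippet below patches the module name inside a hand-written protocol 0 pickle. The Pen class and the in-memory whyteboard package are hypothetical stand-ins so the example runs on its own. The plain byte replacement is an assumption that only holds for the text-based GLOBAL opcode; protocol 4 and later length-prefix their strings, so the same replacement would corrupt them.

```python
import pickle
import sys
import types

# Hypothetical stand-ins for the moved package, built in memory.
package = types.ModuleType("whyteboard")
new_tools = types.ModuleType("whyteboard.tools")
package.tools = new_tools
sys.modules["whyteboard"] = package
sys.modules["whyteboard.tools"] = new_tools


class Pen(object):
    __module__ = "whyteboard.tools"  # pretend Pen is defined in whyteboard/tools.py


new_tools.Pen = Pen

# Hand-written protocol 0 pickle meaning "call tools.Pen with no arguments".
old_pickle = b"ctools\nPen\n(tR."

# The GLOBAL opcode is "c<module>\n<name>\n", so the module name is plain text.
patched_pickle = old_pickle.replace(b"ctools\n", b"cwhyteboard.tools\n")

pen = pickle.loads(patched_pickle)
print(type(pen).__module__, type(pen).__name__)
```

A blind replace like this could also hit unrelated data that happens to contain the same bytes, so it is probably only worth trying on small pickles you can inspect by eye.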

小开

Best answer

As pickle's docs say, in order to save and restore a class instance (actually a function, too), you must respect certain constraints:

pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored

whyteboard.tools is not "the same module as" tools. Even though other modules in the same package can import it with import tools, it ends up in sys.modules as sys.modules['whyteboard.tools']. This is absolutely crucial: otherwise the same module, imported by one module in the same package and by one in another package, would end up with multiple and possibly conflicting entries!
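To see what gets stored, here is a small sketch that lists the class references recorded in a pickle. The tools module and Pen class are in-memory stand-ins made up for illustration, not whyteboard's real code.

```python
import pickle
import pickletools
import sys
import types

# Hypothetical stand-in for the old top-level module "tools".
tools = types.ModuleType("tools")
sys.modules["tools"] = tools


class Pen(object):
    __module__ = "tools"  # pretend the class was defined in tools.py

    def __init__(self, colour):
        self.colour = colour


tools.Pen = Pen

data = pickle.dumps(Pen("black"), protocol=2)

# Protocol 2 stores the class as a GLOBAL opcode whose argument is
# "<module> <name>"; the class body itself is not in the pickle.
recorded = [arg for op, arg, pos in pickletools.genops(data) if op.name == "GLOBAL"]
print(recorded)
```

Only the dotted module name and the class name are written out, which is why the loader has to find the class under exactly that name.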

If your pickle files are in a good/advanced format (as opposed to the old ascii format that's the default only for compatibility reasons), migrating them once you perform such changes may in fact not be quite as trivial as "editing the file" (which is binary &c...!), despite what another answer suggests. I suggest that, instead, you make a little "pickle-migrating script": let it patch sys.modules like this...:

import sys
from whyteboard import tools


sys.modules['tools'] = tools

and then cPickle.load each file, del sys.modules['tools'], and cPickle.dump each loaded object back to file: that temporary extra entry in sys.modules should let the pickles load successfully, then dumping them again should be using the right module-name for the instances' classes (removing that extra entry should make sure of that).
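The load, un-alias, re-dump cycle can be sketched in memory as follows. This assumes Python 3, where cPickle has been folded into pickle. The whyteboard package, the Pen class, and the hand-written old pickle are hypothetical stand-ins so the snippet runs without any files.

```python
import pickle
import sys
import types

# Hypothetical stand-ins for the package after the move, built in memory.
package = types.ModuleType("whyteboard")
new_tools = types.ModuleType("whyteboard.tools")
package.tools = new_tools
sys.modules["whyteboard"] = package
sys.modules["whyteboard.tools"] = new_tools


class Pen(object):
    __module__ = "whyteboard.tools"  # pretend Pen is defined in whyteboard/tools.py


new_tools.Pen = Pen

# Hand-written protocol 0 pickle that still names the old module "tools".
old_pickle = b"ctools\nPen\n(tR."

# Temporarily alias the old name to the new module, load, then drop the alias.
sys.modules["tools"] = new_tools
try:
    pen = pickle.loads(old_pickle)
finally:
    del sys.modules["tools"]

# Re-dumping records the class's current module, not the alias.
new_pickle = pickle.dumps(pen, protocol=2)
print(b"cwhyteboard.tools\n" in new_pickle)
print(b"ctools\n" in new_pickle)
```

The try/finally is a design choice rather than part of the original advice: it makes sure the temporary alias is removed even if a pickle fails to load.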

小开

pickle serializes classes by reference, so if you change where the class lives, the object will not unpickle because the class cannot be found. If you use dill instead of pickle, you can serialize classes either by reference or directly (by serializing the class itself instead of its import path). You can simulate this pretty easily by changing the class definition after a dump and before a load.

Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>>
>>> class Foo(object):
...   def bar(self):
...     return 5
...
>>> f = Foo()
>>>
>>> _f = dill.dumps(f)
>>>
>>> class Foo(object):
...   def bar(self, x):
...     return x
...
>>> g = Foo()
>>> f_ = dill.loads(_f)
>>> f_.bar()
5
>>> g.bar(4)
4

小开

Happened to me; I solved it by adding the new location of the module to sys.path before loading the pickle:

import pickle
import sys

# Make the module's new directory importable before unpickling.
sys.path.append('path/to/whyteboard')
with open("pickled_file", "rb") as f:
    obj = pickle.load(f)

小开

This can be done with a custom "unpickler" that uses find_class():

import io
import pickle


class RenameUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        renamed_module = module
        if module == "tools":
            renamed_module = "whyteboard.tools"

        return super(RenameUnpickler, self).find_class(renamed_module, name)


def renamed_load(file_obj):
    return RenameUnpickler(file_obj).load()


def renamed_loads(pickled_bytes):
    file_obj = io.BytesIO(pickled_bytes)
    return renamed_load(file_obj)

Then you'd need to use renamed_load() instead of pickle.load() and renamed_loads() instead of pickle.loads().
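If several modules moved at once, the same idea can be driven by a lookup table. The following is only a sketch: MODULE_RENAMES, MappingUnpickler and mapped_loads are names made up here, and the whyteboard package, Pen class and old pickle are in-memory stand-ins so it runs on its own.

```python
import io
import pickle
import sys
import types

# Hypothetical old-name -> new-name table; add one entry per moved module.
MODULE_RENAMES = {"tools": "whyteboard.tools"}


class MappingUnpickler(pickle.Unpickler):
    """Unpickler that redirects old module names through MODULE_RENAMES."""

    def find_class(self, module, name):
        return super().find_class(MODULE_RENAMES.get(module, module), name)


def mapped_loads(pickled_bytes):
    return MappingUnpickler(io.BytesIO(pickled_bytes)).load()


# In-memory stand-ins for the moved package, for demonstration only.
package = types.ModuleType("whyteboard")
new_tools = types.ModuleType("whyteboard.tools")
package.tools = new_tools
sys.modules["whyteboard"] = package
sys.modules["whyteboard.tools"] = new_tools


class Pen(object):
    __module__ = "whyteboard.tools"  # pretend Pen is defined in whyteboard/tools.py


new_tools.Pen = Pen

# Hand-written protocol 0 pickle that still names the old module "tools".
old_pickle = b"ctools\nPen\n(tR."

pen = mapped_loads(old_pickle)
print(type(pen).__module__, type(pen).__name__)
```

Unlike patching sys.modules, this leaves the import system untouched, so nothing else in the process can accidentally import the module under its old name.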

小开

When you load a pickle file that contains a class reference, the module layout must match the one that was in place when the pickle was saved. If you want to use the pickle somewhere else, you have to tell Python where the class or other object lives; doing the following can save the day:

import sys
sys.path.append('path/to/folder containing the python module')

小开

For people like me needing to update lots of pickle dumps, here's a function implementing @Alex Martelli's excellent advice:

import sys
from types import ModuleType
import pickle

# import torch


def update_module_path_in_pickled_object(
    pickle_path: str, old_module_path: str, new_module: ModuleType
) -> None:
    """Update a python module's dotted path in a pickle dump if the
    corresponding file was renamed.

    Implements the advice in https://stackoverflow.com/a/2121918.

    Args:
        pickle_path (str): Path to the pickled object.
        old_module_path (str): The old.dotted.path.to.renamed.module.
        new_module (ModuleType): from new.location import module.
    """
    # Temporarily expose the new module under its old dotted path.
    sys.modules[old_module_path] = new_module

    with open(pickle_path, "rb") as f:
        dic = pickle.load(f)
    # dic = torch.load(pickle_path, map_location="cpu")

    # Drop the alias so the re-dump records the new module path.
    del sys.modules[old_module_path]

    with open(pickle_path, "wb") as f:
        pickle.dump(dic, f)
    # torch.save(dic, pickle_path)

In my case, the dumps were PyTorch model checkpoints. Hence the commented-out torch.load/save().

Example

from new.location import new_module


for pickle_path in ('foo.pkl', 'bar.pkl'):
    update_module_path_in_pickled_object(
        pickle_path, "old.module.dotted.path", new_module
    )