Python serialization: why pickle?

I understand that Python pickling is a way to "store" a Python object in a manner that respects object-oriented programming, unlike output written to a txt file or a database.

Do you have details or references on the following points:

  • Where are pickled objects "stored"?
  • Why does pickling preserve the object representation better than storing it in a DB?
  • Can I retrieve a pickled object from one Python shell session into another?
  • Do you have significant examples of when serialization is useful?
  • Does serializing with pickle imply data "compression"?

In other words, I am looking for documentation about pickling: the Python docs explain how to implement pickle, but they don't seem to go into detail about its uses and the need for serialization.


Pickling is a way to convert a Python object (list, dict, etc.) into a byte stream. The idea is that this byte stream contains all the information necessary to reconstruct the object in another Python script.
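For example, here is a quick round trip entirely in memory (a minimal sketch; pickle.dumps and pickle.loads are the in-memory counterparts of dump and load):

import pickle

var = {1: 'a', 2: 'b'}

# Serialize the dict to a byte stream in memory
data = pickle.dumps(var)
print(type(data))  # <class 'bytes'>

# Reconstruct an equivalent object from the bytes
restored = pickle.loads(data)
print(restored == var)  # True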

As for where the pickled information is stored, usually one would do:

import pickle

var = {1: 'a', 2: 'b'}
with open('filename', 'wb') as f:
    pickle.dump(var, f)

That would store the pickled version of our var dict in the 'filename' file. Then, in another script, you could load from this file into a variable and the dictionary would be recreated:

import pickle

with open('filename', 'rb') as f:
    var = pickle.load(f)

Another use for pickling is if you need to transmit this dictionary over a network (perhaps with sockets or something). You first need to convert it into a byte stream, then you can send it over a socket connection.
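A minimal sketch of what that could look like (the host and port are placeholders; also note that you should never unpickle data received from an untrusted source):

Sender:

import pickle
import socket

var = {1: 'a', 2: 'b'}
with socket.create_connection(('localhost', 50007)) as s:  # placeholder address
    s.sendall(pickle.dumps(var))

Receiver:

import pickle
import socket

with socket.create_server(('localhost', 50007)) as srv:
    conn, _ = srv.accept()
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # sender closed the connection
            break
        chunks.append(chunk)
    var = pickle.loads(b''.join(chunks))  # the dict is reconstructed here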

Also, there is no "compression" to speak of here... it's just a way to convert from one representation (an object in RAM) to another (a stream of bytes).

About.com has a nice introduction to pickling here.

Pickling is absolutely necessary for distributed and parallel computing.

Say you wanted to do a parallel map-reduce with multiprocessing (or across cluster nodes with pyina), then you need to make sure the function you want to have mapped across the parallel resources will pickle. If it doesn't pickle, you can't send it to the other resources on another process, computer, etc. Also see here for a good example.

To do this, I use dill, which can serialize almost anything in Python. dill also has some good tools for helping you understand what is causing your pickling to fail when your code fails.
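For instance, here is a small illustration of the difference (assuming dill is installed, e.g. via pip install dill):

import pickle
import dill

square = lambda x: x ** 2

# The standard library pickle cannot serialize a lambda defined in __main__...
try:
    pickle.dumps(square)
except Exception as e:
    print('pickle failed:', e)

# ...but dill can, so the function itself can be shipped to another process
data = dill.dumps(square)
restored = dill.loads(data)
print(restored(4))  # 16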

And, yes, people use pickling to save the state of a calculation, or your IPython session, or whatever. You can also extend pickle's Pickler and Unpickler to do compression with bz2 or gzip if you'd like.
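A simpler route than subclassing Pickler/Unpickler is to write the pickle stream through a compressed file object; here is a minimal sketch with gzip (the filename is arbitrary):

import gzip
import pickle

var = {i: str(i) for i in range(1000)}

# gzip.open returns a file-like object, so pickle can write straight into it
with gzip.open('var.pkl.gz', 'wb') as f:
    pickle.dump(var, f)

with gzip.open('var.pkl.gz', 'rb') as f:
    restored = pickle.load(f)

print(restored == var)  # True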

It is a kind of serialization. On Python 2, use cPickle; it is much faster than pickle. (On Python 3, the pickle module uses the faster C implementation automatically.)

import pickle

# Make a pickle file (assumes the pickles/ directory exists
# and that corpus was defined earlier)
with open('pickles/corpus.pickle', 'wb') as handle:
    pickle.dump(corpus, handle)

# Read the pickle file back
with open('pickles/corpus.pickle', 'rb') as handle:
    corpus = pickle.load(handle)

I find it to be particularly useful with large and complex custom classes. In a particular example I'm thinking of, "Gathering" the information (from a database) to create the class was already half the battle. Then that information stored in the class might be altered at runtime by the user.

You could have another group of tables in the database and write another function to go through everything stored in the class and write it to the new database tables. Then you would need yet another function to read all of that information back in and rebuild the class.

Alternatively, you could pickle the whole class as is and then store that to a single field in the database. Then when you go to load it back, it will all load back in at once as it was before. This can end up saving a lot of time and code when saving and retrieving complicated classes.
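As a rough sketch of that approach (the GameState class and the table layout here are made up for illustration), using sqlite3 from the standard library:

import pickle
import sqlite3

class GameState:  # hypothetical complex class built from DB data
    def __init__(self, level, score):
        self.level = level
        self.score = score

state = GameState(level=3, score=1200)

conn = sqlite3.connect('app.db')  # placeholder database file
conn.execute('CREATE TABLE IF NOT EXISTS saves (id INTEGER PRIMARY KEY, data BLOB)')

# Store the whole object as a single pickled BLOB field
conn.execute('INSERT INTO saves (data) VALUES (?)', (pickle.dumps(state),))
conn.commit()

# Later: load it back in one step, exactly as it was
row = conn.execute('SELECT data FROM saves ORDER BY id DESC LIMIT 1').fetchone()
restored = pickle.loads(row[0])
print(restored.level, restored.score)  # 3 1200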