Python pickle error: UnicodeDecodeError

I'm trying to do some text classification using Textblob. I'm first training the model and serializing it using pickle as shown below.

import pickle
from textblob.classifiers import NaiveBayesClassifier


with open('sample.csv', 'r') as fp:
    cl = NaiveBayesClassifier(fp, format="csv")


f = open('sample_classifier.pickle', 'wb')
pickle.dump(cl, f)
f.close()

And when I try to run this file:

import pickle
f = open('sample_classifier.pickle', encoding="utf8")
cl = pickle.load(f)
f.close()

I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Following are the contents of my sample.csv:

My SQL is not working correctly at all. This was a wrong choice, SQL
I've issues. Please respond immediately, Support

Where am I going wrong here? Please help.

小开

Best answer

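By choosing to open the file in mode wb, you are choosing to write raw binary data, with no character encoding applied. Opening it again with encoding="utf8" puts it in text mode, which tries to decode those bytes as UTF-8 and fails on the very first one.

So, to read this file, simply use open in mode rb.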
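A quick way to see the mismatch, using a small dictionary as a stand-in for the classifier (an assumption for illustration; any picklable object should behave the same): a binary pickle starts with the byte 0x80, which is exactly the byte the error message complains about.

```python
import pickle

# Stand-in for the trained classifier; any picklable object behaves the same way.
obj = {"label": "SQL", "text": "My SQL is not working correctly at all."}

data = pickle.dumps(obj)  # Python 3 defaults to a binary protocol (3 or higher)

# Protocols 2 and above start with the PROTO opcode, the single byte 0x80.
print(hex(data[0]))  # 0x80

# Reading the file in text mode with encoding="utf8" amounts to decoding these
# bytes as UTF-8, and 0x80 can never be the first byte of a UTF-8 sequence.
error_message = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
# 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
```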

小开

I think you should open the file as

import pickle

with open('sample_classifier.pickle', 'rb') as f:
    cl = pickle.load(f)

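You don't need to decode anything. Whatever you saved, pickle.load will give you back an exact copy of it. At that point you should be able to use cl just as if you had only just created it.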
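A minimal round-trip sketch, with a plain dictionary standing in for cl and a temporary directory standing in for the real path (both assumptions for illustration). The point is that neither the write nor the read passes an encoding argument:

```python
import os
import pickle
import tempfile

# Stand-in for the trained classifier (assumption: any picklable object
# round-trips the same way).
original = {"labels": ["SQL", "Support"], "trained": True}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_classifier.pickle")

    # Write raw bytes: mode "wb", no encoding argument.
    with open(path, "wb") as f:
        pickle.dump(original, f)

    # Read raw bytes back: mode "rb", again no encoding argument.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == original)  # True: same contents
print(restored is original)  # False: a newly built object
```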

小开

Maybe the file is encoded as Latin-1:

f = open('sample_classifier.pickle', encoding="latin1")
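One thing worth checking before relying on this: Latin-1 maps every byte to a character, so the UnicodeDecodeError does go away, but pickle.load still expects a binary file. In CPython 3, the sketch below (a dict stands in for the classifier, an assumption for illustration) seems to end in a TypeError instead:

```python
import os
import pickle
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_classifier.pickle")
    with open(path, "wb") as f:
        pickle.dump({"label": "SQL"}, f)  # stand-in for the classifier

    # Latin-1 decodes every byte 0x00-0xFF, so no UnicodeDecodeError here...
    with open(path, encoding="latin1") as f:
        try:
            pickle.load(f)
            outcome = "loaded"
        except TypeError as exc:
            # ...but pickle.load gets str from f.read() instead of bytes.
            outcome = f"TypeError: {exc}"

print(outcome)
```

If that is what you see, switching the mode to 'rb' is still the actual fix.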

小开

Since none of the suggested answers helped me fix this error, I switched to joblib instead:

import joblib
clf_loaded = joblib.load('classifier_file_name.joblib')

Works great!

小开

Try this code, it works:

import pickle

with open('your pickle file name', 'rb') as f:
    classifier = pickle.load(f, encoding="latin1")

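Note: if that doesn't fix it, you can try a different encoding, such as "utf-8". The encoding argument matters when the pickle was written by Python 2; under Python 3.x, pickle.load defaults to encoding="ASCII".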
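For context, the encoding argument of pickle.load only affects byte strings that were pickled under Python 2; a pickle written by Python 3, as in the question, most likely needs nothing more than 'rb'. The sketch below hand-builds a tiny protocol-2 pickle of a non-ASCII Python 2 str (the byte string is an assumption constructed for illustration, not read from a real file) to show what the argument changes:

```python
import pickle

# Hand-built protocol-2 pickle mimicking a Python 2 str containing byte 0xe9:
#   b"\x80\x02"       PROTO 2
#   b"U\x02\xe9t"     SHORT_BINSTRING of length 2 (bytes 0xe9, 't')
#   b"."              STOP
py2_pickle = b"\x80\x02U\x02\xe9t."

# The default encoding="ASCII" cannot decode byte 0xe9.
default_error = None
try:
    pickle.loads(py2_pickle)
except UnicodeDecodeError as exc:
    default_error = str(exc)

as_latin1 = pickle.loads(py2_pickle, encoding="latin1")  # decoded to str
as_bytes = pickle.loads(py2_pickle, encoding="bytes")    # left as raw bytes

print(default_error)
print(repr(as_latin1), repr(as_bytes))
```

encoding="latin1" is a common choice here because it can decode any byte, which also makes it the usual option for Python 2 pickles containing NumPy arrays.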