ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings

Using SQLite3 in Python, I am trying to store a compressed version of a snippet of UTF-8 HTML code.

The code looks like this:

...
c = connection.cursor()
c.execute('create table blah (cid integer primary key,html blob)')
...
c.execute('insert or ignore into blah values (?, ?)',(cid, zlib.compress(html)))

At which point I get the error:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

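If I use 'text' rather than 'blob' and don't compress the HTML snippet, it all works fine (but the db is too large). When I use 'blob' and compress via the Python zlib library, I get the error message above. I looked around but couldn't find a simple answer to this one.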


Found the solution, I should have spent just a little more time searching.

The solution is to 'cast' the value as a Python 'buffer' (a Python 2 built-in), like so:

c.execute('insert or ignore into blah values (?, ?)',(cid, buffer(zlib.compress(html))))

Hopefully this will help somebody else.
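
For readers on Python 3, where `buffer` no longer exists, here is a minimal sketch of the same round trip. It assumes Python 3, an in-memory database, and a made-up `html` string; there `zlib.compress` takes and returns `bytes`, which the `sqlite3` module binds as a BLOB without any wrapper.

```python
import sqlite3
import zlib

# Hypothetical sample data; the original snippet does not show where html comes from.
html = "<p>h\u00e9llo w\u00f6rld</p>" * 50

connection = sqlite3.connect(":memory:")
c = connection.cursor()
c.execute("create table blah (cid integer primary key, html blob)")

# zlib works on bytes, so encode the text first; the result is also bytes.
compressed = zlib.compress(html.encode("utf-8"))
c.execute("insert or ignore into blah values (?, ?)", (1, compressed))
connection.commit()

# Reading the BLOB column gives bytes back; decompress and decode to recover the text.
(stored,) = c.execute("select html from blah where cid = ?", (1,)).fetchone()
restored = zlib.decompress(stored).decode("utf-8")
```

The explicit `encode`/`decode` steps are the part that differs most from the Python 2 code above, where `html` was already a byte string.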

If you want to use 8-bit strings instead of Unicode strings in sqlite3, set an appropriate text_factory on the sqlite connection:

connection = sqlite3.connect(...)
connection.text_factory = str
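
To illustrate what `text_factory` controls, here is a small sketch assuming Python 3, where the closest counterpart of the Python 2 setting `text_factory = str` is `text_factory = bytes`. The table name `notes` and the sample string are made up for the example. As far as the Python 3 documentation describes it, the setting only changes how TEXT values come back from queries.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("create table notes (body text)")  # hypothetical table
connection.execute("insert into notes values (?)", ("caf\u00e9",))

# Default text_factory is str: TEXT comes back decoded from UTF-8.
as_text = connection.execute("select body from notes").fetchone()[0]

# With text_factory = bytes, the same TEXT value comes back as raw UTF-8 bytes.
connection.text_factory = bytes
as_bytes = connection.execute("select body from notes").fetchone()[0]
```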

You could store the repr() of the compressed value instead of the raw output, and then call eval() on the stored string when retrieving the value for use.

c.execute('insert or ignore into blah values (?, ?)',(1, repr(zlib.compress(html))))
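
A sketch of the full round trip for this approach, assuming Python 3 and a made-up `html` string. It swaps in `ast.literal_eval` for `eval`, since `literal_eval` parses only Python literals rather than running arbitrary code; that substitution is not part of the suggestion above.

```python
import ast
import sqlite3
import zlib

html = "<p>hello</p>" * 50  # hypothetical sample data
raw = zlib.compress(html.encode("utf-8"))

connection = sqlite3.connect(":memory:")
c = connection.cursor()
# The repr is plain text, so a TEXT column is enough here.
c.execute("create table blah (cid integer primary key, html text)")

# repr() turns the bytes into a printable literal of the form b'...'.
as_literal = repr(raw)
c.execute("insert or ignore into blah values (?, ?)", (1, as_literal))

# Parse the literal back into bytes, then decompress and decode.
(stored,) = c.execute("select html from blah where cid = 1").fetchone()
restored = zlib.decompress(ast.literal_eval(stored)).decode("utf-8")
```

Note that the repr text is always longer than the raw bytes it encodes, which works against the goal of keeping the database small.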

In order to work with the BLOB type, you must first convert your zlib compressed string into binary data - otherwise sqlite will try to process it as a text string. This is done with sqlite3.Binary(). For example:

c.execute('insert or ignore into blah values (?, ?)',(cid,
sqlite3.Binary(zlib.compress(html))))
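
As a quick check that the value really lands as a BLOB, the sketch below uses SQLite's `typeof()` function. It assumes Python 3 (where `sqlite3.Binary` is an alias for `memoryview`), an in-memory database, and a made-up `html` string.

```python
import sqlite3
import zlib

html = "<p>hello</p>" * 50  # hypothetical sample data

connection = sqlite3.connect(":memory:")
c = connection.cursor()
c.execute("create table blah (cid integer primary key, html blob)")
c.execute(
    "insert or ignore into blah values (?, ?)",
    (1, sqlite3.Binary(zlib.compress(html.encode("utf-8")))),
)

# typeof() reports the storage class SQLite actually used for the value.
storage_class = c.execute("select typeof(html) from blah where cid = 1").fetchone()[0]
```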

Syntax:

SQLite has five storage classes: NULL, INTEGER, REAL, TEXT and BLOB.

BLOB is a natural fit for binary data such as pickled models (pickle or dill).

import sqlite3
import pandas as pd

# Write data: both buffer() (Python 2 built-in) and sqlite3.Binary() mark a value as binary
cur.execute('''INSERT INTO Tablename(Col1, Col2, Col3, Col4) VALUES(?,?,?,?)''',
            [TextValue, Real_Value, buffer(model), sqlite3.Binary(model2)])
conn.commit()

# Read data:
df = pd.read_sql('SELECT * FROM Tablename', con=conn)
model1 = str(df['Col3'].values[0])
model2 = str(df['Col4'].values[0])
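
A self-contained sketch of the pickled-model case, assuming Python 3 and the standard `pickle` module. A plain dict stands in for a trained model, and the table layout is a cut-down, hypothetical version of the one above.

```python
import pickle
import sqlite3

# A plain dict stands in for a trained model (hypothetical).
model = {"weights": [0.1, 0.2, 0.3], "bias": -1.0}

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Tablename (Col1 TEXT, Col2 REAL, Col3 BLOB)")
cur.execute(
    "INSERT INTO Tablename (Col1, Col2, Col3) VALUES (?, ?, ?)",
    ("demo", 0.95, sqlite3.Binary(pickle.dumps(model))),
)
conn.commit()

# The BLOB comes back as bytes; pickle.loads rebuilds the object.
# Only unpickle data that comes from a source you trust.
blob = cur.execute("SELECT Col3 FROM Tablename").fetchone()[0]
loaded = pickle.loads(blob)
```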