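Usage of unicode() and encode() functions in Python

I have a problem with the encoding of the path variable when inserting it into an SQLite database. I tried to solve it with the encode("utf-8") function, but that didn't help. Then I used the unicode() function, which gives me type unicode.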

print type(path)                  # <type 'unicode'>
path = path.replace("one", "two") # <type 'str'>
path = path.encode("utf-8")       # <type 'str'> strange
path = unicode(path)              # <type 'unicode'>

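Finally I got the unicode type, but I still have the same error that appeared when the path variable was of type str:

ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

Could you help me solve this error and explain the correct usage of the encode("utf-8") and unicode() functions? I often struggle with them.

This execute() statement raised the error: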

cur.execute("update docs set path = :fullFilePath where path = :path", locals())

I forgot to change the encoding of the fullFilePath variable, which has the same problem, but I'm quite confused now. Should I use only unicode(), only encode("utf-8"), or both?

I can't use

fullFilePath = unicode(fullFilePath.encode("utf-8"))

because it raises this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 32: ordinal not in range(128)

The Python version is 2.7.2.


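str is a text representation in bytes; unicode is a text representation in characters.

You decode text from bytes to unicode, and you encode unicode into bytes using some encoding.

That is: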

>>> 'abc'.decode('utf-8')  # str to unicode
u'abc'
>>> u'abc'.encode('utf-8') # unicode to str
'abc'

UPD Sep 2020: This answer was written when Python 2 was the most widely used version. In Python 3, str was renamed to bytes, and unicode was renamed to str.

>>> b'abc'.decode('utf-8') # bytes to str
'abc'
>>> 'abc'.encode('utf-8')  # str to bytes
b'abc'
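
With plain ASCII like 'abc' the difference between characters and bytes is easy to miss. Here is a minimal Python 3 sketch using a made-up path (not the asker's real one) that contains non-ASCII characters, written with escapes so the example does not depend on the file's own encoding:

```python
# Hypothetical path "docs/łódź.txt", spelled with escapes for ł, ó and ź.
text = "docs/\u0142\u00f3d\u017a.txt"   # str: a sequence of characters
data = text.encode("utf-8")             # str -> bytes

char_count = len(text)   # length counted in characters
byte_count = len(data)   # length counted in bytes (ł, ó, ź take two bytes each)

# bytes -> str with the same encoding restores the original text
round_trip = data.decode("utf-8")
```

The byte string is longer than the text because each of the three non-ASCII characters is stored as two bytes in UTF-8.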

Best answer

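You are using encode("utf-8") incorrectly. Python byte strings (the str type) have an encoding; Unicode strings do not. You can convert a Unicode string to a Python byte string with uni.encode(encoding), and you can convert a byte string to a Unicode string with s.decode(encoding) (or, equivalently, unicode(s, encoding)).

If fullFilePath and path are currently of type str, you should figure out how they are encoded. For example, if the current encoding is utf-8, you would use: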
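
The UnicodeDecodeError in the question most likely comes from an implicit ASCII decode: in Python 2, unicode(s) with no encoding argument falls back to the ASCII codec. Python 3 has no implicit conversion, so the sketch below reproduces the same failure explicitly. The sample character is an assumption; any character whose UTF-8 form starts with byte 0xc5 would behave the same way.

```python
# "ł" (U+0142) encodes to the two bytes 0xc5 0x82 in UTF-8.
raw = "\u0142".encode("utf-8")

try:
    raw.decode("ascii")          # what an implicit ASCII decode attempts
    failed = False
    bad_byte = None
except UnicodeDecodeError as exc:
    failed = True
    bad_byte = raw[exc.start]    # the first byte ASCII cannot represent

# Naming the real encoding succeeds.
decoded = raw.decode("utf-8")
```

So the fix is not to stack encode() and unicode() calls, but to decode once with the encoding the bytes actually use.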

path = path.decode('utf-8')
fullFilePath = fullFilePath.decode('utf-8')
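
If you are not sure whether a value has already been decoded, a small helper can make the conversion safe to call more than once. This is only a sketch: to_text is a hypothetical name, and UTF-8 is an assumed default that you should replace with the real encoding of your data.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text, leave it alone


path = to_text(b"docs/\xc5\x82.txt")        # bytes are decoded
same = to_text("docs/\u0142.txt")           # text passes through unchanged
```

Decoding at the point where data enters the program, and keeping text everywhere else, avoids having to guess the type later on.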

If this doesn't fix it, the actual problem may be that you are not using a Unicode string in your execute() call. Try changing it to the following:

cur.execute(u"update docs set path = :fullFilePath where path = :path", locals())
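
For reference, here is a self-contained Python 3 sketch of the same update against an in-memory database. The one-column schema and the sample paths are assumptions made for the example, and an explicit dict is passed instead of locals() so it is clear which values reach the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table docs (path text)")   # assumed schema

# Both values are text (str), not encoded bytes.
path = "old/\u0142.txt"
fullFilePath = "new/\u0142.txt"

cur.execute("insert into docs (path) values (:path)", {"path": path})
cur.execute(
    "update docs set path = :fullFilePath where path = :path",
    {"fullFilePath": fullFilePath, "path": path},
)
conn.commit()

rows = cur.execute("select path from docs").fetchall()
conn.close()
```

Once every parameter is text, sqlite3 stores and returns the non-ASCII path without any manual encoding step.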

Make sure you have set your locale before running the script from the shell, e.g.

$ locale -a | grep "^en_.\+UTF-8"
en_GB.UTF-8
en_US.UTF-8
$ export LC_ALL=en_GB.UTF-8
$ export LANG=en_GB.UTF-8

Docs: man locale, man setlocale.
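
To see what Python actually picked up from the environment, you can print the encodings it reports. This is a rough check only; the values depend on your platform and locale settings, so none are shown here.

```python
import locale
import sys

# Encoding Python assumes for text data based on the current locale settings.
preferred = locale.getpreferredencoding(False)

# Encoding used to convert file names between text and bytes.
fs_encoding = sys.getfilesystemencoding()

print("preferred encoding: ", preferred)
print("filesystem encoding:", fs_encoding)
```

If either value is not a UTF-8 variant after exporting the variables above, the locale is probably not installed or not being inherited by the script.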