UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 13: ordinal not in range(128)

I am using NLTK to perform k-means clustering on a text file in which each line is considered a document. For example, my text file is something like this:

belong finger death punch
hasty
mike hasty walls jericho
jägermeister rules
rules bands follow performing jägermeister stage
approach

Now the demo code I am trying to run is this:

import sys

import numpy
from nltk.cluster import KMeansClusterer, GAAClusterer, euclidean_distance
import nltk.corpus
from nltk import decorators
import nltk.stem

stemmer_func = nltk.stem.EnglishStemmer().stem
stopwords = set(nltk.corpus.stopwords.words('english'))

@decorators.memoize
def normalize_word(word):
    return stemmer_func(word.lower())

def get_words(titles):
    words = set()
    for title in job_titles:
        for word in title.split():
            words.add(normalize_word(word))
    return list(words)

@decorators.memoize
def vectorspaced(title):
    title_components = [normalize_word(word) for word in title.split()]
    return numpy.array([
        word in title_components and not word in stopwords
        for word in words], numpy.short)

if __name__ == '__main__':

    filename = 'example.txt'
    if len(sys.argv) == 2:
        filename = sys.argv[1]

    with open(filename) as title_file:

        job_titles = [line.strip() for line in title_file.readlines()]

        words = get_words(job_titles)

        # cluster = KMeansClusterer(5, euclidean_distance)
        cluster = GAAClusterer(5)
        cluster.cluster([vectorspaced(title) for title in job_titles if title])

        # NOTE: This is inefficient, cluster.classify should really just be
        # called when you are classifying previously unseen examples!
        classified_examples = [
            cluster.classify(vectorspaced(title)) for title in job_titles
        ]

        for cluster_id, title in sorted(zip(classified_examples, job_titles)):
            print cluster_id, title

(It can also be found here.)

The error I am getting is:

Traceback (most recent call last):
  File "cluster_example.py", line 40, in <module>
    words = get_words(job_titles)
  File "cluster_example.py", line 20, in get_words
    words.add(normalize_word(word))
  File "<string>", line 1, in normalize_word
  File "/usr/local/lib/python2.7/dist-packages/nltk/decorators.py", line 183, in memoize
    result = func(*args)
  File "cluster_example.py", line 14, in normalize_word
    return stemmer_func(word.lower())
  File "/usr/local/lib/python2.7/dist-packages/nltk/stem/snowball.py", line 694, in stem
    word = (word.replace(u"\u2019", u"\x27")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 13: ordinal not in range(128)

What is happening here?


The file is being read as a bunch of strs, but it should be unicodes. Python tries to convert implicitly, but fails. Change:

job_titles = [line.strip() for line in title_file.readlines()]

to explicitly decode the strs to unicode (here assuming UTF-8):

job_titles = [line.decode('utf-8').strip() for line in title_file.readlines()]

This could also be solved by importing the codecs module and using codecs.open rather than the built-in open.
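
For reference, a minimal sketch of the codecs-based variant (Python 2; filename is the variable from the question's code):

import codecs

# codecs.open returns a file object whose lines are already unicode,
# so no explicit .decode() call is needed afterwards
with codecs.open(filename, 'r', encoding='utf-8') as title_file:
    job_titles = [line.strip() for line in title_file]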

You can also try this:

import sys
reload(sys)
sys.setdefaultencoding('utf8')

You can try this before using the job_titles string:

source = unicode(job_titles, 'utf-8')
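
Since job_titles in the question is a list of lines rather than a single str, a per-line version of this idea would look like the following sketch (unicode() accepts a single byte string, not a list):

# Decode each raw str line to unicode before further processing
job_titles = [unicode(line, 'utf-8') for line in job_titles]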

For me, there was a problem with the terminal encoding. Adding UTF-8 to .bashrc solved the problem:

export LC_CTYPE=en_US.UTF-8

Don't forget to reload .bashrc afterwards:

source ~/.bashrc

This worked fine for me:

f = open(file_path, 'r+', encoding="utf-8")

You can add a third parameter, encoding, to ensure that the encoding type is 'utf-8'.

Note: this method works fine in Python 3; I did not try it in Python 2.7.
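
A short usage sketch of that approach (Python 3; file_path is a placeholder for the actual file):

# Opening with an explicit encoding makes each line a str decoded as UTF-8
with open(file_path, 'r+', encoding='utf-8') as f:
    for line in f:
        print(line.strip())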

For Python 3, the default encoding is "utf-8". In case of any problem, the following steps are suggested in the base documentation: https://docs.python.org/2/library/csv.html#csv-examples

  1. Create a function:

        def utf_8_encoder(unicode_csv_data):
            for line in unicode_csv_data:
                yield line.encode('utf-8')

  2. Then use the function inside the reader (see the full sketch after this list), for example:

        csv_reader = csv.reader(utf_8_encoder(unicode_csv_data))
    
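
Put together, an end-to-end version of that recipe might look like the following sketch (Python 2; the example.csv file name and the use of codecs.open are illustrative assumptions, not part of the documented recipe):

import codecs
import csv

def utf_8_encoder(unicode_csv_data):
    # Re-encode each unicode line as UTF-8 bytes for the csv module
    for line in unicode_csv_data:
        yield line.encode('utf-8')

# codecs.open yields unicode lines; utf_8_encoder turns them back into
# byte strings, which is what Python 2's csv.reader expects
with codecs.open('example.csv', 'r', encoding='utf-8') as f:
    for row in csv.reader(utf_8_encoder(f)):
        print row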

To find any and all unicode-related errors, use the following command:

grep -r -P '[^\x00-\x7f]' /etc/apache2 /etc/letsencrypt /etc/nginx

Mine was in:

/etc/letsencrypt/options-ssl-nginx.conf:        # The following CSP directives don't use default-src as

Using shed, I found the offending sequence. It turned out to be an editor error.

00008099:     C2  194 302 11000010
00008100:     A0  160 240 10100000
00008101:  d  64  100 144 01100100
00008102:  e  65  101 145 01100101
00008103:  f  66  102 146 01100110
00008104:  a  61  097 141 01100001
00008105:  u  75  117 165 01110101
00008106:  l  6C  108 154 01101100
00008107:  t  74  116 164 01110100
00008108:  -  2D  045 055 00101101
00008109:  s  73  115 163 01110011
00008110:  r  72  114 162 01110010
00008111:  c  63  099 143 01100011
00008112:     C2  194 302 11000010
00008113:     A0  160 240 10100000
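
For what it's worth, the C2 A0 pairs in that dump are the UTF-8 encoding of U+00A0 (no-break space), which an editor can insert in place of an ordinary space with no visible difference:

# Decoding the offending byte pair confirms it is a no-break space
print(repr(b'\xc2\xa0'.decode('utf-8')))  # '\xa0', i.e. U+00A0 NO-BREAK SPACE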

Use open(fn, 'rb').read().decode('utf-8') instead of just open(fn).read().
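
As a sketch (with 'example.txt' standing in for fn, and assuming the file is UTF-8 encoded):

# Read raw bytes, then decode explicitly instead of relying on the
# implicit ASCII decoding that raises UnicodeDecodeError
with open('example.txt', 'rb') as f:
    text = f.read().decode('utf-8')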

Python 3.x or higher:

  1. Load the file as a byte stream (the snippet is wrapped in a function here so that its return statement is valid; the name load_body is illustrative):

        def load_body():
            body = ''
            for lines in open('website/index.html', 'rb'):
                decodedLine = lines.decode('utf-8')
                body = body + decodedLine.strip()
            return body

  2. Use a global setting:

        import io
        import sys
        sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

Using Python 3.6 on Ubuntu 18.04, I solved the problem with both of these:

with open(filename, encoding="utf-8") as lines:

and, if you are running the tool from the command line:

export LC_ALL=C.UTF-8

Note that if you are on Python 2.7, you have to handle this differently. First, you have to set the default encoding:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

and then, to load the file, you must use io.open to set the encoding:

import io
with io.open(filename, 'r', encoding='utf-8') as lines:

You still need to export the environment variable:

export LC_ALL=C.UTF-8
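
Putting the Python 2.7 pieces above together, a minimal loading sketch (with 'example.txt' standing in for the actual file) could look like this:

import io
import sys

# Set the default encoding so implicit str<->unicode conversions use UTF-8
reload(sys)
sys.setdefaultencoding('utf-8')

# io.open decodes the file to unicode as it is read
with io.open('example.txt', 'r', encoding='utf-8') as lines:
    for line in lines:
        print line.strip()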

I got this error when trying to install a python package in a Docker container. For me, the issue was that the Docker image did not have a locale configured. Adding the following lines to the Dockerfile solved the problem for me:

# Avoid ascii errors when reading files in Python
RUN apt-get install -y locales && locale-gen en_US.UTF-8
ENV LANG='en_US.UTF-8' LANGUAGE='en_US:en' LC_ALL='en_US.UTF-8'