UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

---

Best answer

Don't use str() to convert from unicode to encoded text / bytes.

Instead, use .encode() to encode the string:

p.agent_info = u' '.join((agent_contact, agent_telno)).encode('utf-8').strip()

Or work entirely in unicode.
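For readers on Python 3, where `str` is already Unicode text and `str()` no longer encodes, the same error appears whenever text is encoded to ASCII. This minimal sketch reproduces it and applies the `.encode('utf-8')` fix; `sample` is a hypothetical value standing in for the scraped phone number.

```python
# Hypothetical sample value containing U+00A0 (NO-BREAK SPACE), as in the question's error.
sample = u"Tel:\xa0555-0100"

# Encoding to ASCII fails because U+00A0 is outside range(128).
try:
    sample.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc  # keep a reference; 'exc' is cleared when the except block ends

print(ascii_error.start)  # 4, the index of '\xa0'

# Encoding explicitly to UTF-8 always succeeds and yields bytes.
utf8_bytes = sample.encode("utf-8")
print(utf8_bytes)  # b'Tel:\xc2\xa0555-0100'
```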

---

This is a classic python unicode pain point! Consider the following:

a = u'bats\u00E0'
print a
=> batsà

All good so far, but if we call str(a), let's see what happens:

str(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)

Oh dip, that's not gonna do anyone any good! To fix the error, encode the bytes explicitly with .encode and tell python what codec to use:

a.encode('utf-8')
=> 'bats\xc3\xa0'
print a.encode('utf-8')
=> batsà

Voil\u00E0!

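The issue is that when you call str() on a unicode object, Python 2 encodes it with the default character encoding, ASCII, which cannot represent any character outside range(128), such as the u'\xe0' above. To fix this, you have to tell python explicitly how to encode the string, using .encode('whatever_unicode'). Most of the time, utf-8 will do just fine.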
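A short Python 3 sketch of the explicit round trip described above: encode text to bytes and decode it back, naming the codec both times. Note that Python 3 changed the default encoding to UTF-8, so the implicit-ASCII failure of Python 2's `str()` does not occur there.

```python
import sys

text = u"bats\u00e0"  # 'batsà' as a Unicode string

# Python 3's default encoding for str.encode()/bytes.decode() is UTF-8, not ASCII.
print(sys.getdefaultencoding())  # utf-8

# Encode text -> bytes, naming the codec explicitly.
data = text.encode("utf-8")
print(data)  # b'bats\xc3\xa0'

# Decode bytes -> text with the same codec to get the original string back.
assert data.decode("utf-8") == text
```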

For a great exposition on this topic, see Ned Batchelder's PyCon talk: http://nedbatchelder.com/text/unipain.html

---

I actually found that in most of my cases, just stripping out those characters is much simpler:

s = mystring.decode('ascii', 'ignore')

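---

A subtle problem that causes even print to fail is having your environment variables set wrong, e.g. LC_ALL set to "C". In Debian they discourage setting it: Debian wiki on Locale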

$ echo $LANG
en_US.utf8
$ echo $LC_ALL
C
$ python -c "print (u'voil\u00e0')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)
$ export LC_ALL='en_US.utf8'
$ python -c "print (u'voil\u00e0')"
voilà
$ unset LC_ALL
$ python -c "print (u'voil\u00e0')"
voilà
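In Python 3, the encoding `print` uses comes from `sys.stdout.encoding`, which is normally derived from the locale unless overridden. As a hedged sketch (the helper `can_encode` is hypothetical, not part of any library), you can check up front whether a stream's declared encoding can represent a string; the two `io.TextIOWrapper` objects simulate an ASCII-only and a UTF-8 terminal.

```python
import io
import sys


def can_encode(text, stream=sys.stdout):
    """Return True if `text` can be written to `stream` without UnicodeEncodeError."""
    # Fall back to ASCII if the stream does not declare an encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Simulate an ASCII-only terminal and a UTF-8 one with in-memory streams.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
print(can_encode(u"voil\u00e0", ascii_stream))  # False
print(can_encode(u"voil\u00e0", utf8_stream))   # True
```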

---

I found an elegant workaround that lets me remove symbols and keep the string as a string, as follows:

yourstring = yourstring.encode('ascii', 'ignore').decode('ascii')

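It's important to note that using the ignore option is dangerous, because it silently drops any unicode (and internationalization) support from the code that uses it, as seen here (converting unicode):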

>>> u'City: Malmö'.encode('ascii', 'ignore').decode('ascii')
'City: Malm'
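To make the trade-off concrete, this sketch compares the `'ignore'` handler with two standard alternatives, `'replace'` and `'backslashreplace'`, on the same string. It only illustrates the built-in codec error handlers; which one is acceptable depends on whether losing characters matters for your data.

```python
s = u"City: Malm\u00f6"  # 'City: Malmö'

# 'ignore' silently drops the character: data is lost without any warning.
ignored = s.encode("ascii", "ignore").decode("ascii")
# 'replace' leaves a visible '?' where the character was.
replaced = s.encode("ascii", "replace").decode("ascii")
# 'backslashreplace' keeps the code point as an escape sequence, so nothing is lost.
escaped = s.encode("ascii", "backslashreplace").decode("ascii")

print(ignored)   # City: Malm
print(replaced)  # City: Malm?
print(escaped)   # City: Malm\xf6
```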

---

For me, what worked was:

BeautifulSoup(html_text,from_encoding="utf-8")

Hope this helps someone.

---

The problem is that you're trying to print a unicode character, but your terminal doesn't support it.

You can try installing the language-pack-en package to fix that:

sudo apt-get install language-pack-en

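It provides English translation data updates for all supported packages (including Python). Install a different language package if necessary (depending on which characters you're trying to print).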

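On some Linux distributions it's required in order to make sure that the default English locales are set up properly (so unicode characters can be handled by the shell/terminal). Sometimes it's easier to install it than to configure it manually.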

Then, when writing the code, make sure you use the right encoding in your code.

For example:

open(foo, encoding='utf-8')
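A slightly fuller sketch of the same idea, assuming Python 3: write and read a file with an explicit `encoding` so the result does not depend on the system locale. The file name `foo.txt` and the temporary directory are illustrative only.

```python
import os
import tempfile

text = u"Trademark: \u2122"

# Write and read with an explicit encoding, so the result does not depend on the locale.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "rb") as f:
        raw = f.read()  # the UTF-8 bytes actually stored on disk
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print(raw)                 # b'Trademark: \xe2\x84\xa2'
print(round_trip == text)  # True
```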

If you still have problems, double-check your system configuration, for example:

Your locale file (/etc/default/locale), which should contain e.g.:
```
LANG="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
```
or:
```
LC_ALL=C.UTF-8
LANG=C.UTF-8
```
The value of LANG/LC_CTYPE in your shell.
Which locales your shell supports, which you can check with:
```
locale -a | grep "UTF-8"
```

Demonstrating the problem and solution in a fresh VM.

Initialize and provision the VM (e.g. using vagrant):
```
vagrant init ubuntu/trusty64; vagrant up; vagrant ssh
```
See: Available Ubuntu boxes.

Printing a unicode character (such as a trademark sign like ™):

$ python -c 'print(u"\u2122");'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 0: ordinal not in range(128)

Now installing language-pack-en:

$ sudo apt-get -y install language-pack-en
The following extra packages will be installed:
  language-pack-en-base
Generating locales...
  en_GB.UTF-8... /usr/sbin/locale-gen: done
Generation complete.

Now the problem should be solved:
```
$ python -c 'print(u"\u2122");'
™
```

Otherwise, try the following command:

$ LC_ALL=C.UTF-8 python -c 'print(u"\u2122");'
™

---

Found a simple helper function here.

# Python 2 only: relies on the unicode type and the 'string_escape' codec
def safe_unicode(obj, *args):
    """ return the unicode representation of obj """
    try:
        return unicode(obj, *args)
    except UnicodeDecodeError:
        # obj is byte string
        ascii_text = str(obj).encode('string_escape')
        return unicode(ascii_text)

def safe_str(obj):
    """ return the byte string representation of obj """
    try:
        return str(obj)
    except UnicodeEncodeError:
        # obj is unicode
        return unicode(obj).encode('unicode_escape')
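The helpers above are Python 2 only: `unicode` and the `'string_escape'` codec do not exist in Python 3. A rough Python 3 counterpart of `safe_str`, offered as an assumption rather than the original author's code, relies on `str()` (which does not encode in Python 3) and uses the `'unicode_escape'` codec to produce ASCII-only output.

```python
def safe_str(obj):
    """Return an ASCII-only str for obj, escaping non-ASCII characters."""
    text = str(obj)  # in Python 3 this does not encode, so no UnicodeEncodeError
    # 'unicode_escape' turns non-ASCII code points into \xNN / \uNNNN escapes
    # (it also escapes backslashes and control characters such as newlines).
    return text.encode("unicode_escape").decode("ascii")


print(safe_str(u"bats\u00e0"))  # bats\xe0
print(safe_str(u"\u2122"))      # \u2122
print(safe_str(42))             # 42
```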

---

I just had this problem, and Google led me here, so to add to the general solutions here, this is what worked for me:

# 'value' contains the problematic data
unic = u''
unic += value
value = unic

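I had this idea after reading Ned's presentation.

I don't claim to fully understand why this works, though. So if anyone can edit this answer or put in a comment to explain, I'd appreciate it.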
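One plausible explanation, offered as an assumption since the poster's data isn't shown: in Python 2, `u'' + value` coerces a byte string `value` to `unicode` (decoding it with the default ASCII codec, so it only succeeds when those bytes are ASCII), and later code then works with text rather than bytes. Python 3 never converts implicitly; the sketch below, with the hypothetical helper `to_text`, does that conversion explicitly with a named codec.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as str, decoding bytes explicitly instead of relying on a default."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# In Python 3, mixing str and bytes raises TypeError rather than guessing a codec.
try:
    u"" + b"caf\xc3\xa9"
except TypeError:
    mixing_fails = True

print(mixing_fails)                              # True
print(to_text(b"caf\xc3\xa9") == u"caf\u00e9")  # True
```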

---

Add the following line at the beginning of your script (or as the second line):

# -*- coding: utf-8 -*-

That's the python source code encoding definition. More info in PEP 263.
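The declaration only tells the parser how to decode the bytes of the source file; it does not change how strings are encoded at runtime. This Python 3 sketch compiles an in-memory source encoded as Latin-1 to show the declaration taking effect; the variable names are illustrative.

```python
# Source code stored as Latin-1 bytes, with a PEP 263 declaration on the first line.
source = u"# -*- coding: latin-1 -*-\nword = u'caf\u00e9'\n".encode("latin-1")

namespace = {}
# compile() honours the coding declaration when it is given bytes.
exec(compile(source, "<latin1-demo>", "exec"), namespace)

print(namespace["word"] == u"caf\u00e9")  # True
print(len(namespace["word"]))             # 4
```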

---

Well, I tried everything but nothing helped. After googling around, I figured out the following, and it helped. Python 2.7 is in use.

# encoding=utf8
import sys

reload(sys)
sys.setdefaultencoding('utf8')

---

I just used the following:

import unicodedata
message = unicodedata.normalize("NFKD", message)

Take a look at what the documentation says:

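unicodedata.normalize(form, unistr) Return the normal form form for the Unicode string unistr. Valid values for form are 'NFC', 'NFKC', 'NFD', and 'NFKD'.

The Unicode standard defines various normalization forms of a Unicode string, based on the definition of canonical equivalence and compatibility equivalence. In Unicode, several characters can be expressed in various ways. For example, the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA).

For each character, there are two normal forms: normal form C and normal form D. Normal form D (NFD) is also known as canonical decomposition, and translates each character into its decomposed form. Normal form C (NFC) first applies a canonical decomposition, then composes pre-combined characters again.

In addition to these two forms, there are two additional normal forms based on compatibility equivalence. In Unicode, certain characters are supported which normally would be unified with other characters. For example, U+2160 (ROMAN NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL LETTER I). However, it is supported in Unicode for compatibility with existing character sets (e.g. gb2312).

The normal form KD (NFKD) will apply the compatibility decomposition, i.e. replace all compatibility characters with their equivalents. The normal form KC (NFKC) first applies the compatibility decomposition, followed by the canonical composition.

Even if two unicode strings are normalized and look the same to a human reader, if one has combining characters and the other doesn't, they may not compare equal.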

Helped me out. Simple and easy.
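A few quick checks, as a Python 3 sketch, that mirror the examples in the quoted documentation (U+00C7, U+2160) plus U+00A0, the character from the question's error message:

```python
import unicodedata

# Canonical equivalence: NFD splits U+00C7 into 'C' + U+0327 (COMBINING CEDILLA)...
decomposed = unicodedata.normalize("NFD", u"\u00c7")
# ...and NFC composes the pair back into the single precomposed character.
recomposed = unicodedata.normalize("NFC", decomposed)

# Compatibility equivalence: NFKD maps U+2160 (ROMAN NUMERAL ONE) to 'I'
# and U+00A0 (NO-BREAK SPACE) to an ordinary space; NFD leaves U+00A0 alone.
roman = unicodedata.normalize("NFKD", u"\u2160")
nbsp_nfkd = unicodedata.normalize("NFKD", u"\xa0")
nbsp_nfd = unicodedata.normalize("NFD", u"\xa0")

print([hex(ord(c)) for c in decomposed])  # ['0x43', '0x327']
print(recomposed == u"\u00c7", roman == u"I", nbsp_nfkd == u" ", nbsp_nfd == u"\xa0")
# True True True True
```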

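---

Here's a rehash of some of the other so-called "cop out" answers. There are situations in which simply throwing away the troublesome characters/strings is a good solution, despite the protests voiced here.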

def safeStr(obj):
    try: return str(obj)
    except UnicodeEncodeError:
        return obj.encode('ascii', 'ignore').decode('ascii')
    except: return ""

Test it:

if __name__ == '__main__':
    print safeStr( 1 )
    print safeStr( "test" )
    print u'98\xb0'
    print safeStr( u'98\xb0' )

Results:

1
test
98°
98

Update: my original answer was written for Python 2. For Python 3:

def safeStr(obj):
    try: return str(obj).encode('ascii', 'ignore').decode('ascii')
    except: return ""

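Note: if you'd prefer to leave a ? indicator where the "unsafe" unicode characters are, specify replace instead of ignore as the error handler in the call to encode.

Suggestion: you might want to name this function toAscii instead? That's a matter of preference...

Finally, here's a more robust PY2/3 version using six. I opted to use replace, and added a few character swaps that replace fancy unicode quotes and apostrophes (which curl left or right) with the plain vertical ones that are part of the ascii set.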

from six import PY2, iteritems

CHAR_SWAP = { u'\u201c': u'"', u'\u201D': u'"', u'\u2018': u"'", u'\u2019': u"'" }

def toAscii( text ) :
    try:
        for k,v in iteritems( CHAR_SWAP ):
            text = text.replace(k,v)
    except: pass
    try:
        # Python 3: encode to ASCII, replacing unmappable characters with '?'
        return str( text ) if PY2 else text.encode( 'ascii', 'replace' ).decode( 'ascii' )
    except UnicodeEncodeError:
        return text.encode('ascii', 'replace').decode('ascii')
    except: return ""

if __name__ == '__main__':
    print( toAscii( u'testin\u2019' ) )
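On Python 3 alone the `six` dependency can be dropped. The sketch below is an alternative under that assumption, not the poster's code: it uses `str.maketrans`/`str.translate` for the quote swaps and the `'replace'` error handler for everything else; `to_ascii` is a hypothetical name.

```python
# Map curly quotes/apostrophes to their plain ASCII counterparts.
CHAR_SWAP = str.maketrans({
    u"\u201c": u'"',  # LEFT DOUBLE QUOTATION MARK
    u"\u201d": u'"',  # RIGHT DOUBLE QUOTATION MARK
    u"\u2018": u"'",  # LEFT SINGLE QUOTATION MARK
    u"\u2019": u"'",  # RIGHT SINGLE QUOTATION MARK
})


def to_ascii(text):
    """Swap curly quotes, then replace any remaining non-ASCII character with '?'."""
    swapped = str(text).translate(CHAR_SWAP)
    return swapped.encode("ascii", "replace").decode("ascii")


print(to_ascii(u"testin\u2019"))                    # testin'
print(to_ascii(u"\u201cHi\u201d from Malm\u00f6"))  # "Hi" from Malm?
```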

---

The solution below worked for me; I just added

u"String"

(marking the string as unicode) before my string.

result_html = result.to_html(col_space=1, index=False, justify='right')
text = u"""<html><body><p>Hello all, <br><br>Here's weekly summary report.  Let me know if you have any questions. <br><br>Data Summary <br><br><br>{0}</p><p>Thanks,</p><p>Data Team</p></body></html>""".format(result_html)

---

Just add .encode('utf-8') to the variable:

agent_contact.encode('utf-8')

---

I always put the code below in the first two lines of my python files:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

---

If you have something like packet_data = "This is data", then do this on the next line, right after initializing packet_data:

unic = u''
packet_data = unic + packet_data

---

We hit this error when running manage.py migrate in Django with localized fixtures.

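Our source contained the # -*- coding: utf-8 -*- declaration, MySQL was correctly configured for utf8, and Ubuntu had the appropriate language pack and values in /etc/default/locale.

The issue was simply that the Django container (we use docker) was missing the LANG env var.

Setting LANG to en_US.UTF-8 and restarting the container before re-running the migrations fixed the problem.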

---

Please open a terminal and run the following command:

export LC_ALL="en_US.UTF-8"

---

Update for Python 3.0 and later. Try the following in a terminal:

sudo locale-gen en_US.UTF-8
export LANG=en_US.UTF-8 LANGUAGE=en_US.en LC_ALL=en_US.UTF-8

This generates the en_US.UTF-8 locale and sets the locale encoding of the current shell session to UTF-8.

---

Many answers here (@agf and @AndbDrew, for example) have already addressed the most immediate aspects of the OP's question.

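However, I think there is a subtle but important aspect that has been largely overlooked, and that matters a lot for people like me who ended up here trying to make sense of encodings in Python: Python 2 and Python 3 manage character representation very differently. I feel like a big chunk of the confusion comes from people reading about encodings in Python without being version-aware.

I suggest that anyone interested in understanding the root cause of the OP's problem begin by reading Spolsky's introduction to character representations and Unicode, then move on to Batchelder on Unicode in Python 2 and Python 3.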

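---

I ran into this problem trying to output unicode characters to stdout, but with sys.stdout.write rather than print (so that I could support output to a different file as well).

Following BeautifulSoup's documentation, I solved this with the codecs library: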

import sys
import codecs
from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 is installed

def main(fIn, fOut):
    soup = BeautifulSoup(fIn)
    # Do processing, with data including non-ASCII characters
    fOut.write(unicode(soup))

if __name__ == '__main__':
    with (sys.stdin) as fIn: # Don't think we need codecs.getreader here
        with codecs.getwriter('utf-8')(sys.stdout) as fOut:
            main(fIn, fOut)
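In Python 3, `sys.stdout` is already a text stream, so a `codecs` writer has to wrap the underlying binary buffer (`sys.stdout.buffer`) instead. The sketch below uses an in-memory `io.BytesIO` as a stand-in for that buffer so the encoded bytes can be inspected; it illustrates the mechanism rather than being a drop-in replacement for the answer above.

```python
import codecs
import io

# A BytesIO stands in for a binary stream such as sys.stdout.buffer.
raw = io.BytesIO()
writer = codecs.getwriter("utf-8")(raw)  # StreamWriter that encodes str -> UTF-8 bytes

writer.write(u"Trademark: \u2122\n")
print(raw.getvalue())  # b'Trademark: \xe2\x84\xa2\n'
```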

---

In the shell:

Find a supported UTF-8 locale with the following command:
```
locale -a | grep "UTF-8"
```
Export it, before running the script, e.g.:
```
export LC_ALL=$(locale -a | grep UTF-8 | head -n 1)
```
or manually, like:
```
export LC_ALL=C.UTF-8
```
Test it by printing a special character, e.g. ™:
```
python -c 'print(u"\u2122");'
```

Above tested in Ubuntu.

---

Alas, this works at least in Python 3...

Sometimes the error is in the environment variables and the encoding, so:

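import os
import locale

os.environ["PYTHONIOENCODING"] = "utf-8"  # read at interpreter startup: only affects child processes started from here on
myLocale = locale.setlocale(category=locale.LC_ALL, locale="en_GB.UTF-8")
...
print(myText.encode('utf-8', errors='ignore'))

where errors are ignored during encoding.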

---

Try to avoid converting a variable with str(variable). Sometimes that is exactly what causes this error.

A simple tip to avoid it:

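try:
    data = str(data)
except UnicodeEncodeError:
    pass  # don't convert to string; keep the original unicode object

The example above also avoids the encoding error, by leaving data unconverted when str() fails.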

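---

This problem often happens when deploying a django project with Apache, because Apache sets the environment variable LANG=C in /etc/sysconfig/httpd. Just open the file and comment out (or change to your flavor) this setting. Or use the lang option of the WSGIDaemonProcess directive, in which case you'll be able to set a different LANG environment variable for each virtual host.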

---

The recommended solutions did not work for me, and I could live with dumping all non-ascii characters, so

s = s.encode('ascii',errors='ignore')

which leaves me with something that doesn't throw errors.

---

This will work:

>>> import re, unicodedata
>>> print(unicodedata.normalize('NFD', re.sub(r"[\(\[].*?[\)\]]", "", u"bats\u00e0")).encode('ascii', 'ignore').decode('ascii'))

Output:

batsa

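---

In the general case of writing this unsupported encoding string (say data_that_causes_this_error) to some file (e.g. results.txt), this works:

f = open("results.txt", "w")
f.write(data_that_causes_this_error.encode('utf-8'))  # Python 2; in Python 3, open the file with encoding='utf-8' and write the str directly
f.close()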

---

Late answer, but this error is related to your terminal's encoding not supporting certain characters.
I fixed it on python3 using:

import sys
import io

sys.stdout = io.open(sys.stdout.fileno(), 'w', encoding='utf8')
print("é, à, ...")
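On Python 3.7 and later, an alternative to replacing `sys.stdout` is to change the encoding of the existing stream with `TextIOWrapper.reconfigure()`. The sketch below applies it to an in-memory stream that starts out as ASCII so the resulting bytes can be checked; on a real console the equivalent call would be `sys.stdout.reconfigure(encoding='utf-8')`.

```python
import io

# Stand-in for a terminal stream that was opened with ASCII encoding.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="ascii")

# Python 3.7+: switch the encoding of an existing text stream in place.
stream.reconfigure(encoding="utf-8")
stream.write(u"\u00e9, \u00e0, ...")
stream.flush()

print(stream.encoding)    # utf-8
print(buffer.getvalue())  # b'\xc3\xa9, \xc3\xa0, ...'
```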

---

You can set the character encoding to UTF-8 before running your script:

export LC_CTYPE="en_US.UTF-8"

This usually solves the problem.

---

This worked for me:

export LC_CTYPE="en_US.UTF-8"

---

If it's a problem with a print statement, often it's just a problem with the terminal output. This helped me: export PYTHONIOENCODING=UTF-8
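`PYTHONIOENCODING` is read when the interpreter starts, so it must be set before Python is launched (setting it in `os.environ` from inside a running script only affects child processes). This sketch starts a child interpreter with different values to show the effect; `run_with_io_encoding` is a hypothetical helper.

```python
import os
import subprocess
import sys

script = "print(u'voil\\u00e0')"  # the child interpreter prints 'voilà'


def run_with_io_encoding(encoding):
    """Run a child interpreter with PYTHONIOENCODING set; return (returncode, stdout bytes)."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run([sys.executable, "-c", script], env=env,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode, result.stdout


print(run_with_io_encoding("ascii")[0] != 0)  # True: UnicodeEncodeError in the child
print(run_with_io_encoding("utf-8"))          # (0, b'voil\xc3\xa0\n') on POSIX
```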

---

You can use unicodedata to avoid UnicodeEncodeError. Here is an example:

import unicodedata

agent_telno = agent.find('div', 'agent_contact_number')
agent_telno = u'' if agent_telno is None else agent_telno.contents[0]
agent_telno = unicodedata.normalize("NFKD", agent_telno)  # NFKD replaces '\xa0' with a plain space
p.agent_info = str(agent_contact + ' ' + agent_telno).strip()
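Note that NFKD replaces U+00A0 with an ordinary space rather than deleting it, and it does not make text ASCII-only. A small Python 3 sketch, using a hypothetical `clean_text` helper and made-up sample values:

```python
import unicodedata


def clean_text(text):
    """Hypothetical helper: NFKD-normalize scraped text and trim surrounding whitespace."""
    return unicodedata.normalize("NFKD", text).strip()


phone = clean_text(u"\xa0555\xa00100\xa0")
print(repr(phone))  # '555 0100'

# NFKD does not make text ASCII-only: an accented letter becomes a base letter
# plus a combining mark (here U+0300), which is still outside range(128).
accented = clean_text(u"\u00e0")
print([hex(ord(c)) for c in accented])  # ['0x61', '0x300']
```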