Python 3.2 UnicodeEncodeError: ‘ charmap’codec 不能编码位置为9629的字符“ u2013”: 字符映射到 < unDefinition >

我正在尝试编写一个从 sqlite3数据库中获取数据的脚本,但是我遇到了一个问题。

数据库中的字段是 text 类型的,其中包含一个 html 格式的文本

<html>
<head>
<title>Yahoo!</title>
</head>
<body>
<style type="text/css">
html {}
.yshortcuts {border-bottom:none !important;}
.ReadMsgBody {width:100%;}
.ExternalClass{width:100%;}
</style>
<table cellpadding="0" cellspacing="0" bgcolor="#ffffff">
<tr>
<td width="550" valign="top" align="left">


<table cellpadding="0" cellspacing="0" width="500">
<tr>
<td colspan="3"><img        src="http://mail.yimg.com/nq/assets/sharedmessages/v1/us/logo.gif" width="292" height="51" style="display:block;" border="0" alt="Yahoo! Mail"></td>
</tr>
<tr>
<td rowspan="3" width="1" bgcolor="#c7c4ca"></td>
<td width="498" height="1" bgcolor="#c7c4ca"></td>
<td rowspan="3" width="1" bgcolor="#c7c4ca"></td>
</tr>
<tr>
<td width="498" valign="top" align="left">
<table cellpadding="0" cellspacing="0">
<tr>
<td width="498" bgcolor="#61399d" align="left" valign="top">
<table cellspacing="0" cellpadding="0"><tr><td height="24"></td></tr></table>
<div style="font-family:Arial, Helvetica, sans-serif;font-size:23px;line-height:27px;margin-bottom:10px;color:#ffffff;margin-left:15px;"><span style="color:#ffffff;text-decoration:none;font-weight:bold;line-height:27px;">Välkommen till Yahoo! Mail.</span></div>
<div style="font-family:Arial, Helvetica, sans-serif;font-size:22px;line-height:26px;margin-bottom:1px;color:#ffffff;margin-left:15px;margin-bottom:7px;margin-right:15px;">Ansluta och dela går snabbt och enkelt och är tillgängligt överallt.</div>
</td>
</tr>
<tr>
<td><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/b1.gif" width="498" height="18" style="display:block;" border="0"></td>
</tr>
</table>
<table cellpadding="0" cellspacing="0" width="498">
<tr>
<td width="292" valign="top">
<table cellpadding="0" cellspacing="0">
<tr>
<td><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/grad.gif" width="292" height="9" style="display:block;"></td>
</tr>
<tr>
<td width="292" bgcolor="#ffffff" align="left" valign="top">
<table cellspacing="0" cellpadding="0"><tr><td height="11"></td></tr></table>
<div style="margin-left:15px;">
<div style="font-family:Arial, Helvetica, sans-serif;font-size:14px;line-height:18px;color:#333333;margin-bottom:11px;font-weight:bold;">Det är lätt som en plätt att komma igång.</div>
<table cellpadding="0" cellspacing="0" width="267">
<tr>
<td width="16" align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:14px;line-height:16px;color:#61399d;margin-bottom:9px;font-weight:bold;">1. </div></td>
<td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:13px;line-height:16px;color:#61399d;margin-bottom:9px;"><a rel="nofollow" target="_blank" href="http://us-mg999.mail.yahoo.com/neo/launch?action=contacts" style="text-decoration:underline;color:#61399d;"><span>Lägg till alla dina kontakter på en plats</span></a>.</div></td>
</tr>
<tr>
<td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:14px;line-height:16px;color:#61399d;margin-bottom:9px;font-weight:bold;">2. </div></td>
<td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:13px;line-height:16px;color:#61399d;margin-bottom:9px;"><a rel="nofollow" target="_blank" href="http://mrd.mail.yahoo.com/themes" style="text-decoration:underline;color:#61399d;"><span>Anpassa din nya inkorg</span></a>.</div></td>
</tr>
<tr>
<td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:14px;line-height:16px;color:#61399d;margin-bottom:9px;font-weight:bold;">3. </div></td>
<td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:13px;line-height:16px;color:#61399d;"><a rel="nofollow" target="_blank" href="http://se.overview.mail.yahoo.com/mobile" style="text-decoration:underline;color:#61399d;"><span>Anslut överallt på dina mobila enheter</span></a>.</div></td>
</tr>
</table>


</div>
</td>
</tr>
<tr><td height="13"></td></tr>
</table>
</td>
<td width="196" valign="top">
<table cellpadding="0" cellspacing="0">
<tr>
<td width="1" bgcolor="#fbfbfd" valign="top"><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/g1.gif" width="1" height="21" style="display:block;"></td>
<td width="1" bgcolor="#f5f6fa" valign="top"><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/g2.gif" width="1" height="21" style="display:block;"></td>
<td width="1" bgcolor="#e8eaf1" valign="top"><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/g3.gif" width="1" height="21" style="display:block;"></td>
<td width="1" bgcolor="#d4d4d4"></td>
<td width="186" bgcolor="#f0f0f0" align="left" valign="top">
<table cellspacing="0" cellpadding="0"><tr><td height="3">   </td></tr></table>
<div style="margin-left:11px;">
<div style="font-family:Arial, Helvetica, sans-serif;font-size:13px;line-height:16px;color:#333333;margin-bottom:9px;"><b>Info för dig:</b></div>
<div style="font-family:Arial, Helvetica, sans-serif;font-size:12px;color:#43494e;line-height:18px;margin-bottom:10px;">
Yahoo!-ID och e-postadress:<br />
<div style="font-family:Arial, Helvetica, sans-serif;font-size:12px;color:#43494e;line-height:18px;">
Håll ditt konto och inställningar aktuella. <br><a rel="nofollow" target="_blank" href="https://edit.yahoo.com/config/eval_profile" style="text-decoration:underline;color:#61399d;"><span>Mitt konto</span></a>
</div>
</div>
<table cellspacing="0" cellpadding="0"><tr><td height="20"></td></tr></table>
</td>
<td width="1" bgcolor="#dbdbdb"></td>
<td width="1" bgcolor="#ced2de"></td>
<td width="1" bgcolor="#dbdfed"></td>
<td width="1" bgcolor="#e8ebf3"></td>
<td width="1" bgcolor="#f3f4f9"></td>
<td width="1" bgcolor="#fafbfc"></td>
</tr>
<tr>
<td colspan="11"><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/b2.gif" width="196" height="8" style="display:block;" border="0"></td>
</tr>
<tr><td height="13"></td></tr>
</table>
</td>
<td width="10"></td>
</tr>
</table>
</td>
</tr>
<tr>
<td width="498" height="1" bgcolor="#c7c4ca"></td>
</tr>
</table>
<table cellpadding="0" cellspacing="0" width="500">
<tr>
<td align="center" valign="top">
<table cellspacing="0" cellpadding="0"><tr><td height="10"></td></tr></table>
<div style="font-family:Arial, Helvetica, sans-serif;font-size:11px;line-height:18px;margin-bottom:10px;">
<a rel="nofollow" target="_blank" href="http://info.yahoo.com/legal/se/yahoo/utos.html" style="text-decoration:underline;color:#61399d;">Yahoo! Villkor för användning</a>&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;<a rel="nofollow" target="_blank" href="http://info.yahoo.com/legal/se/yahoo/mail/atos.html" style="text-decoration:underline;color:#61399d;">Yahoo! Mail –Villkor för användning</a>&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;<a rel="nofollow" target="_blank" href="http://info.yahoo.com/privacy/se/yahoo/details.html" style="text-decoration:underline;color:#61399d;">Yahoo! Sekretesspolicy</a>
</div>
</td>
</tr>
<tr>
<td align="left" valign="top">
<div style="font-family:Arial, Helvetica, sans-serif;font-size:11px;line-height:14px;color:#545454;margin-left:16px;margin-right:14px;">Var god svara inte på detta meddelande. Detta är ett servicemeddelande som rör din användning av Yahoo! Mail. Om du vill veta mer om Yahoo!s användning av personlig information, inklusive användning av webb-beacons i HTML-baserad e-post, kan du läsa vår Yahoo! Sekretesspolicy. Yahoo!s adress är 701 First Avenue, Sunnyvale, CA 94089, USA.<br /><br />RefID: lp-1037111</div>
</td>
</tr>
</table>










</td>
</tr>
</table>
<img width="1" height="1" src="http://pclick.internal.yahoo.com/p/s=2143684696">
</body>
</html>`

尝试提取数据的 python 代码如下所示。

>>> import sqlite3
>>> conn = sqlite3.connect('C:/temp/Mobils/export/com.yahoo.mobile.client.android.mail/databases/mail.db')
>>> c = conn.cursor()
>>> conn.row_factory=sqlite3.Row
>>> c.execute('select body from messages_1 where _id=7')
<sqlite3.Cursor object at 0x0000000001FB78F0>
>>> r = c.fetchone()
>>> r.keys()
['body']
>>> print(r['body'])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python32\lib\encodings\cp850.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 9629: character maps to <undefined>
>>>

有人知道怎么把这个打印/写到文件里吗。是的,我知道这是打印到标准输出,但我得到相同的 UnicodeEncodeError 时,我尝试写入一个文件。我尝试了文件对象和 print(r['body'], file=f)的写方法。

186190 次浏览

When you open the file you want to write to, open it with a specific encoding that can handle all the characters.

with open('filename', 'w', encoding='utf-8') as f:
print(r['body'], file=f)

While Python 3 deals in Unicode, the Windows console or POSIX tty that you're running inside does not. So, whenever you print, or otherwise send Unicode strings to stdout, and it's attached to a console/tty, Python has to encode it.

The error message indirectly tells you what character set Python was trying to use:

  File "C:\Python32\lib\encodings\cp850.py", line 19, in encode

This means the charset is cp850.

You can test or yourself that this charset doesn't have the appropriate character just by doing '\u2013'.encode('cp850'). Or you can look up cp850 online (e.g., at Wikipedia).

It's possible that Python is guessing wrong, and your console is really set for, say UTF-8. (In that case, just manually set sys.stdout.encoding='utf-8'.) It's also possible that you intended your console to be set for UTF-8 but did something wrong. (In that case, you probably want to follow up at superuser.com.)

But if nothing is wrong, you just can't print that character. You will have to manually encode it with one of the non-strict error-handlers. For example:

>>> '\u2013'.encode('cp850')
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 0: character maps to <undefined>
>>> '\u2013'.encode('cp850', errors='replace')
b'?'

So, how do you print a string that won't print on your console?

You can replace every print function with something like this:

>>> print(r['body'].encode('cp850', errors='replace').decode('cp850'))
?

… but that's going to get pretty tedious pretty fast.

The simple thing to do is to just set the error handler on sys.stdout:

>>> sys.stdout.errors = 'replace'
>>> print(r['body'])
?

For printing to a file, things are pretty much the same, except that you don't have to set f.errors after the fact, you can set it at construction time. Instead of this:

with open('path', 'w', encoding='cp850') as f:

Do this:

with open('path', 'w', encoding='cp850', errors='replace') as f:

… Or, of course, if you can use UTF-8 files, just do that, as Mark Ransom's answer shows:

with open('path', 'w', encoding='utf-8') as f:

Maybe a little late to reply. I happen to run into the same problem today. I find that on Windows you can change the console encoder to utf-8 or other encoder that can represent your data. Then you can print it to sys.stdout.

First, run following code in the console:

chcp 65001
set PYTHONIOENCODING=utf-8

Then, start python do anything you want.

for me using export PYTHONIOENCODING=UTF-8 before executing python command worked .

On Windows (git bash):

chcp.com 65001
export PYTHONIOENCODING=utf-8

And then you can run your script, in my case it was flask so finally it could decode utf on my website from mysql db...