Python 3-Encode/Decode vs Bytes/Str

我是 python3的新手,来自 python2,我对 unicode 的基本原理有点困惑。 我读过一些很好的文章,它们让一切变得更加清晰,但是我看到在 python3上有两种方法,它们处理编码和解码,我不确定该使用哪一种。

所以 python3的想法是,每个字符串都是 unicode,可以用字节进行编码和存储,或者再次解码成 unicode 字符串。

但有两种方法:
u'something'.encode('utf-8')将生成 b'something',但 bytes(u'something', 'utf-8')也是如此。
b'bytes'.decode('utf-8')似乎和 str(b'bytes', 'utf-8')做同样的事情。

现在我的问题是,为什么有两种方法似乎做同样的事情,并且要么比其他方法更好(为什么?)我一直试图在谷歌上找到答案,但是不走运。

>>> original = '27岁少妇生孩子后变老'
>>> type(original)
<class 'str'>
>>> encoded = original.encode('utf-8')
>>> print(encoded)
b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'
>>> type(encoded)
<class 'bytes'>
>>> encoded2 = bytes(original, 'utf-8')
>>> print(encoded2)
b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'
>>> type(encoded2)
<class 'bytes'>
>>> print(encoded+encoded2)
b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x8127\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'
>>> decoded = encoded.decode('utf-8')
>>> print(decoded)
27岁少妇生孩子后变老
>>> decoded2 = str(encoded2, 'utf-8')
>>> print(decoded2)
27岁少妇生孩子后变老
>>> type(decoded)
<class 'str'>
>>> type(decoded2)
<class 'str'>
>>> print(str(b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81', 'utf-8'))
27岁少妇生孩子后变老
>>> print(b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'.decode('utf-8'))
27岁少妇生孩子后变老
242426 次浏览

Neither is better than the other, they do exactly the same thing. However, using .encode() and .decode() is the more common way to do it. It is also compatible with Python 2.

To add to Lennart Regebro's answer There is even the third way that can be used:

encoded3 = str.encode(original, 'utf-8')
print(encoded3)

Anyway, it is actually exactly the same as the first approach. It may also look that the second way is a syntactic sugar for the third approach.


A programming language is a means to express abstract ideas formally, to be executed by the machine. A programming language is considered good if it contains constructs that one needs. Python is a hybrid language -- i.e. more natural and more versatile than pure OO or pure procedural languages. Sometimes functions are more appropriate than the object methods, sometimes the reverse is true. It depends on mental picture of the solved problem.

Anyway, the feature mentioned in the question is probably a by-product of the language implementation/design. In my opinion, this is a nice example that show the alternative thinking about technically the same thing.

In other words, calling an object method means thinking in terms "let the object gives me the wanted result". Calling a function as the alternative means "let the outer code processes the passed argument and extracts the wanted value".

The first approach emphasizes the ability of the object to do the task on its own, the second approach emphasizes the ability of an separate algoritm to extract the data. Sometimes, the separate code may be that much special that it is not wise to add it as a general method to the class of the object.

To add to add to the previous answer, there is even a fourth way that can be used

import codecs
encoded4 = codecs.encode(original, 'utf-8')
print(encoded4)