Does Python forbid two similar-looking Unicode identifiers?

I stumbled upon this while playing around with Unicode identifiers:

>>> 𝑓, x = 1, 2
>>> 𝑓, x
(1, 2)
>>> 𝑓, f = 1, 2
>>> 𝑓, f
(2, 2)

What is going on here? Why does Python replace the object referenced by 𝑓, but only sometimes? Where is this behavior described?

PEP 3131 -- Supporting Non-ASCII Identifiers says:

All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.

You can use unicodedata to test the conversions:

import unicodedata

unicodedata.normalize('NFKC', '𝑓')
# 'f'

which indicates that '𝑓' is converted to 'f' during parsing, leading to the expected result:

𝑓 = "Some String"
print(f)
# Some String
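To see that only the normalized name ever reaches the namespace, here is a minimal sketch that runs the assignment through exec into a fresh dictionary. The dictionary name `namespace` is just an illustration; note that string literals, unlike identifiers, are not normalized, which is why the lookup by the original spelling fails.

```python
# Execute an assignment that uses the mathematical italic f as an identifier.
namespace = {}
exec("𝑓 = 1", namespace)

# The parser stored the binding under the NFKC-normalized name 'f'.
print("f" in namespace)   # True

# The string literal "𝑓" is NOT normalized, so it is a different key.
print("𝑓" in namespace)   # False
```

This is also why `𝑓, f = 1, 2` in the question ends up with both names showing 2: both targets refer to the same variable `f`, and the second assignment wins.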

Here's a small example, just to show how horrible this "feature" is:

𝕋𝐡ᵢ𝔰_f𝔢𝘢𝚝𝓊ᵣₑ_𝕤ₕ𝔬𝔲𝖑𝔡_dₑ𝕗ᵢ𝘯i𝘵𝚎ℓy_𝒷𝘦_𝐚_𝚋ᵘg = 42
print(T𝗵ℹ𝚜_𝒇e𝖆𝚝𝙪ᵣe_ₛ𝔥º𝓾𝗹𝙙_𝚍e𝒇ᵢ𝒏ⁱtᵉ𝕝𝘆_𝖻ℯ_𝔞_𝖇𝖚𝓰)
# => 42

Try it online! (But please don't use it)

And as mentioned by @MarkMeyer, two identifiers can be distinct even though they look exactly the same, because NFKC does not unify characters from different scripts (for example, "CYRILLIC CAPITAL LETTER A" and "LATIN CAPITAL LETTER A"):

А = 42
print(A)
# => NameError: name 'A' is not defined
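If you want to check whether two spellings collide or merely look alike, something along these lines may help. It is only a sketch: `same_identifier` and `describe` are hypothetical helper names, and the comparison assumes that NFKC equality is the only thing Python applies to identifiers, as the PEP 3131 quote above states.

```python
import unicodedata


def same_identifier(a, b):
    """Return True if a and b have the same NFKC form, i.e. name the same variable."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


def describe(identifier):
    """Return the Unicode character name of every character in identifier."""
    return [unicodedata.name(ch, "<unnamed>") for ch in identifier]


# U+1D453 MATHEMATICAL ITALIC SMALL F normalizes to the plain Latin 'f'.
print(same_identifier("\U0001D453", "f"))  # True

# U+0410 CYRILLIC CAPITAL LETTER A stays Cyrillic under NFKC.
print(same_identifier("\u0410", "A"))      # False

print(describe("\u0410"))  # ['CYRILLIC CAPITAL LETTER A']
print(describe("A"))       # ['LATIN CAPITAL LETTER A']
```

Printing the character names is usually the quickest way to spot a look-alike letter that slipped into the source.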