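Why isn't there a font that contains all Unicode glyphs?

Pretty much what the title says. I understand that correctly rendering all of Unicode, with combining characters, characters that affect other characters, and ligatures, is very hard. There seem to be fonts designed for maximum coverage of Unicode symbols (Symbola, Code2001, others), as well as specialized fonts for particular planes or character ranges (BabelStone Han, others).

I don't know much about the underlying technical details of fonts. Is there a maximum size? Is it a copyright issue? Is it just too hard to draw all ~110,000 existing glyphs from scratch? I understand the style issues, but why isn't there a "default" fallback font that simply has every glyph? The glyphs are all on unicode.org; redrawing them would be a lot of work, but then everything would be guaranteed to have a fallback font. If you had the rights to some pre-existing fonts, you could just merge them, which should help a lot. Such a font would be a great help to humanity and I can't see a good technical reason why it doesn't exist, or at least an open-source effort to create it, so I presume there's an invisible-to-me reason why it can't be done.

What is that reason?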


"Why would you even want that?" questions aside, from a programming perspective there's a very simple reason: the OpenType spec only affords an addressable glyph index space of one USHORT, so one font can only use 16 bits' worth of glyph identifiers: 65,536 IDs (0 to 65,535). Because the glyph count itself is also stored as a 16-bit number, that works out to at most 65,535 glyphs per font. (And note the terminology: a "glyph" is not the same as a "character" or "letter".)
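To make the 16-bit limit concrete, here is a minimal Python sketch. It only models the width of the field (a big-endian unsigned 16-bit integer, struct format `">H"`); it does not read any real font file, and the helper name `fits_in_ushort` is made up for illustration.

```python
import struct

# OpenType stores glyph IDs (and the glyph count) as big-endian
# unsigned 16-bit integers; ">H" is that layout in struct notation.
USHORT = struct.Struct(">H")

def fits_in_ushort(value: int) -> bool:
    """Return True if value can be stored in a 16-bit unsigned field."""
    try:
        USHORT.pack(value)
        return True
    except struct.error:
        # struct refuses values outside 0..65535 for the "H" format
        return False

max_value = 2**16 - 1
print(max_value)
print(fits_in_ushort(max_value))
print(fits_in_ushort(max_value + 1))
```

Anything past 65,535 simply has nowhere to go in a 16-bit field, no matter how the rest of the font is organized.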

The current version of Unicode, v8 as of this answer, contains 120,737 assigned code points, or almost twice as many as fit in a modern font (2021 edit: v13 upped this number to 143,859). In fact, Unicode hasn't been able to fit in a modern OpenType font since 2001, with the release of Unicode 3.1, which upped the number of code points from 49,259 to 94,205.
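The arithmetic behind "almost twice as many" can be checked quickly. This sketch uses the totals quoted in this answer and makes one simplifying assumption, labeled in the code: every character gets exactly one glyph. Real fonts also need `.notdef`, ligatures and contextual forms, so the result is only a rough estimate of how many font files a full-coverage family needs.

```python
import math

MAX_GLYPHS_PER_FONT = 65_535  # largest glyph count a 16-bit field can hold

# Character totals quoted in this answer, by Unicode version.
unicode_totals = {
    "3.0": 49_259,
    "3.1": 94_205,
    "8.0": 120_737,
    "13.0": 143_859,
}

def min_fonts_needed(characters: int, cap: int = MAX_GLYPHS_PER_FONT) -> int:
    """Rough font-file count, ASSUMING exactly one glyph per character."""
    return math.ceil(characters / cap)

for version, total in unicode_totals.items():
    ratio = total / MAX_GLYPHS_PER_FONT
    print(f"Unicode {version}: {total} chars, {ratio:.2f}x the cap, "
          f"at least {min_fonts_needed(total)} font file(s)")
```

Unicode 3.0 still fits in one font; from 3.1 onward you need at least two, and by 13.0 at least three.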

"So what about font collections?" I hear you ask. Why not use multiple fonts and support all unicode that way? Well now, you've just described Adobe's Sans Pro, and Google's Noto (which are the same font).
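A family split across several files only works because the text renderer falls back from one file to the next. The sketch below shows that idea in its simplest form: pick the first font in a chain that covers each character, then group characters into runs that can each be drawn with a single font. The font names and coverage ranges are hypothetical, and real engines also itemize by script and direction before choosing fonts; this only illustrates the fallback step.

```python
from typing import List, Optional, Tuple

# Hypothetical font family split into per-script files.
# Coverage is simplified to inclusive code point ranges.
FALLBACK_CHAIN = [
    ("ExampleSans-Latin", [(0x0000, 0x024F)]),
    ("ExampleSans-Arabic", [(0x0600, 0x06FF)]),
    ("ExampleSans-CJK", [(0x4E00, 0x9FFF)]),
]

def covers(ranges: List[Tuple[int, int]], cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)

def pick_font(ch: str) -> Optional[str]:
    """Return the first font in the chain that covers ch, or None."""
    cp = ord(ch)
    for name, ranges in FALLBACK_CHAIN:
        if covers(ranges, cp):
            return name
    return None  # no font covers it: the renderer would show .notdef ("tofu")

def segment(text: str) -> List[Tuple[Optional[str], str]]:
    """Split text into runs that can each be drawn with a single font."""
    runs: List[Tuple[Optional[str], str]] = []
    for ch in text:
        font = pick_font(ch)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs

sample = "Hi \u4e2d\u6587 \u0645\u0631\u062d\u0628\u0627"
print(segment(sample))
```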

As for "how hard can it be": a uniform style for all glyphs in Unicode, across 129 established written scripts on this planet, each with its own typesetting rules? Incredibly hard. You may think fonts are just files with pictures for letters, so that when someone types a letter, that picture shows up. That is not how fonts work, and it isn't how fonts have worked since the late 1980s.

Modern fonts are the typographic equivalent of a game ROM: sure, it's not much use without the hardware or software to run that ROM on, but all the things that actually matter are in the ROM. Similarly, modern fonts contain all the information for typesetting. Not just pictures: they contain the metadata, the metrics, and the positioning and substitution rules for arbitrary sequences, with separate rule sets for each written script that OpenType supports. That covers mandatory and optional ligatures; language-specific character replacements for letters at the start, middle or end of a word, or in isolation; character repositioning relative to arbitrarily complex sequences of other characters before or after it; replacement of arbitrarily complex sequences with other arbitrarily complex sequences; possible bitmap fallbacks for small-point rendering; hinting instructions on how to properly rasterize vector graphics that are inherently not aligned to any particular pixel grid; and more. A modern font is a ridiculously complex application that a font engine consults to figure out how to typeset sequences of code points.
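As one small example of those "start, middle or end of a word" rules: Arabic letters change shape by position, and OpenType fonts expose those shapes through the `init`, `medi`, `fina` and `isol` features. The shaping engine decides which feature applies from each letter's Unicode joining type (dual-joining "D" or right-joining "R", as listed in Unicode's ArabicShaping data). The sketch below is a toy under heavy assumptions: it knows only a handful of letters, ignores diacritics, ZWJ/ZWNJ and the mandatory lam-alef ligature, and only picks the feature tag rather than substituting any glyph.

```python
from typing import List, Optional

# Joining types for a few Arabic letters only (toy subset).
# "D" = dual-joining, "R" = right-joining (never joins to what follows).
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u0628": "D",  # BEH
    "\u062D": "D",  # HAH
    "\u062F": "R",  # DAL
    "\u0631": "R",  # REH
    "\u0633": "D",  # SEEN
    "\u0645": "D",  # MEEM
    "\u0646": "D",  # NOON
    "\u0648": "R",  # WAW
}

# (joins to previous letter, joins to next letter) -> feature tag
FORM_BY_JOINS = {
    (False, False): "isol",
    (False, True): "init",
    (True, True): "medi",
    (True, False): "fina",
}

def positional_forms(word: str) -> List[Optional[str]]:
    """Pick a positional feature tag for each character (None = non-joining)."""
    forms: List[Optional[str]] = []
    for i, ch in enumerate(word):
        jt = JOINING_TYPE.get(ch)  # unknown characters treated as non-joining
        if jt is None:
            forms.append(None)
            continue
        prev_jt = JOINING_TYPE.get(word[i - 1]) if i > 0 else None
        next_jt = JOINING_TYPE.get(word[i + 1]) if i + 1 < len(word) else None
        # Joins the preceding letter only if that letter is dual-joining.
        joins_prev = prev_jt == "D"
        # Joins the following letter only if this letter is dual-joining
        # and the next letter can join at all.
        joins_next = jt == "D" and next_jt in ("D", "R")
        forms.append(FORM_BY_JOINS[(joins_prev, joins_next)])
    return forms

marhaba = "\u0645\u0631\u062D\u0628\u0627"  # meem, reh, hah, beh, alef
print(positional_forms(marhaba))
```

Even this stripped-down version needs per-letter data and context on both sides; a real font plus shaping engine layers ligatures, mark positioning and language-specific rules on top.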

Making a (set of) Unicode-encompassing font(s) that looks good for all contexts is a vast team effort.

So: "Why isn't there a font that contains all Unicode glyphs?", because that's been technically impossible since 2001. We can, and do, make font families that cover all of Unicode, but with 129 different scripts all with their own typesetting rules, it's a lot of work, and almost (almost) not worth the effort compared to only covering a subset of all languages.

And as for this:

Such a font would be a great help to humanity and I can't see a good technical reason why it doesn't exist or at least an open-source effort to create it, so I presume an invisible-to-me reason why it can't be done.

Just because you didn't know about them doesn't mean they don't exist; millions of people are familiar with them. They exist =)

They're even open source: go out and thank the people who made them!

There is GNU Unifont. It aims to contain all Unicode, except Apple Emoji.

You will probably find what you are looking for at the following links.

Unicode Character Table

HTML Character Entity References

Huge List of Unicode Symbols

List of Unicode Characters of Category “Other Symbol”

This one is fun for finding a particular character, since you can draw what you are searching for:

Unicode Character Recognition

Can't enter Unicode character with Alt+ even with EnableHexNumpad

Basic Questions

Q: How many characters are in Unicode? A: The short answer is that as of Version 13.0, the Unicode Standard contains 143,859 characters. The long answer is rather more complicated, because of all the different kinds of characters that people might be interested in counting.
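The "long answer" is easy to see for yourself: the code space has 0x110000 code points, but large parts of it are surrogates, private-use areas or unassigned (including noncharacters). This sketch classifies every code point with Python's standard `unicodedata` module. The numbers depend on the Unicode version bundled with your Python build (`unicodedata.unidata_version`), and "everything else" still includes control and format characters, so it is not expected to match the FAQ's figure exactly.

```python
import unicodedata
from collections import Counter

CODE_SPACE = 0x110000  # U+0000 .. U+10FFFF

def count_by_category() -> Counter:
    """Count every code point by its Unicode general category."""
    return Counter(unicodedata.category(chr(cp)) for cp in range(CODE_SPACE))

counts = count_by_category()
surrogates = counts["Cs"]   # U+D800..U+DFFF, never characters on their own
private_use = counts["Co"]  # BMP PUA plus planes 15 and 16
unassigned = counts["Cn"]   # unassigned code points, including noncharacters
assigned = CODE_SPACE - surrogates - private_use - unassigned

print("Unicode data version:", unicodedata.unidata_version)
print("surrogates (Cs):", surrogates)
print("private use (Co):", private_use)
print("unassigned/noncharacters (Cn):", unassigned)
print("everything else:", assigned)
```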

Unicode font

A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard. The vast majority of modern computer fonts use Unicode mappings, even those fonts which only include glyphs for a single writing system, or even only support the basic Latin alphabet.

Fonts which support a wide range of Unicode scripts and Unicode symbols are sometimes referred to as "pan-Unicode fonts", although as the maximum number of glyphs that can be defined in a TrueType font is restricted to 65,535, it is not possible for a single font to provide individual glyphs for all defined Unicode characters (143,859 characters, with Unicode 13.0).

...

No single "Unicode font" includes all the characters defined in the present revision of ISO 10646 (Unicode) standard, as more and more languages and characters are continually added to it, and common font formats cannot contain more than 65,535 glyphs (about half the number of characters encoded in Unicode).

As a result, font developers and foundries incorporate new characters in newer versions or revisions of a font, or in separate auxiliary fonts intended specifically for particular languages.
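The mapping in the quoted definition lives in a font's `cmap` table, which sends code points to glyph IDs; glyph ID 0 is reserved for `.notdef`, the "missing glyph" box. The sketch below models a cmap as a plain Python dict with made-up glyph IDs; real cmap subtables are binary structures with several formats, so this only illustrates the lookup and what happens for a code point the font does not cover.

```python
from typing import Dict, List

NOTDEF = 0  # glyph ID 0 is reserved for .notdef in OpenType fonts

# Toy cmap: code point -> glyph ID (IDs are hypothetical).
toy_cmap: Dict[int, int] = {
    ord("A"): 1,
    ord("B"): 2,
    0x00E9: 3,  # LATIN SMALL LETTER E WITH ACUTE
}

def glyph_ids(text: str, cmap: Dict[int, int]) -> List[int]:
    """Look up each code point; unmapped ones fall back to .notdef."""
    return [cmap.get(ord(ch), NOTDEF) for ch in text]

print(glyph_ids("BA\u00e9", toy_cmap))
print(glyph_ids("A\u03bb", toy_cmap))  # GREEK SMALL LETTER LAMDA is unmapped
```

A font covering only basic Latin still uses this Unicode-keyed mapping; it just has far fewer entries, and everything else lands on `.notdef` unless the renderer falls back to another font.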

Enjoy!