This diagram, retrieved from Wikipedia, seems to depict a trie with (at least) the keys 'A', 'to', 'tea', 'ted', 'ten', 'i', 'in' and 'inn' inserted:
If this trie were to store items for the keys 't' or 'te' there would need to be extra information (the numbers in the diagram) present at each node to distinguish between nullary nodes and nodes with actual values.
Each subscript accesses an internal node. That means to retrieve smile_item, you must access seven nodes. Eight node accesses correspond to smiled_item and smiles_item, and nine to smiling_item. For these four items, there are fourteen nodes in total. They all have the first four bytes (corresponding to the first four nodes) in common, however. By condensing those four bytes to create a root that corresponds to ['s']['m']['i']['l'], four node accesses have been optimised away. That means less memory and fewer node accesses, which is a very good sign. The optimisation can be applied recursively to reduce the need to access unnecessary suffix bytes. Eventually, you get to a point where you're only comparing differences between the search key and the indexed keys at the locations indexed by the trie. This is a radix trie.
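To make the effect of condensing shared bytes concrete, here is a minimal sketch of a lookup in a hand-built radix trie for the four keys above. The dict-based node layout and the helper names `make_node` and `lookup` are assumptions made for illustration, and the visit counts apply to this layout only, not to the node counting used in the paragraph above.

```python
def make_node(value=None):
    # Hypothetical node layout: an optional value plus labelled edges.
    return {"value": value, "edges": {}}


# Hand-built radix trie: the shared prefix "smil" is a single edge.
root = make_node()
smil = root["edges"]["smil"] = make_node()
smile = smil["edges"]["e"] = make_node("smile_item")
smile["edges"]["d"] = make_node("smiled_item")
smile["edges"]["s"] = make_node("smiles_item")
smil["edges"]["ing"] = make_node("smiling_item")


def lookup(node, key):
    """Return (value, nodes_visited); value is None when the key is absent."""
    visited = 1  # the root itself
    while key:
        for label, child in node["edges"].items():
            if key.startswith(label):
                # Consume the whole edge label in one step.
                node, key = child, key[len(label):]
                visited += 1
                break
        else:
            return None, visited  # no edge matches the remaining key
    return node["value"], visited


print(lookup(root, "smile"))    # ('smile_item', 3)
print(lookup(root, "smiling"))  # ('smiling_item', 3)
```

Each step compares a whole edge label rather than a single byte, so the shared prefix costs one node access instead of four.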
TRIE:
Instead of comparing a whole search key against all existing keys (as a hash scheme does), we can use a search scheme that compares the search key one character at a time. Following this idea, we can build a structure (as shown below) that holds three existing keys: "dad", "dab", and "cab".
                    [root]
               ...// | \\...
                     |  \
                     c   d
                     |    \
                    [*]    [*]
                ...//|\.  ./|\\...        Fig-I
                   a       a
                  /       /
                [*]      [*]
            ...//|\..  ../|\\...
               /        /   \
              b        b     d
             /        /       \
            []       []       []
           (cab)   (dab)     (dad)
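The character-by-character scheme of Fig-I can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names `TrieNode`, `insert`, and `search` and the `is_key` flag are assumptions. The flag plays the role of the extra per-node information needed to tell pass-through nodes from nodes that hold a value.

```python
class TrieNode:
    """One node of a character trie: child links plus an optional value."""

    def __init__(self):
        self.children = {}   # maps a single character to the next TrieNode
        self.value = None    # payload stored when a key ends at this node
        self.is_key = False  # tells real keys apart from pass-through nodes


def insert(root, key, value):
    node = root
    for ch in key:
        # Create the child for this character if it does not exist yet.
        node = node.children.setdefault(ch, TrieNode())
    node.is_key = True
    node.value = value


def search(root, key):
    """Return the stored value, or None if the key is absent."""
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.value if node.is_key else None


root = TrieNode()
for word in ("cab", "dab", "dad"):
    insert(root, word, word.upper())

print(search(root, "dab"))  # DAB
print(search(root, "da"))   # None, because "da" is only a prefix
```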
Prelude to PATRICIA Tree/Trie:
It is worth noting that even string keys can be represented over a binary alphabet. If we assume ASCII encoding, the key "dad" can be written in binary by concatenating the binary representations of 'd', 'a', and 'd' in sequence, giving "011001000110000101100100".
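The bit string above can be checked with a short sketch. The helper name `to_bits` is an assumption made for illustration; it writes each ASCII byte as eight bits and joins them.

```python
def to_bits(key: str) -> str:
    """Concatenate the 8-bit binary form of each ASCII character."""
    return "".join(format(byte, "08b") for byte in key.encode("ascii"))


print(to_bits("dad"))  # 011001000110000101100100
```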
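Using this concept, a trie (with radix two) can be formed. Below, we depict the concept under the simplifying assumption that the letters 'a', 'b', 'c', and 'd' come from a smaller alphabet instead of ASCII.
Notes for Fig-III:
As mentioned earlier, to keep the depiction simple, let us assume an alphabet of only four letters {a, b, c, d} whose binary representations are "00", "01", "10", and "11" respectively. With this, our string keys "dad", "dab", and "cab" become "110011", "110001", and "100001" respectively. The trie for these keys is as shown in Fig-III (bits are read from left to right, just as strings are read from left to right).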
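The encoding and the resulting radix-2 trie can be sketched as below. The nested-dict layout, the `"key"` leaf marker, and the function names are assumptions made for illustration; only the four-letter code table comes from the text.

```python
CODES = {"a": "00", "b": "01", "c": "10", "d": "11"}


def encode(key):
    """Translate a key over {a, b, c, d} into its bit string."""
    return "".join(CODES[ch] for ch in key)


def build_binary_trie(keys):
    """Build a trie whose edges are the single bits '0' and '1'."""
    root = {}
    for key in keys:
        node = root
        for bit in encode(key):
            node = node.setdefault(bit, {})
        node["key"] = key  # hypothetical marker: a key ends at this node
    return root


def contains(root, key):
    node = root
    for bit in encode(key):
        node = node.get(bit)
        if node is None:
            return False
    return "key" in node


trie = build_binary_trie(["dad", "dab", "cab"])
print(encode("dad"), encode("dab"), encode("cab"))  # 110011 110001 100001
print(contains(trie, "dab"), contains(trie, "cad"))  # True False
```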
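In tries, most nodes don't store keys and are just hops on a path
between a key and the keys that extend it. Most of these hops are
necessary, but when we store long words, they tend to produce long
chains of internal nodes, each with just one child. This is the main
reason tries need too much space, sometimes more than BSTs.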
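Radix tries (aka radix trees, aka Patricia trees) are based on the
idea that we can somehow compress the path; for example, after the
"intermediate t node", we could have "hem" in one node, or another
such suffix in one node.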
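One way to picture the path compression is the sketch below, which merges every chain of single-child, non-key nodes into a single edge. The nested-dict layout, the `END` marker, and the sample keys "ant" and "anthem" are hypothetical choices for illustration, and keys are assumed not to contain the marker character.

```python
END = "$"  # hypothetical end-of-key marker; keys must not contain it


def build_trie(keys):
    """Plain character trie as nested dicts, one character per edge."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def compress(node):
    """Return a radix trie: single-child, non-key chains become one edge."""
    out = {}
    for label, child in node.items():
        if label == END:
            out[END] = True
            continue
        # Extend the label while the child is a pass-through node.
        while len(child) == 1 and END not in child:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        out[label] = compress(child)
    return out


radix = compress(build_trie(["ant", "anthem"]))
print(radix)  # {'ant': {'$': True, 'hem': {'$': True}}}
```

After compression, the three nodes for 'h', 'e', and 'm' collapse into the single edge "hem" below the node where "ant" ends.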
Here is a diagram comparing a trie with a radix trie:
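The original trie has 9 nodes and 8 edges, and if we assume 9 bytes
for an edge, with a 4-byte overhead per node, this means
9 * 4 + 8 * 9 = 108 bytes.
The compressed trie on the right has 6 nodes and 5 edges, but in this
case each edge carries a string, not just a character; however, we can
simplify the calculation by accounting for edge references and string
labels separately. This way, we would still count 9 bytes per edge
(because we would include the string terminator byte in the edge
cost), but we could add the sum of string lengths as a third term in
the final expression; the total number of bytes needed is given by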