Difference between Tries and Trees?

I remotely remember that tries don't store the whole data per node, only the suffix to the parent node.

Where trees do store the whole data, but only organize themselves based on a prefix.

So tries get smaller, which allows for example to compress dictionaries very well.

Is that really the only difference?

From actual applications I remember that tries are faster in range queries. There are even special solr/lucene trie fields to speed up range queries. But how is that so?

What is the actual difference and what are the advantages and disadvantages of tries and trees?

49604 次浏览

A tree is a general structure of recursive nodes. There are many types of trees. Popular ones are binary tree and balanced tree. A Trie is a kind of tree, known by many names including prefix tree, digital search tree, and retrieval tree (hence the name 'trie').

Each kind of tree has a different purpose, structure and behaviour. For example, a binary tree stores a collection of comparable items (eg numbers). It can therefore be used to store a set of numbers, or to index other data that can be represented by numbers (eg objects that can be hashed). Its structure is sorted so it can be searched quickly to find a single item. Other tree structures, such as a balanced tree, are similar in principle.

A trie represents a sequence in its structure. It is very different in that it stores sequences of values rather than individual single values. Each level of recursion says 'what is the value of item I of the input list'. This is different to a binary tree which compares the single searched value to each node.

A binary tree or a bst is typically used to store numerical values. The time complexity in a bst is O(log(n)) for insertion, deletion and searching. Each node in a binary tree has at most 2 child nodes.

Trie : Every node of trie consists of multiple branches. Each branch represents a possible character of keys. We need to mark the last node of every key as leaf node. A trie node field value will be used to distinguish the node as leaf node (there are other uses of the value field)

To learn about tries refer this topcoder tutorial. https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/

Just got some insights from this talk, even through the Radix tree used in linux kernel is slight different to the one on wikipedia.

Trees only store keys, they don't store the value associated with the 钥匙。

In tree structure, we store whole words. There is a lot of overhead. If you search for antic, you would traverse each word and compare all the strings in antic. This takes up too much time and memory:

enter image description here

However, in tries, compression is the key. We store only the common prefixes. this will reduce the reduncancy.

enter image description here

Pros of tries:

  • The search time only depends on the length of the searched string.

  • Search misses only involve examining a few characters (in particular, just the longest common prefix between the search string and the corpus stored in the tree).

  • There are no collisions of unique keys in a trie.

  • There is no need to provide a hash function or to change hash functions as more keys are added to a trie.

  • A trie can provide an alphabetical ordering of the entries by key.

unfortunately, there is no perfect data structure. So even tries do have some downsides:

  • Tries can be slower than hash tables at looking up data whenever a container is too big to fit in memory. Hash tables would need fewer disk accesses, even down to a single access, while a trie would require O(m) disk reads for a string of length m.

  • Hash tables are usually allocated in a single big and contiguous chunk of memory, while trie nodes can span the whole heap. So, the former would better exploit the principle of locality.

  • A trie’s ideal use case is storing text strings. We could, in theory, stringify any value, from numbers to objects, and store it. Yet, if we were to store floating-point numbers, for instance, there are some edge cases that can produce long meaningless paths, such as periodic or transcendent numbers, or results of certain floating points operations such as 0.1+0.2, due to issues with double-precision representation.

  • Tries have memory overhead for nodes and references.

Use tries when you have to frequently perform prefix searches ( longestPrefix or keysWithPrefix ). Use hash tables when data is stored on slow supports like disk or whenever memory locality is important.

enter image description here

Reference