Thursday, Jan 17th

Last update12:59:40 PM GMT

Ternary Search tree

Write e-mail


A Ternary Search Tree

The ternary search tree(TST) aka trie is a 3-way tree where everynode's left subtree has keys less than the node's key, every middle subtree has keys equal to the node's key, and every right subtree has keys greater than the node's key. In computer science, a ternary search tree is a ternary (three-way) tree data structure of strings which combines the speed of a prefix search tree, or trie, with the space efficiency of a binary search tree. It finds all keys having a given prefix, suffix, or infix. It even finds those keys that closely match a given pattern. You can easily search the tree for partial matches. In addition, you can implement near-match functions, which gives you the ability to suggest alternatives for misspelled words.

A trie (which is pronounced as ‘tree’ since it’s from the word retrieval, but practically everyone says ‘try’) is a data structure that stores not whole strings, but the characters in strings. But, a trie has a major problem. It requires lots of nodes. If we needed to support all 26 English letters, each node would have to store 26 pointers. And, if we need to support international characters, punctuation, or distinguish between lowercase and uppercase characters, the memory usage grows becomes untenable.

A TST stores key-value pairs, where keys are strings and values are objects. TST keys are stored and retrieved in sorted order, regardless of the order in which they are inserted into the tree. In addition, TSTs use memory efficiently to store large quantities of data. Best of all, the ternary search tree is lightning fast. The tremendous flexibility of TSTs provides ample opportunity for programming creatively.

Seeding the tree

First, it’s important to note that the ternary tree structure does not allow us to put the same string into the tree more than once. Duplicates are not allowed.

We start off by using the same method as for the search. We follow the links in the tree until we reach a null link. Once we get to this null link, we add as many nodes as we need in order to express the remaining characters in the string. These nodes are all joined up by the ‘equal’ links, and each node has a character from the string. So, if we were inserting the word ‘TRIE’ into an empty ternary search tree (that is, the root node is nil), we’d create a root node with the letter ‘T’, we’d link that through its equal link to a node with the letter ‘R’, and link that through its equal link to a node with the letter ‘I’ and so on. Finally we’d have a long thin ternary search tree with five nodes for T, R, I, E and null. Inserting the word ‘TREE’ into this tree, would cause a split off to the left at the ‘I’ node for the first ‘E’ in TREE, and we’d add three more nodes after that.

Partial searches

A fascinating operation we can do on a ternary search tree is partial match searching. This is what we do to solve crossword puzzles. Let’s say that there’s a five-letter word at 12 across in which the second letter is O and the fourth letter is V, and we need to find out all the words that will fit. This operation is simple to perform with a ternary search tree, mainly because all the nodes are single characters.

Let’s suppose the pattern format we use consists of literal characters for the letters we know and a ‘don’t know’ character (say the full stop) for those we don’t. So, the pattern for our example crossword clue would be ‘.O.V.’.

The algorithm is recursive. We peel off the characters one by one from the pattern and perform the normal search operation, with the exception of when the current character is a full stop. For this character, we follow all of the links: left, equal, then right. When we reach the end of a word in the tree, we output it and continue where we left off (because it’s entirely possible we’ll find several hits for a given pattern, giving us a choice of words).

Retrieving a sorted list

Like a binary search tree, the strings stored in a ternary tree can be retrieved in sorted order. This is a little hard to see at first, since the strings themselves aren’t in the tree, but we can easily devise a recursive routine that will follow all the links in a ‘pre-order traversal’ (left, equal, then right) and output each string when we get to the null markers that indicate the end of a string. The pre-order traversal means that the output strings will be in sorted order.

Similar strings

In many applications, the names within the name-value pairs are very similar, in the sense that they start with the same characters and only differ towards the end. For example, following set of identifiers: ‘RedBlackNode’, ‘RedBlackTree’ and ‘RedBlackTreePainter’. The ternary tree performs just as well in cases such as this (after all, it’s merely a case of picking off characters and following links), but other algorithms would spend a lot of time comparing the equal front segment of the strings over and over again.

Share this post

Add comment

Please refrain from using slang or abusive language in the comments.
To avoid waiting for your comment to be published after being reviewed, please log in as a registered user.

Security code

Web Hosting