A Linked-List / Binary-Tree Data Structure

Published 2008-12-28 – Updated 2012-10-23

This class set is a combined linked-list and binary-tree map implementation. It was originally designed in about 2000 or 2001. Since then it has gone through several refactorings to arrive in the final form, each time with the desire to obtain added function while adhering to the DRY principle.

The primary motivation was to create, for a Java 1.2 JVM, a map object with the following capabilities:

Key to value mapping.
Bi-directional iteration in key order.
Key lookup with <= and >= matching.
The ability to combine the tree node, key and/or value in order to conserve memory (by not requiring the key and value to be separate from the node object).

The secondary motivation was to teach myself how a recursive binary tree works - I find the best way to learn is to do.

Further, even as of Java 6, the only comparable collections are those which implement NavigableMap; and these seem, in my opinion, rather clumsy.

I do need to caveat that this data structure does not perform as well as a TreeMap (and probably not as well as a LinkedHashMap), for the most part due to the recursive nature of the algorithm. The insert/remove operations are the most expensive (relatively) at about 1:6 (for random key values; insertion in key sequence compares substantially better). Get operations are quite close at 1:1.33. That said, it is more functional than TreeMap and leaner on memory (which matters for some purposes), and it’s much better suited for sequential reading from a key position and for inter- key positioning.

In order to support supplying a key when searching for a node, the implementations of LinkedTree specifically perform comparisons using node.compareKey(key). This, then, makes the assumption that a node is coded to support specific keys, and not that keys are coded to support nodes. Because of this design a simple String, Number and other standard existing classes are usable as keys, which would not be the case if Comparable and key.compareTo(node) were used.

As currently designed, there are 4 classes in this set; they could easily be folded into a single class having one of the three implementations if size were a significant issue.

2008-01-27: After processing my code using FindBugs I realized that my LinkedTree node violated the contract of equals() and misused Comparator.compareTo() to compare key values. The implementation here has been corrected, changing the Node by:

dropping Comparable,
changing compareTo() to compareKey() and
dropping equals() and hashCode().

2010-10-23: Make classes generic.

Base Class - LinkedTree

The LinkedTree class defines the public API, and use of these objects should define the target variable as LinkedTree as opposed to one of the subclasses. The exception is InsertionLinkedTree which defines an additional getAndPutFirstNode() API - and even then, unless you are using that API, you should define a LinkedTree.

LinkedTree          index1=new WeightedLinkedTree();  // recommended declaration
WeightedLinkedTree  index2=new WeightedLinkedTree();  // not recommended
InsertionLinkedTree cache =new InsertionLinkedTree(); // only for getAndPutFirstNode()

The methods in this class fall into two broad categories. First the public methods most of whose implementations are supplied by this class. The second set are package methods including abstract methods which must be implemented by the subclasses. These could all be made protected to permit external subclassing for alternative implementations - I deliberately chose not to, to constrain the possible implementations in order to retain the ability to transparently substitute a different implementation in future.

Tree balancing is done by “promoting” a child node into the place of its parent. This can be thought of as a rotation, but that is not strictly accurate. In fact, the promotion manipulates the tree as follows:

Promotion of Left Child:          Promotion of Right Child:
    [4]               [2]           [2]                   [4]
    / \               / \           / \                   / \
  [2] [5]  becomes  [1] [4]       [1] [4]    becomes    [2] [5]
  / \                   / \           / \               / \
[1] [3]               [3] [5]       [3] [5]           [1] [3]

The trick in conceptualizing this as a rotation is what happens, in our examples, to the [3] node in each case. It remains in place and the tree rotates around it. I personally find this easier to conceptualize as a promotion of [2] and [4], respectively, than as a rotation around [3].

The API design deliberately allows the nodes to be walked as either a list or tree, as well as directly accessed by key.

RandomLinkedTree

This implementation balances the tree based on a randomly generated “balancer” for each node. Statistically it will, in practice, always result in an acceptably well balanced tree. But there is always the remote possibility for the pathological case with a tree which is effectively a linked list. One side-effect of this approach is that the precise structure of the tree is determined randomly, so repeatedly loading a tree with the same data set results in a different structure every time unless the random source is identically seeded.

In practice I find that this implementation typically gives a maximum tree depth of about 2.1 to 2.7 times optimal, but it seems to get slightly worse as n increases.

WeightedLinkedTree

Originally an experiment and learning exercise for me, this implementation achieves a near-optimal tree depth by recording at each node its total “weight” (the number of nodes it holds, including itself) and then, as nodes are added and removed the tree rebalances as the recursive function unfolds by testing whether the node would be better balanced as is, or by promoting one or other child node. The idea for this and the implementation were not influenced by any outside source, though subsequent research indicates that the idea may not have been original back in 2006.

In addition a special balancing case exists which is used to minimize “zigzag” trees, since they are harder to balance well. When we encounter a null left child on the left child of its parent node, or a null right child on the right child of its parent node, such as:

Left Child           Right Child
  [4]                [2]
  /                    \
[2]         -OR-       [4]
  \                    /
  [3]                [3]

they are rebalanced as:

    [4]              [2]
    /                  \
  [3]       -OR-       [3]
  /                      \
[2]                      [4]

since that then permits the top node to rebalance (in both cases) as:

  [3]
  / \
[2] [4]

In practice I find that this implementation typically gives a maximum tree depth of about 1.25 to 1.3 times optimal, and it seems stable even for large n.

InsertionLinkedTree

The insertion linked tree is a special case which maintains its linked list in insertion order instead of key order. It could as easily subclass either RandomLinkedTree or WeightedLinkedTree. I chose the latter, but as coded, the extends clause could simply be changed, if desired.

Because its list order is not key order, this implementation of the linked tree does not permit searching with for keys using <= or >= comparison.

This class can be easily used to create an MRU (most recently used) cache:

/**
 * Get a value from the cache. If the key is found, the node is automatically moved to the end of the cache (that is
 * "freshened" in the cache).
 */
public Object get(Comparable key) {
    LinkedTree.Node                         nod;                                                // node in cache

    if((nod=cache.findNode(key))!=null) {
        cache.putNode(nod,true);                                                                // move node to end of the linked list ("freshen").
        return nod.getValue();
        }
    else {
        return null;
        }
    }

...

/**
 * Put a value into the cache.
 * If the value is null, the key is used as the value.
 * The new value replaces any existing value.
 */
public void put(Comparable key, Object val) {
    if(val==null) { val=key; }
    cache.putNode(new LinkedTree.ComparableMapNode(key,val),true);
    if(cache.size()>limit) { cache.removeNode(cache.getFirstNode().getKey()); }
    }

An optimized find could be added to InsertionLinkedTree to relink the node at the end of the list when it is returned.

The functionality of this class could have been folded into the LinkedList, but it was never clear to me if that was better design. Certainly, getAndPutFirstNode() is odd as a method of LinkedTree, but fits as a method of InsertionLinkedTree.

Notes

Optimal tree depth is defined as being the smallest value of x such that 2^x > n where n is the number of nodes in the tree.

Get The Source

The source compiles to Java 5.

Download LinkedTree.zip.

Download MruCache.java

Code Craft

Software is equal parts Art, Craft, and Engineering