Application: The program reads initially reads Dow Jones Industrial Average closing averages from a file containing dates and closing averages. The program then prompts for an input date.

If the stock exchange was open on that date, the closing average is printed
If date is later than the last date read in from the file, a message is printed and the last recorded date and closing average are printed.
Otherwise, a message is printed that the stock exchange was closed for the input date, then prints the next recorded date and its closing average.

Data Structure to store the file data?

Solution using this data structure

As for many of the binary search tree methods, these two methods can easily be implemented either iteratively (i.e. with loops) or recursively (one public and one private, recursive method).

Either type of implementation is facilitated by a closer examination of examples.

Suppose we wish to find

 floor(45)  // largest key in the tree less than or equal to 45 (or null)

Example 1

      60
     /  \
    50  70

floor(45) = ?

Candidates? Which keys (if any) are smaller than 45? Which one of these candidates (if any) is largest?

Example 2

             40
            /  \
           /    \
          20     60     
         /  \   /  \ 
        10  30 50  70 

    
floor(45) = ?

Candidates? Which keys (if any) are smaller than 45? Which one of these candidates is largest?

For a binary search tree define invariants t and candiate

 t: Node reference to subtree that contains floor or null if no there
    is no such subtree.
 candidate: Node not in t whose key is the largest key less than the key in t
            or null if there is no such Node in the tree.

How should we intialize t and candidate so that the both descriptions are correct initially?

Assuming both t and candidate are initialized correctly, as t and candidate are updated, we make sure the invariant relation is maintained.

In that case how do we compute floor(k)?

Invariants

 t: Node reference to subtree that contains floor of k or else contains no keys
 smaller than k.


 candidate: Node not in t whose key is the largest key less than k in
            the whole tree or null if there is no such Node in the tree.

(Base case) If t is null, floor is candidate
If k is equal to t.key, floor is just k.
If k is smaller than t.key, then candidate doesn't change, but t should be updated to be the left subtree, t.left.
If k is larger than t.key, then candidate should be updated to be t.key and t should be updated to be the right subtree, t.right.

Since the subtree referenced by t gets smaller with each step, this will eventually find the floor.


public class BST<Key extends Comparable<Key>, Value>
{

  public Key floor(Key k)
  {
   
    return floor(k, null, root);
  }

  /**
   * returns the floor of k
   * @param k the search key
   * @param candidate largest key in the tree but not in subtree t 
   * that is less than k (might be null)
   * @param t subtree that contains the floor of k or that is empty if
   * the floor of k is candidate
   *
   * @return the floor of k in the subtree t or candidate if 
   * k is larger than all keys in subtree t.
   */
  private Key floor(Key k, Key candidate, Node t)
  {

    /*
     * (Base case) If t is null, floor is candidate
     * if k is equal to t.key, floor is just k.
     * if k is smaller than t.key, then candidate doesn't change,
     *	  but t should be updated to be the left subtree, t.left.
     * if k is larger than t.key, then candidate should be updated
     *    to be t.key and t should be updated to be the right subtree, t.right.
     */
  }

}


public class BST<Key extends Comparable<Key>, Value>
{

  public Key floor(Key k)
  {
   
    return floor(k, null, root);
  }

  /**
   * returns the floor of k
   * @param k the search key
   * @param candidate largest key in the tree but not in subtree t 
   * that is less than k (might be null)
   * @param t subtree that contains the floor of k or that is empty if
   * the floor of k is candidate
   *
   * @return the floor of k in the subtree t or candidate if 
   * k is larger than all keys in subtree t.
   */
  private Key floor(Key k, Key candidate, Node t)
  {

    if (t == null) return candidate;
    int cmp = k.compareTo(t.key);
    if (cmp < 0) {
      return floor(k, candidate, t.left);
    } else if (cmp > 0) {
      return floor(k, t.key, t.right);
    } else {
      return t.key;
    }
  }

}

We could add a public printKeys method to BST:


public class BST<Key extends Comparable<Key>, Value>
{
  /**
   * prints the keys to standard output in sorted order
   */
  public void printKeys()
  {
    printKeys(root);
  }

  /**
   * prints the keys in the subtree t to standard output
   * in sorted order
   */
  private void printKeys(Node t)
  {
    /**
     * if t is empty, return
     * print keys in the left subtree in sorted order
     * print t.key
     * print keys in the right subtree in sorted order
     */

  }

}

The keys method has to return a reference to some object that has a method:

 Iterator<Key> iterator()

The Iterator<Key> that this iterator() method returns should allow the keys to be returned, one at a time and in sorted order, using the next() method.

The Queue<E> class implements Iterable<E> [Queue has an Iterator<E> iterator() method].

So one implementation of keys() would be to

Create an empty Queue<Key>
Insert the keys from BST into this Queue in order.
return the Queue


public class BST<Keys extends Comparable<Key>, Value>
{

  Iterable<Key> keys()
  {
    Queue<Key> keylst = new Queue<Key>();
    addKeys(root, lst);
    return lst;
  }

  /**
   * adds the keys in subtree t in order to the end of lst
   * @param t the subtree
   * @param lst list of keys smaller than the keys in t.
   */

  private void addKeys(Node t, Queue<Key> lst)
  {
    /**
     * (Like printKeys, but add to lst instead of printing:
     *
     * if t is empty  do nothing (just return)
     * add keys from the left subtree
     * add t.key
     * add keys from the right subtree
     */
  }

}


public class BST<Keys extends Comparable<Key>, Value>
{

  Iterable<Key> keys()
  {
    Queue<Key> keylst = new Queue<Key>();
    addKeys(root, keylst);
    return keylst;
  }

  /**
   * adds the keys in subtree t in order to the end of lst
   * @param t the subtree
   * @param lst list of keys smaller than the keys in t.
   */

  private void addKeys(Node t, Queue<Key> lst)
  {
    if (t == null) return lst;
    addKeys(t.left, lst);
    lst.add(t.key);
    addKeys(t.right, lst);
  }

}

You are to complete and test the methods in the BSTSet class.

A JUnit test class, BSTSetTest is provided.

Hints, etc.

You must use recursion. The public method should call a private method that will need at least 1 more parameter than the public method - namely, a Node parameter. The public method will pass root to this parameter.
The private methods that change the tree (remove methods) should return a Node reference that points to the modified tree (or subtree) that was passed to it.
The private methods that do NOT change the tree (size methods) typically just return an int.
Some public methods return boolean type (isXXX methods). The corresponding private methods may need to return an integer to use to test the condition (depending on what the XXX condition is) as well as whether the condition is true or not.

For example, the public method:
```
	  public boolean isBalancedS()
	
```
should return true if at every Node, the size of its left subtree is the same as the size of its right subtree.

The private isBalancedS method could in principle return boolean. But at each subtree you need to know 2 things: (1) are the left and right subtrees themselves size balanced and (2) do the left and right subtrees have the same size.

If the private isBalancedS method returned boolean, we would only have the answer for (1). To check (2), we could call the private size method to get the sizes of the left and right subtrees. But this means another recursive descent down into those subtrees!

Solution:

The private isBalancedS should return an int. This value should be the size of the subtree provided the subtree itself is size balanced. But if the subtree is not size balanced, it should return a different value. E.g. -1 will do since the size of a subtree is always >= 0.

The public isBalancedS() method then just checks whether the private method returns -1 or not.

This is much more efficient than calling size(t.left) and size(t.right) at every Node t to check the balance condition in addition to calling isBalancedS(t.left) and isBalanced(t.right) to check that the subtrees are themselves balanced.

Some of the size methods ask you to count the number of nodes satisfying some property. Some of the remove methods ask you to remove the subtree at nodes satisfying some property. These properties include:

whether the node contains an odd integer key
size of the left and right subtrees
height of the node
depth of the node

In the case of the height and depth, the public method has an int parameter. E.g.

      public int sizeAtDepth(int k)

This should return the number of nodes whose depth is k.

What parameters should the recursive private sizeAtDepth method have? It always needs 1 more, e.g. Node.

The depth of the root is 0 and the depth of a child is 1 more than its parent. So the depth of nodes can be determined as you go down the tree.

Recalling the principles

code before the recursive calls correspond to actions as you go down the tree.
code after the recursive calls return correspond to actions as you go back up the tree.

It is easy to pass an additional parameter to the private method that gives the depth of the subtree at the Node parameter:

      /**
       * 
       * @param t the subtree
       * @param k the required depth
       * @param d the depth of node t
       * @return the number of nodes in the subtree at t that 
       * have depth k
      private int sizeAtDepth(Node t, int k, int d)

What does the public method pass to the private method for these three parameters?

The sizeAtHeight method seems similar to the sizeAtDepth method, but the problem is we can't calculate the height of each node as we go down the tree, only on the way back up. Why?

We could call the recursive private height(Node t) method at every Node, but this would be very inefficient and add to the recursive calls of the private sizeAtHeight.

For the size and remove methods whose condition involves the height, an efficient implementation can be achieved by using an extra member of the Node to store its height. This avoids the recursive calls to height(Node t).

Why balanced?

What is the balance condition?

How does balancing affect execution time?

How complicated is the implementation for balancing?

Easy! We want the put and get methods to be O(log(N)) instead of O(N).

The height of a binary search tree is the longest path from the root to a null link.

The height of a binary search tree with N keys determines the number of comparisons in the worst case for both put and get.

If the tree is unbalanced some paths will be short at the expense of making other paths longer with the worst unbalanced case where there is only 1 path of length N - 1.

There are a number of possible choices for the balance condition for a balanced binary search tree:

For each node, the number of nodes in its left subtree should be (almost) the same as the number in its right subtree. (Not really used. Probably messy to implement.)
(AVL tree) For each node, the heights of its children should differ by at most 1.
(Red Black tree) Add a "color" property to each node: either black or red.

Then the balance condition requires that all paths from the root to a null link have the same number of black nodes. Paths can contain extra red nodes, but with a restriction so that the length of every path is no greater than 2 times the number of black nodes in the path.

For the AVL tree or the Red Black tree, the implementation of the get methods is exactly the same as for an ordinary (unbalanced) binary search tree.

The put and delete methods that modify the tree need some additional code to handle the balancing.

For example, for the put method, going down the tree (searching) and creating a new Node when the null link is found is also exactly like the ordinary unbalanced binary search tree code.

Returning back up the tree and attaching the modified subtree is where code needs to be inserted to check if the addition to the subtree requires rebalancing.

This rebalancing code may take place at each node on the way back up the tree and so the effect on execution time depends on this code being fast, e.g., O(1).

AVL trees are Binary Search Trees with the additional property that the methods maintain a "balanced" property:

      For every node in an AVL tree, the heights of its children differ by
      at most 1.

For the purpose of this definition (and also the implementation), an empty child will be considered to be at height -1.

The AVL tree below shows the height of each node.

      The children of 600 have heights 

       0 (left child 550) and
      -1 (right child empty).

      and these differ by only 1.

For a perfectly balanced binary search tree with N keys we saw

 h = O(log(N))

But what about the height of an AVL tree with N keys?

What is an upper bound for the height of an AVL tree with N keys?

This can be determined if we can answer the reverse question:

What is a lower bound for the number of keys in an AVL tree of height h?

Let N be the minimum number of keys that can be in an AVL tree of height h.

Fact

      2^{h / 2} < N

But this means

     h < 2log(N)

Great! The height of an AVL tree is O(log(N)).

Since the height = O(log(N), this means the get method is guaranteed to be O(log(N)) for an AVL tree, even though it isn't perfectly balanced.

The put method first does a search and then either updates a value or else changes a few links to attach a new key value pair. Since the height is O(log(N)), this much would also cost only O(log(N)) for the put method.

However, the put method must also maintain the balance property. So we have make sure that doesn't add any more than O(log(N)) additional steps.

Maintaining the AVL Tree property, requires that the insert includes some code to rebalance portions of the tree if the property would otherwise be violated.

For example if we try to insert the value 525 into this tree in the usual way as a child of 550, the height of 550 would become 1, while the right child (empty) of 600 is still at height -1, a difference of 2.

So the node with 600 would become unbalanced!

This problem is solved by:

      1. First insert in the usual way for a binary search tree either in
      the left subtree or right subtree. (Duplicates are not allowed in
      this version.)
      2. Check if the heights of the two children subtrees differs by 2.
      If so, then rotate nodes to restore the AVL properties.

Two rotation methods are needed: rotateLeft or rotateRight.

To rebalance an unbalanced node requires we will need to either do one rotation or two rotations, depending on how the node became unbalanced.

The rotate methods are:


      1. private Node rotateLeft( Node p )

      2. private Node rotateRight( Node p )

Returning to the example AVL tree, we try to insert 525:

The rotateRight method below is called to rotate the node p to the right with its left child, 550 becoming the parent of 525 and 600:

If we try to insert 425 into this tree, node 500 would then violate the AVL condition. However, we can't use the rotateRight method:

First rotate 400 to the left with its new right child (425), and then rotate 500 with its new left child (425 again in a new position) to the right to get:

A slightly more typical example of two rotations is shown below. The value 450 is being inserted:

The rotateLeft will rotate 400 to the left, making 425 the new root of the subtree, then rotateRight will rotate 500 to the right making 425 the new root of the tree.

What happened to 425's previous left and right children?

rotateLeft

Here is the method:

  private static Node rotateLeft( Node p )
  {
    // p's right child becomes the new subtree root
    Node r = ?;
    p.right = ?;
    r.left = ?;

    // Adjust the heights if 'height' is stored in the nodes.


    return r;
  }

Note: The height method is a static method that just returns the height of
the node passed to it, but also handles the case if null is
passed. In the later case, height returns -1.

Two rotations are required:

 rotateRight Q
 rotateLeft P

double rotation
required


 /*
      * item inserted to right of t and left of t.right.
      * node t is unbalanced and requires two rotations
      *
      * 1. rotate t.right to the right and attach the new subtree root to t.right
      * 2. rotate t to the left and return the new subtree root
      */

(I'll provide an executable jar file with this demo.)

CSC301 Apr22

slide version

single file version

Contents

Application [1] [top]

Algorithms for ceiling and floor Methods [2] [top]

Invariants for the Key floor(Key k) Algorithm [3] [top]

Computing Key floor(Key k)[4] [top]

Java Code outline for Key floor(Key k)[5] [top]

Key floor(key) - final version [6] [top]

A Not Very Useful printKeys Method for BST [7] [top]

Implementing the Iterable<Key> keys() Method [8] [top]

The Iterable<Key> keys() method in Java [9] [top]

Final Java Version of Iterable<Key> keys()[10] [top]

Implementing Binary Search Tree Methods Recursively (Hw3)[11] [top]

Hints, etc.

Hw3 Additional Hints/Requirements [12] [top]

The sizeAtHeight method [13] [top]

Balanced Binary Search Trees [14] [top]

Why balanced?[15] [top]

What is the balance condition?[16] [top]

How does balancing affect execution time?[17] [top]

AVL Trees [18] [top]

Is an AVL Tree Balanced?[19] [top]

Height of an AVL Tree with N Keys [20] [top]

Fact

Insertion into an AVL Tree [21] [top]

Tree Rotations [22] [top]

The Rotate Methods [23] [top]

Fix Unbalanced Node with rotateRight [24] [top]

Example Requiring Two Rotations [25] [top]

Same Two Rotations, Bigger Tree [26] [top]

rotateLeft [27] [top]

The rotateLeft Method [28] [top]

Double Rotations [29] [top]

The double rotation [30] [top]

AVL Demo [31] [top]

CSC301 Apr22

slide version

single file version

Contents

Application[1] [top]

Algorithms for ceiling and floor Methods[2] [top]

Invariants for the Key floor(Key k) Algorithm[3] [top]

Computing Key floor(Key k)[4] [top]

Java Code outline for Key floor(Key k)[5] [top]

Key floor(key) - final version[6] [top]

A Not Very Useful printKeys Method for BST[7] [top]

Implementing the Iterable<Key> keys() Method[8] [top]

The Iterable<Key> keys() method in Java[9] [top]

Final Java Version of Iterable<Key> keys()[10] [top]

Implementing Binary Search Tree Methods Recursively (Hw3)[11] [top]

Hints, etc.

Hw3 Additional Hints/Requirements[12] [top]

The sizeAtHeight method[13] [top]

Balanced Binary Search Trees[14] [top]

Why balanced?[15] [top]

What is the balance condition?[16] [top]

How does balancing affect execution time?[17] [top]

AVL Trees[18] [top]

Is an AVL Tree Balanced?[19] [top]

Height of an AVL Tree with N Keys[20] [top]

Fact

Insertion into an AVL Tree[21] [top]

Tree Rotations[22] [top]

The Rotate Methods[23] [top]

Fix Unbalanced Node with rotateRight[24] [top]

Example Requiring Two Rotations[25] [top]

Same Two Rotations, Bigger Tree[26] [top]

rotateLeft[27] [top]

The rotateLeft Method[28] [top]

Double Rotations[29] [top]

The double rotation[30] [top]

AVL Demo[31] [top]

Application [1] [top]

Algorithms for ceiling and floor Methods [2] [top]

Invariants for the Key floor(Key k) Algorithm [3] [top]

Key floor(key) - final version [6] [top]

A Not Very Useful printKeys Method for BST [7] [top]

Implementing the Iterable<Key> keys() Method [8] [top]

The Iterable<Key> keys() method in Java [9] [top]

Hw3 Additional Hints/Requirements [12] [top]

The sizeAtHeight method [13] [top]

Balanced Binary Search Trees [14] [top]

AVL Trees [18] [top]

Height of an AVL Tree with N Keys [20] [top]

Insertion into an AVL Tree [21] [top]

Tree Rotations [22] [top]

The Rotate Methods [23] [top]

Fix Unbalanced Node with rotateRight [24] [top]

Example Requiring Two Rotations [25] [top]

Same Two Rotations, Bigger Tree [26] [top]

rotateLeft [27] [top]

The rotateLeft Method [28] [top]

Double Rotations [29] [top]

The double rotation [30] [top]

AVL Demo [31] [top]