CSC301 Apr22

Contents

  1. Application
  2. Algorithms for ceiling and floor Methods
  3. Invariants for the Key floor(Key k) Algorithm
  4. Computing Key floor(Key k)
  5. Java Code outline for Key floor(Key k)
  6. Key floor(key) - final version
  7. A Not Very Useful printKeys Method for BST
  8. Implementing the Iterable<Key> keys() Method
  9. The Iterable<Key> keys() method in Java
  10. Final Java Version of Iterable<Key> keys()
  11. Implementing Binary Search Tree Methods Recursively (Hw3)
  12. Hw3 Additional Hints/Requirements
  13. The sizeAtHeight method
  14. Balanced Binary Search Trees
  15. Why balanced?
  16. What is the balance condition?
  17. How does balancing affect execution time?
  18. AVL Trees
  19. Is an AVL Tree Balanced?
  20. Height of an AVL Tree with N Keys
  21. Insertion into an AVL Tree
  22. Tree Rotations
  23. The Rotate Methods
  24. Fix Unbalanced Node with rotateRight
  25. Example Requiring Two Rotations
  26. Same Two Rotations, Bigger Tree
  27. rotateLeft
  28. The rotateLeft Method
  29. Double Rotations
  30. The double rotation
  31. AVL Demo

Application[1] [top]

Application: The program initially reads Dow Jones Industrial Average closing averages from a file containing dates and closing averages. The program then prompts for an input date.

Data Structure to store the file data?

Solution using this data structure

Algorithms for ceiling and floor Methods[2] [top]

As with many of the binary search tree methods, these two methods can easily be implemented either iteratively (i.e. with loops) or recursively (one public method and one private, recursive method).

Either type of implementation is facilitated by a closer examination of examples.

Suppose we wish to find

 floor(45)  // largest key in the tree less than or equal to 45 (or null)
    

Example 1

      60
     /  \
    50  70

floor(45) = ?
    

Candidates? Which keys (if any) are smaller than 45? Which one of these candidates (if any) is largest?

Example 2

             40
            /  \
           /    \
          20     60     
         /  \   /  \ 
        10  30 50  70 

    
floor(45) = ?
    

Candidates? Which keys (if any) are smaller than 45? Which one of these candidates is largest?

Invariants for the Key floor(Key k) Algorithm[3] [top]

For a binary search tree, define the invariants t and candidate:

 t: Node reference to the subtree that contains the floor, or null if
    there is no such subtree.
 candidate: Node not in t whose key is the largest key less than the key in t
            or null if there is no such Node in the tree.
    

How should we initialize t and candidate so that both descriptions are correct initially?

Assuming both t and candidate are initialized correctly, as t and candidate are updated, we make sure the invariant relation is maintained.

In that case how do we compute floor(k)?

Computing Key floor(Key k)[4] [top]

Invariants

 t: Node reference to subtree that contains floor of k or else contains no keys
 smaller than k.


 candidate: Node not in t whose key is the largest key less than k in
            the whole tree or null if there is no such Node in the tree.
    

Since the subtree referenced by t gets smaller with each step, this will eventually find the floor.
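The invariants translate directly into the iterative (loop) version mentioned earlier. The following is a minimal sketch, not the course's BST class: IterativeFloorDemo, its Node class, and the put helper are hypothetical names introduced only so that the example can build the two example trees and run on its own.

```java
// Sketch only: IterativeFloorDemo, Node, and put are hypothetical stand-ins for the course's BST.
class IterativeFloorDemo<Key extends Comparable<Key>> {
    private class Node {
        Key key;
        Node left, right;
        Node(Key key) { this.key = key; }
    }

    private Node root;

    // Ordinary (unbalanced) BST insertion, included only to build test trees.
    void put(Key k) { root = put(root, k); }

    private Node put(Node t, Key k) {
        if (t == null) return new Node(k);
        int cmp = k.compareTo(t.key);
        if (cmp < 0) t.left = put(t.left, k);
        else if (cmp > 0) t.right = put(t.right, k);
        return t;
    }

    Key floor(Key k) {
        Node t = root;         // subtree that may still contain the floor of k
        Key candidate = null;  // largest key less than k found outside t so far
        while (t != null) {
            int cmp = k.compareTo(t.key);
            if (cmp == 0) return t.key;               // exact match is the floor
            if (cmp < 0) {
                t = t.left;                           // t.key is too big; candidate unchanged
            } else {
                candidate = t.key;                    // t.key < k, so it beats the old candidate
                t = t.right;
            }
        }
        return candidate;                             // t is empty, so the candidate is the floor
    }

    public static void main(String[] args) {
        IterativeFloorDemo<Integer> ex1 = new IterativeFloorDemo<>();
        for (int k : new int[] {60, 50, 70}) ex1.put(k);
        System.out.println(ex1.floor(45));  // prints null: no key is <= 45

        IterativeFloorDemo<Integer> ex2 = new IterativeFloorDemo<>();
        for (int k : new int[] {40, 20, 60, 10, 30, 50, 70}) ex2.put(k);
        System.out.println(ex2.floor(45));  // prints 40
    }
}
```

Each pass through the loop either returns or moves t one level down, which is why the loop terminates.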

Java Code outline for Key floor(Key k)[5] [top]


public class BST<Key extends Comparable<Key>, Value>
{

  public Key floor(Key k)
  {
   
    return floor(k, null, root);
  }

  /**
   * returns the floor of k
   * @param k the search key
   * @param candidate largest key in the tree but not in subtree t 
   * that is less than k (might be null)
   * @param t subtree that contains the floor of k or that is empty if
   * the floor of k is candidate
   *
   * @return the floor of k in the subtree t or candidate if 
   * k is larger than all keys in subtree t.
   */
  private Key floor(Key k, Key candidate, Node t)
  {

    /*
     * (Base case) If t is null, floor is candidate
     * if k is equal to t.key, floor is just k.
     * if k is smaller than t.key, then candidate doesn't change,
     *	  but t should be updated to be the left subtree, t.left.
     * if k is larger than t.key, then candidate should be updated
     *    to be t.key and t should be updated to be the right subtree, t.right.
     */
  }

}

Key floor(key) - final version[6] [top]


public class BST<Key extends Comparable<Key>, Value>
{

  public Key floor(Key k)
  {
   
    return floor(k, null, root);
  }

  /**
   * returns the floor of k
   * @param k the search key
   * @param candidate largest key in the tree but not in subtree t 
   * that is less than k (might be null)
   * @param t subtree that contains the floor of k or that is empty if
   * the floor of k is candidate
   *
   * @return the floor of k in the subtree t or candidate if 
   * k is larger than all keys in subtree t.
   */
  private Key floor(Key k, Key candidate, Node t)
  {

    if (t == null) return candidate;
    int cmp = k.compareTo(t.key);
    if (cmp < 0) {
      return floor(k, candidate, t.left);
    } else if (cmp > 0) {
      return floor(k, t.key, t.right);
    } else {
      return t.key;
    }
  }

}
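The ceiling method can be written as the mirror image of floor: the candidate is updated when moving left instead of when moving right. This is a sketch under that assumption; CeilingDemo, Node, and put are hypothetical names used only to make the example self-contained.

```java
// Sketch only: CeilingDemo, Node, and put are hypothetical stand-ins for the course's BST.
class CeilingDemo<Key extends Comparable<Key>> {
    private class Node {
        Key key;
        Node left, right;
        Node(Key key) { this.key = key; }
    }

    private Node root;

    // Ordinary (unbalanced) BST insertion, included only to build test trees.
    void put(Key k) { root = put(root, k); }

    private Node put(Node t, Key k) {
        if (t == null) return new Node(k);
        int cmp = k.compareTo(t.key);
        if (cmp < 0) t.left = put(t.left, k);
        else if (cmp > 0) t.right = put(t.right, k);
        return t;
    }

    // Smallest key greater than or equal to k, or null if there is none.
    public Key ceiling(Key k) {
        return ceiling(k, null, root);
    }

    // candidate: smallest key greater than k found outside subtree t (may be null).
    private Key ceiling(Key k, Key candidate, Node t) {
        if (t == null) return candidate;
        int cmp = k.compareTo(t.key);
        if (cmp < 0) {
            return ceiling(k, t.key, t.left);     // t.key > k becomes the new candidate
        } else if (cmp > 0) {
            return ceiling(k, candidate, t.right); // t.key is too small; candidate unchanged
        } else {
            return t.key;
        }
    }

    public static void main(String[] args) {
        CeilingDemo<Integer> tree = new CeilingDemo<>();
        for (int k : new int[] {40, 20, 60, 10, 30, 50, 70}) tree.put(k);
        System.out.println(tree.ceiling(45));  // prints 50
        System.out.println(tree.ceiling(75));  // prints null
    }
}
```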

A Not Very Useful printKeys Method for BST[7] [top]

We could add a public printKeys method to BST:


public class BST<Key extends Comparable<Key>, Value>
{
  /**
   * prints the keys to standard output in sorted order
   */
  public void printKeys()
  {
    printKeys(root);
  }

  /**
   * prints the keys in the subtree t to standard output
   * in sorted order
   */
  private void printKeys(Node t)
  {
    /**
     * if t is empty, return
     * print keys in the left subtree in sorted order
     * print t.key
     * print keys in the right subtree in sorted order
     */

  }

}
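One way the outline could be filled in is sketched below. PrintKeysDemo, Node, and put are hypothetical names, and the extra printKeys(PrintStream) overload is an addition made here only so that the output can be redirected for testing.

```java
import java.io.PrintStream;

// Sketch only: PrintKeysDemo, Node, and put are hypothetical stand-ins for the course's BST.
class PrintKeysDemo<Key extends Comparable<Key>> {
    private class Node {
        Key key;
        Node left, right;
        Node(Key key) { this.key = key; }
    }

    private Node root;

    // Ordinary (unbalanced) BST insertion, included only to build test trees.
    void put(Key k) { root = put(root, k); }

    private Node put(Node t, Key k) {
        if (t == null) return new Node(k);
        int cmp = k.compareTo(t.key);
        if (cmp < 0) t.left = put(t.left, k);
        else if (cmp > 0) t.right = put(t.right, k);
        return t;
    }

    // Prints the keys to standard output in sorted order.
    public void printKeys() { printKeys(root, System.out); }

    // Same traversal, but to a caller-supplied stream (an addition for this sketch).
    void printKeys(PrintStream out) { printKeys(root, out); }

    // In-order traversal: left subtree, then t.key, then right subtree.
    private void printKeys(Node t, PrintStream out) {
        if (t == null) return;
        printKeys(t.left, out);
        out.println(t.key);
        printKeys(t.right, out);
    }

    public static void main(String[] args) {
        PrintKeysDemo<Integer> tree = new PrintKeysDemo<>();
        for (int k : new int[] {40, 20, 60, 10, 30}) tree.put(k);
        tree.printKeys();  // prints 10, 20, 30, 40, 60 on separate lines
    }
}
```

The method is "not very useful" because the caller can only print the keys; it cannot do anything else with them.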

Implementing the Iterable<Key> keys() Method[8] [top]

The keys method has to return a reference to some object that has a method:

 Iterator<Key> iterator()      
    

The Iterator<Key> that this iterator() method returns should allow the keys to be returned, one at a time and in sorted order, using the next() method.

The Queue<E> class implements Iterable<E> [Queue has an Iterator<E> iterator() method].

So one implementation of keys() would be to create an empty Queue<Key>, add the keys to it in sorted order, and return the queue.

The Iterable<Key> keys() method in Java[9] [top]


public class BST<Key extends Comparable<Key>, Value>
{

  Iterable<Key> keys()
  {
    Queue<Key> keylst = new Queue<Key>();
    addKeys(root, keylst);
    return keylst;
  }

  /**
   * adds the keys in subtree t in order to the end of lst
   * @param t the subtree
   * @param lst list of keys smaller than the keys in t.
   */

  private void addKeys(Node t, Queue<Key> lst)
  {
    /**
     * (Like printKeys, but add to lst instead of printing):
     *
     * if t is empty  do nothing (just return)
     * add keys from the left subtree
     * add t.key
     * add keys from the right subtree
     */
  }

}

Final Java Version of Iterable<Key> keys()[10] [top]


public class BST<Key extends Comparable<Key>, Value>
{

  Iterable<Key> keys()
  {
    Queue<Key> keylst = new Queue<Key>();
    addKeys(root, keylst);
    return keylst;
  }

  /**
   * adds the keys in subtree t in order to the end of lst
   * @param t the subtree
   * @param lst list of keys smaller than the keys in t.
   */

  private void addKeys(Node t, Queue<Key> lst)
  {
    if (t == null) return;
    addKeys(t.left, lst);
    lst.add(t.key);
    addKeys(t.right, lst);
  }

}
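A client can then walk the keys with a for-each loop. The sketch below is self-contained, so it substitutes java.util.ArrayDeque for the course's Queue class (an assumption; the course Queue may differ in its method names). KeysDemo, Node, and put are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch only: KeysDemo, Node, and put are hypothetical stand-ins for the course's BST.
class KeysDemo<Key extends Comparable<Key>> {
    private class Node {
        Key key;
        Node left, right;
        Node(Key key) { this.key = key; }
    }

    private Node root;

    // Ordinary (unbalanced) BST insertion, included only to build test trees.
    void put(Key k) { root = put(root, k); }

    private Node put(Node t, Key k) {
        if (t == null) return new Node(k);
        int cmp = k.compareTo(t.key);
        if (cmp < 0) t.left = put(t.left, k);
        else if (cmp > 0) t.right = put(t.right, k);
        return t;
    }

    Iterable<Key> keys() {
        // java.util.ArrayDeque stands in for the course's Queue<Key> class.
        Queue<Key> keylst = new ArrayDeque<>();
        addKeys(root, keylst);
        return keylst;
    }

    // Appends the keys of subtree t to lst in sorted (in-order) order.
    private void addKeys(Node t, Queue<Key> lst) {
        if (t == null) return;
        addKeys(t.left, lst);
        lst.add(t.key);
        addKeys(t.right, lst);
    }

    public static void main(String[] args) {
        KeysDemo<Integer> tree = new KeysDemo<>();
        for (int k : new int[] {40, 20, 60, 10, 30, 50, 70}) tree.put(k);
        for (Integer k : tree.keys()) {
            System.out.print(k + " ");  // prints 10 20 30 40 50 60 70
        }
        System.out.println();
    }
}
```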

Implementing Binary Search Tree Methods Recursively (Hw3)[11] [top]

You are to complete and test the methods in the BSTSet class.

A JUnit test class, BSTSetTest is provided.

Hints, etc.

Hw3 Additional Hints/Requirements[12] [top]

Some of the size methods ask you to count the number of nodes satisfying some property. Some of the remove methods ask you to remove the subtree at nodes satisfying some property. These properties include:

In the case of the height and depth, the public method has an int parameter. E.g.

      public int sizeAtDepth(int k)
    

This should return the number of nodes whose depth is k.

What parameters should the recursive private sizeAtDepth method have? It always needs at least one more parameter than the public method, e.g. a Node.

The depth of the root is 0 and the depth of a child is 1 more than its parent. So the depth of nodes can be determined as you go down the tree.

Recalling the principles

It is easy to pass an additional parameter to the private method that gives the depth of the subtree at the Node parameter:

      /**
       * 
       * @param t the subtree
       * @param k the required depth
       * @param d the depth of node t
       * @return the number of nodes in the subtree at t that 
       * have depth k
       */
      private int sizeAtDepth(Node t, int k, int d)
    

What does the public method pass to the private method for these three parameters?
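One possible answer is sketched below: the public method starts at the root, whose depth is 0. This is only an illustration of the parameter passing, not the required BSTSet solution; SizeAtDepthDemo, Node, and put are hypothetical names.

```java
// Sketch only: SizeAtDepthDemo, Node, and put are hypothetical; this is not the BSTSet class.
class SizeAtDepthDemo<Key extends Comparable<Key>> {
    private class Node {
        Key key;
        Node left, right;
        Node(Key key) { this.key = key; }
    }

    private Node root;

    // Ordinary (unbalanced) BST insertion, included only to build test trees.
    void put(Key k) { root = put(root, k); }

    private Node put(Node t, Key k) {
        if (t == null) return new Node(k);
        int cmp = k.compareTo(t.key);
        if (cmp < 0) t.left = put(t.left, k);
        else if (cmp > 0) t.right = put(t.right, k);
        return t;
    }

    // Number of nodes whose depth is k.
    public int sizeAtDepth(int k) {
        return sizeAtDepth(root, k, 0);  // the root is at depth 0
    }

    // t: the subtree, k: the required depth, d: the depth of node t.
    private int sizeAtDepth(Node t, int k, int d) {
        if (t == null) return 0;
        if (d == k) return 1;            // deeper nodes cannot also have depth k
        return sizeAtDepth(t.left, k, d + 1) + sizeAtDepth(t.right, k, d + 1);
    }

    public static void main(String[] args) {
        SizeAtDepthDemo<Integer> tree = new SizeAtDepthDemo<>();
        for (int k : new int[] {40, 20, 60, 10, 30, 50, 70}) tree.put(k);
        System.out.println(tree.sizeAtDepth(0));  // prints 1
        System.out.println(tree.sizeAtDepth(2));  // prints 4
    }
}
```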

The sizeAtHeight method[13] [top]

The sizeAtHeight method seems similar to the sizeAtDepth method, but the problem is we can't calculate the height of each node as we go down the tree, only on the way back up. Why?

We could call the recursive private height(Node t) method at every Node, but this would be very inefficient and add to the recursive calls of the private sizeAtHeight.

For the size and remove methods whose condition involves the height, an efficient implementation can be achieved by using an extra member of the Node to store its height. This avoids the recursive calls to height(Node t).
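A minimal sketch of the stored-height idea follows. It assumes the height is recomputed on the way back up from each insertion and that an empty subtree has height -1. StoredHeightDemo, Node, and put are hypothetical names, and removal (which would also have to update heights) is not shown.

```java
// Sketch only: StoredHeightDemo, Node, and put are hypothetical; removal is not handled.
class StoredHeightDemo<Key extends Comparable<Key>> {
    private class Node {
        Key key;
        Node left, right;
        int height;  // height of the subtree rooted here; a new leaf has height 0
        Node(Key key) { this.key = key; }
    }

    private Node root;

    // Stored height, with the empty subtree treated as height -1.
    private int height(Node t) { return t == null ? -1 : t.height; }

    void put(Key k) { root = put(root, k); }

    private Node put(Node t, Key k) {
        if (t == null) return new Node(k);
        int cmp = k.compareTo(t.key);
        if (cmp < 0) t.left = put(t.left, k);
        else if (cmp > 0) t.right = put(t.right, k);
        // On the way back up, both children are final, so the height is O(1) to update.
        t.height = 1 + Math.max(height(t.left), height(t.right));
        return t;
    }

    // Number of nodes whose height is k.
    public int sizeAtHeight(int k) { return sizeAtHeight(root, k); }

    private int sizeAtHeight(Node t, int k) {
        if (t == null || t.height < k) return 0;  // nothing in t can reach height k
        if (t.height == k) return 1;              // all descendants are strictly lower
        return sizeAtHeight(t.left, k) + sizeAtHeight(t.right, k);
    }

    public static void main(String[] args) {
        StoredHeightDemo<Integer> tree = new StoredHeightDemo<>();
        for (int k : new int[] {40, 20, 60, 10, 30, 50, 70}) tree.put(k);
        System.out.println(tree.sizeAtHeight(0));  // prints 4 (the leaves)
        System.out.println(tree.sizeAtHeight(2));  // prints 1 (the root)
    }
}
```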

Balanced Binary Search Trees[14] [top]

Why balanced?

What is the balance condition?

How does balancing affect execution time?

How complicated is the implementation for balancing?

Why balanced?[15] [top]

Easy! We want the put and get methods to be O(log(N)) instead of O(N).

The height of a binary search tree is the longest path from the root to a null link.

The height of a binary search tree with N keys determines the number of comparisons in the worst case for both put and get.

If the tree is unbalanced, some paths will be short at the expense of making other paths longer. In the worst unbalanced case there is only 1 path, of length N - 1.
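The worst case is easy to reproduce: insert keys that are already sorted. The sketch below uses a hypothetical DegenerateTreeDemo class with int keys and treats the empty tree as height -1.

```java
// Sketch only: DegenerateTreeDemo and its helpers are hypothetical names.
class DegenerateTreeDemo {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Ordinary (unbalanced) BST insertion.
    static Node put(Node t, int k) {
        if (t == null) return new Node(k);
        if (k < t.key) t.left = put(t.left, k);
        else if (k > t.key) t.right = put(t.right, k);
        return t;
    }

    // Height computed recursively; the empty tree has height -1.
    static int height(Node t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // Inserts the keys in the given order into an initially empty tree.
    static Node build(int... keys) {
        Node root = null;
        for (int k : keys) root = put(root, k);
        return root;
    }

    public static void main(String[] args) {
        // Sorted insertion order: every node has only a right child.
        System.out.println(height(build(1, 2, 3, 4, 5, 6, 7)));  // prints 6, i.e. N - 1
        // Same keys, different order: a perfectly balanced tree.
        System.out.println(height(build(4, 2, 6, 1, 3, 5, 7)));  // prints 2
    }
}
```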

What is the balance condition?[16] [top]

There are a number of possible choices for the balance condition for a balanced binary search tree:

How does balancing affect execution time?[17] [top]

For the AVL tree or the Red Black tree, the implementation of the get methods is exactly the same as for an ordinary (unbalanced) binary search tree.

The put and delete methods that modify the tree need some additional code to handle the balancing.

For example, for the put method, going down the tree (searching) and creating a new Node when the null link is found is also exactly like the ordinary unbalanced binary search tree code.

Returning back up the tree and attaching the modified subtree is where code needs to be inserted to check if the addition to the subtree requires rebalancing.

This rebalancing code may take place at each node on the way back up the tree and so the effect on execution time depends on this code being fast, e.g., O(1).

AVL Trees[18] [top]

AVL trees are Binary Search Trees with the additional property that the methods maintain a "balanced" property:

      For every node in an AVL tree, the heights of its children differ by
      at most 1.
    

For the purpose of this definition (and also the implementation), an empty child will be considered to be at height -1.

The AVL tree below shows the height of each node.

      The children of 600 have heights 

       0 (left child 550) and
      -1 (right child empty).

      and these differ by only 1.
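The balance condition can be checked mechanically. The sketch below is a hypothetical checker (AvlCheckDemo is not part of the course code) that recomputes heights recursively, so it is meant for illustration rather than efficiency, and it checks only the height condition, not the key ordering. It uses just the 600/550 fragment of the example tree.

```java
// Sketch only: AvlCheckDemo is a hypothetical checker, not part of the course code.
class AvlCheckDemo {
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Height of a subtree; an empty child is considered to be at height -1.
    static int height(Node t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // True if, at every node, the heights of the two children differ by at most 1.
    static boolean isAvl(Node t) {
        if (t == null) return true;
        int diff = Math.abs(height(t.left) - height(t.right));
        return diff <= 1 && isAvl(t.left) && isAvl(t.right);
    }

    public static void main(String[] args) {
        // 600 with left child 550 (height 0) and an empty right child (height -1).
        Node ok = new Node(600, new Node(550, null, null), null);
        System.out.println(isAvl(ok));   // prints true

        // Hanging 525 below 550 raises the left height to 1: a difference of 2.
        Node bad = new Node(600, new Node(550, new Node(525, null, null), null), null);
        System.out.println(isAvl(bad));  // prints false
    }
}
```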
    

Is an AVL Tree Balanced?[19] [top]

For a perfectly balanced binary search tree with N keys we saw

 h = O(log(N))      
    

But what about the height of an AVL tree with N keys?

Height of an AVL Tree with N Keys[20] [top]

What is an upper bound for the height of an AVL tree with N keys?

This can be determined if we can answer the reverse question:

What is a lower bound for the number of keys in an AVL tree of height h?

Let N be the minimum number of keys that can be in an AVL tree of height h.

Fact
      2^(h/2) < N
    

But this means

     h < 2log(N)  
    

Great! The height of an AVL tree is O(log(N)).
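The Fact can be checked numerically. The sketch below assumes the standard recurrence for the minimum number of keys in an AVL tree of height h (a root plus a smallest subtree of height h - 1 and a smallest subtree of height h - 2); that recurrence is not stated in these notes, and AvlHeightBound and minKeys are hypothetical names. It confirms 2^(h/2) < N for heights 1 through 40; at h = 0 both sides equal 1.

```java
// Sketch only: AvlHeightBound and minKeys are hypothetical names.
class AvlHeightBound {
    // Minimum number of keys in an AVL tree of height h, using the assumed recurrence
    //   minKeys(-1) = 0, minKeys(0) = 1, minKeys(h) = 1 + minKeys(h-1) + minKeys(h-2).
    static long minKeys(int h) {
        if (h < 0) return 0;   // empty tree (height -1)
        long prev2 = 0;        // minKeys(-1)
        long prev1 = 1;        // minKeys(0)
        for (int i = 1; i <= h; i++) {
            long cur = 1 + prev1 + prev2;
            prev2 = prev1;
            prev1 = cur;
        }
        return prev1;
    }

    public static void main(String[] args) {
        for (int h = 1; h <= 40; h++) {
            long n = minKeys(h);
            double bound = Math.pow(2, h / 2.0);
            if (!(bound < n)) {
                System.out.println("bound fails at h = " + h);
                return;
            }
        }
        System.out.println("2^(h/2) < N holds for h = 1..40");
        System.out.println(minKeys(4));  // prints 12
    }
}
```

Taking log base 2 of both sides of 2^(h/2) < N gives h/2 < log(N), which is the bound h < 2log(N).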

Since the height is O(log(N)), this means the get method is guaranteed to be O(log(N)) for an AVL tree, even though it isn't perfectly balanced.

The put method first does a search and then either updates a value or else changes a few links to attach a new key value pair. Since the height is O(log(N)), this much would also cost only O(log(N)) for the put method.

However, the put method must also maintain the balance property. So we have to make sure that doesn't add any more than O(log(N)) additional steps.

Insertion into an AVL Tree[21] [top]

Maintaining the AVL tree property requires that the insert include some code to rebalance portions of the tree if the property would otherwise be violated.

For example if we try to insert the value 525 into this tree in the usual way as a child of 550, the height of 550 would become 1, while the right child (empty) of 600 is still at height -1, a difference of 2.

So the node with 600 would become unbalanced!

Tree Rotations[22] [top]

This problem is solved by:

      1. First insert in the usual way for a binary search tree either in
      the left subtree or right subtree. (Duplicates are not allowed in
      this version.)
      2. Check if the heights of the two child subtrees differ by 2.
      If so, then rotate nodes to restore the AVL properties.
    

Two rotation methods are needed: rotateLeft and rotateRight.

Rebalancing an unbalanced node requires either one rotation or two rotations, depending on how the node became unbalanced.

The Rotate Methods[23] [top]

The rotate methods are:


      1. private Node rotateLeft( Node p )

      2. private Node rotateRight( Node p )

    

Fix Unbalanced Node with rotateRight[24] [top]

Returning to the example AVL tree, we try to insert 525:

The rotateRight method below is called to rotate the node p to the right with its left child, 550 becoming the parent of 525 and 600:
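A minimal sketch of such a rotation, applied to just the 600, 550, 525 fragment, might look like the following. RotateRightDemo and its Node class are hypothetical, and height bookkeeping is omitted to keep the link changes visible.

```java
// Sketch only: RotateRightDemo and Node are hypothetical; stored heights are omitted.
class RotateRightDemo {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Rotates p to the right with its left child and returns the new subtree root.
    static Node rotateRight(Node p) {
        Node l = p.left;    // p's left child becomes the new subtree root
        p.left = l.right;   // l's old right subtree lies between l and p, so it moves under p
        l.right = p;        // p becomes the right child of l
        return l;
    }

    public static void main(String[] args) {
        // After inserting 525 in the usual way: 600 -> 550 -> 525, all left links.
        Node p = new Node(600);
        p.left = new Node(550);
        p.left.left = new Node(525);

        Node root = rotateRight(p);
        // prints 550 525 600: 550 is now the parent of 525 and 600
        System.out.println(root.key + " " + root.left.key + " " + root.right.key);
    }
}
```

The caller must attach the returned node where p used to be, which is why the method returns the new subtree root.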

Example Requiring Two Rotations[25] [top]

If we try to insert 425 into this tree, node 500 would then violate the AVL condition. However, we can't use the rotateRight method:

First rotate 400 to the left with its new right child (425), and then rotate 500 with its new left child (425 again in a new position) to the right to get:

Same Two Rotations, Bigger Tree[26] [top]

A slightly more typical example of two rotations is shown below. The value 450 is being inserted:

The rotateLeft will rotate 400 to the left, making 425 the new root of the subtree, then rotateRight will rotate 500 to the right making 425 the new root of the tree.

What happened to 425's previous left and right children?

rotateLeft[27] [top]

[Figure: rotateLeft]

The rotateLeft Method[28] [top]

Here is the method:

  private static Node rotateLeft( Node p )
  {
    // p's right child becomes the new subtree root
    Node r = ?;
    p.right = ?;
    r.left = ?;

    // Adjust the heights if 'height' is stored in the nodes.


    return r;
  }

Note: The height method is a static method that just returns the height of
the node passed to it, but also handles the case where null is
passed. In the latter case, height returns -1.
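One possible way to complete the method is sketched below. It assumes each Node stores its height in an int field named height (a hypothetical name) and that p has a non-null right child. RotateLeftDemo is likewise hypothetical.

```java
// Sketch only: RotateLeftDemo, Node, and the 'height' field name are assumptions.
class RotateLeftDemo {
    static class Node {
        int key;
        int height;  // height of the subtree rooted here; a leaf has height 0
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Stored height of t, or -1 if t is null.
    static int height(Node t) {
        return t == null ? -1 : t.height;
    }

    // Rotates p to the left with its right child and returns the new subtree root.
    static Node rotateLeft(Node p) {
        Node r = p.right;   // p's right child becomes the new subtree root
        p.right = r.left;   // r's old left subtree lies between p and r, so it moves under p
        r.left = p;         // p becomes the left child of r

        // p is now below r, so p's height must be recomputed first.
        p.height = 1 + Math.max(height(p.left), height(p.right));
        r.height = 1 + Math.max(height(r.left), height(r.right));
        return r;
    }

    public static void main(String[] args) {
        // A right-leaning chain 400 -> 425 -> 450 with heights 2, 1, 0.
        Node p = new Node(400);
        p.right = new Node(425);
        p.right.right = new Node(450);
        p.height = 2;
        p.right.height = 1;

        Node r = rotateLeft(p);
        // prints 425 400 450 1
        System.out.println(r.key + " " + r.left.key + " " + r.right.key + " " + r.height);
    }
}
```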
    

Double Rotations[29] [top]

Two rotations are required:

 rotateRight Q
 rotateLeft P
    

[Figure: double rotation required]

The double rotation[30] [top]


 /*
  * item inserted to right of t and left of t.right.
  * node t is unbalanced and requires two rotations
  *
  * 1. rotate t.right to the right and attach the new subtree root to t.right
  * 2. rotate t to the left and return the new subtree root
  */
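The two steps in the comment can be sketched as follows. DoubleRotationDemo, Node, and the method name rotateRightLeft are hypothetical, the keys 10, 20, 30 are chosen only for illustration, and height updates are omitted for brevity.

```java
// Sketch only: DoubleRotationDemo, Node, and rotateRightLeft are hypothetical names.
class DoubleRotationDemo {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Rotates p to the left with its right child; returns the new subtree root.
    static Node rotateLeft(Node p) {
        Node r = p.right;
        p.right = r.left;
        r.left = p;
        return r;
    }

    // Rotates p to the right with its left child; returns the new subtree root.
    static Node rotateRight(Node p) {
        Node l = p.left;
        p.left = l.right;
        l.right = p;
        return l;
    }

    // Item was inserted to the right of t and to the left of t.right.
    static Node rotateRightLeft(Node t) {
        t.right = rotateRight(t.right);  // step 1: straighten the zig-zag into a right chain
        return rotateLeft(t);            // step 2: rotate t left and return the new root
    }

    public static void main(String[] args) {
        // 10 with right child 30, and 20 inserted as the left child of 30.
        Node t = new Node(10);
        t.right = new Node(30);
        t.right.left = new Node(20);

        Node root = rotateRightLeft(t);
        // prints 20 10 30
        System.out.println(root.key + " " + root.left.key + " " + root.right.key);
    }
}
```

A single rotateLeft on this shape would only move the zig-zag to the other side, which is why the first rotation is needed.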

 


    

AVL Demo[31] [top]

(I'll provide an executable jar file with this demo.)