CSC301 Apr08

slide version

single file version

Contents

  1. Application in Java
  2. Timing
  3. Best Practice for Keys
  4. A Symbol Table Implementation
  5. Performance Properties of Symbol Table Methods
  6. Performance: SequentialSearchST
  7. Performace: BinarySearchST
  8. Summary Method Performance Comparison
  9. Application Performance
  10. Comparable Keys
  11. A "Nearest" key Problem
  12. Orered Symbol Table
  13. Ordered Symbol Table Methods
  14. The floor and ceiling Methods
  15. The rank and select Methods
  16. Preview
  17. Preview Symbol Table Implementations
  18. Practice Problems for Quiz 2
  19. BinarySearchST rank method
  20. Binary Search Tree
  21. Binary Search Tree Properties
  22. BST class
  23. The Node class can be static
  24. The get operation on a BST
  25. findMax() and findMin()
  26. The put Method
  27. Iterative version of insert
  28. Deletion from a BST
  29. Problems

Application in Java[1] [top]


public class MaxFreq
{
  public static void main(String[] args)
  {
    Scanner in = MyIO.openInput("text.txt");
    ST<String, Integer> st = new ST<String, Integer>();

    while(in.hasNext()) {
      String w = in.next();
      Integer n = st.get(w);
      if (n == null) {
	st.put(w, 1);
      } else {
	st.put(w, n + 1);
      }
    }

    int max = 0;
    String maxWord = "";
    Iterator<String> p = st.keys().iterator();

    while(p.hasNext()) {
      String k = p.next();
      int cnt = st.get(k);
      if (cnt > max) {
	maxWord = k;
	max = cnt;
      }
    }
    System.out.println("Maximum frequency word: %s, frequency = %d\n", maxWord, max);
  }
}

Timing[2] [top]

Java uses Just-In-Time compiling to translate "frequently" executed byte code to native machine code.

If a program uses the doubling experiment to execute the same code for input size N and repeat for size 2N, it could run faster for the larger size because of just-in-time translation.

To turn off just-in-time compiling (jit) the following option can be passed to the java virtual machine (jvm), that is, to the java program.

 -Djava.compiler=none      
    

In Eclipse, you can do this for all projects:

  Windows > Preferences > Java > Installed JREs      
    

Select the Java Run Time Environment (probably only one) and select Edit

In the box for Default VM arguments enter the option:

 -Djava.compiler=none      
    

To turn jit back on, just repeat the steps above and delete this option.

Best Practice for Keys[3] [top]

Best Practices
  1. Make sure the equals method for the Key type tests for equality as you expect.
  2. If possible the Key type should be immutable
equals

Since searching for a key in a symbol table uses equality, problems occur if the equals method for the Key type is too strict.

Not every Java class overrides the equals method inherited from Object. So some classes use Object's equals method which IS too strict.

The equals method in Object is almost always too strict: x.equals(y) is true for Object's equals only if x and y reference the same object; that is, only if x == y.

Immutable Keys

If the Key type has methods that can change a key's state (i.e., Key type is NOT immutable) the key in some (key, value) pair in the symbol table can be changed to another key already in the symbol table. This would violate the rule that keys can't be duplicated.

If Key type is immutable, this can't occur.

See the code examples

A Symbol Table Implementation[4] [top]

SequentialSearchST

SequentialSearchST from the text is a class that implements the symbol table methods by storing (key, value) pairs in an unordered linked list.

The Node class used for building the linked list contain members for the key and for the value in addition to links to the next (and possibly the previous) Node.

BinarySearchST

BinarySearchST is another class in the text that also implements the symbol table methods by storing the (key, value) pairs in two arrays - one for the keys and one for the values.

      Key[] keys;
      Value[] values;
    

These two arrays are logically related through the array indices: the key at keys[i] has corresponding value values[i].

The keys array is kept in sorted order!

This means the Key type must implement Comparable.

Instead of using equals method to search for keys, the compareTo method is used.

Performance Properties of Symbol Table Methods[5] [top]

For an application that uses a symbol table, the two implementations are interchangeable as far as correctness goes provided the Key type implements Comparable.

But the methods have different performace times for the two implementations.

For each method which class has the faster method?

How does each method do its task?

Method SequentialSearchST BinarySearchST
put(k, v)    
v = get(k)    
delete(k)    

Performance: SequentialSearchST[6] [top]

Performace: BinarySearchST[7] [top]

Summary Method Performance Comparison[8] [top]

For Worst Case Performance

Method SequentialSearchST BinarySearchST
put(k, v) O(N) O(N)
v = get(k) O(N) O(log(N))
delete(k) O(N) O(N)

Application Performance[9] [top]

Usually O(log(N)) or O(N) is great, but that was the performance for one execution each method.

An application that prints the frequency of occurrence of each word in a document with N total words and M distinct words, must do N get operations, plus N put operations to insert the words into a symbol table.

Then it must iterate through the M distinct keys and do M more get operations.

Comparing the two classes just for building the symbol table:

class cost for insertion Total
SequentialSearchST N gets: N * O(N) = O(N2)
N puts: N * O(N) = O(N2)
total: O(N2) + O(N2)
= O(N2)
BinarySearchSt N gets: N * O(log(N)) = O(Nlog(N))
N puts: N * O(N) = O(N2)
total: O(Nlog(N)) + O(N2)
=O(N2)

See code examples

Comparable Keys[10] [top]

Often the Key type has a natural order - the type implements the compareTo method. Some applications need to find the "nearest" key in the symbol table when the search key is not actually in the table.

A "Nearest" key Problem[11] [top]

Records exist containing the Dow Jones Industrial Average closing price for each date the stock exchange was open.

A symbol table could be constructed with key = date and value = closing price.

An interactive application could use this table simply to report the closing price for any date input. But if the date (the key) is not in the table, it would be nice for the application to report the "next" date the exchange was open and the closing price for that date instead of simply reporting that the exchange was closed on the original input date.

Orered Symbol Table[12] [top]

The symbol table methods presented above don't provide any way of accomplishing this. In particular the compareTo method can't be used on the generic type Key since it isn't declared to implement Comparable<Key>.

Requiring the Key type to be Comparable is a restriction. So instead a new abstract data type is needed: ordered symbol table that requires Key to be a comparable type.

With this restriction, additional methods make sense and can be used to address problems such as finding the "nearest" key.

Ordered Symbol Table Methods[13] [top]

The ordered symbol table has all the same methods as the symbol table with unordered keys (in gray).

The additions to the unordered symbol table are

    public class ST<Key extends Comparable<Key>, Value> {
       

        public void put( Key key,  Value val)  {...}
        public Value get( Key key) {...}
        public void delete( Key key)  {...}
        public boolean contains(Key key) {...}
        public int size() {...}
        public boolean isEmpty() {...}
        public Iterable<Key> keys() {...}       


        public Key min() {...}
        public Key max() {...}
        public void deleteMin() {...}
        public void deleteMax() {...}     
      

        public Key floor( Key key) {...}
        public Key ceiling( Key key) {...}      
      

        public int rank( Key key) {...}
        public Key select( int k) {...}      
    }

The floor and ceiling Methods[14] [top]

For an ordered symbol table st, the floor and ceiling methods have the following specifications:

Key floor(Key k)

If the key k is in st, then st.floor(k) should return k

If the key k is NOT in st, st.floor(k) should return

Key ceiling(Key k)

If the key k is in st, then st.ceiling(k) should return k

If the key k is NOT in st, st.ceiling(k) should return

The rank and select Methods[15] [top]

For an ordered symbol table st, the rank and select methods provide access through the position of the keys in the natural ordering.

int rank(Key k)

st.rank(k) should return the number of keys in st that are less than k.

If st.rank(k) returns 0, is the key k contained in st or not? What if st.rank(k) returns 1? What if st.rank(k) returns n where n = st.size()?

Key select(int i)

st.select(i) should return

See code examples

Preview[16] [top]

The SequentialSearchST and BinarySearchST classes are implementations of the symbol table and ordered symbol table, respectively.

Unfortunately, the performance is not good enough to handle clients with a large number of keys.

Additional implementations based on other non-linear structures will effectively overcome this by having methods with improved order of growth properties.

The symbol tables will achieve this better performance by using better private data members.

Preview Symbol Table Implementations[17] [top]

Underlying Data Structure Implementing Class Advantages Disadvantages
linked list SequentialSearchST Ok for small number of keys Unacceptable performace for large number of keys
array BinarySearchST Good search performance
implements ordered symbol table
Insert (put) is too slow
binary search tree BST Very good search and insert provided keys inserted random order
relatively easy to implement
implements ordered symbol table
Can degenerate and be as bad as linked list
balanced binary search tree RedBlackST or AvlST Guarantees good search and insert
implements ordered symbol table
methods are more complicated to implement.
hash table SeparateChainHashST Provides even better search and insert performance Does NOT implement ordered symbol table
Requires "good" hash function exist for the Key type.

Practice Problems for Quiz 2[18] [top]

  1. BinarySearchST rank method
  2. Insertion in a bst and tree height

BinarySearchST rank method[19] [top]

    1	
    2	public int rank(final Key key) {
    3	    int lo = 0, hi = N-1;
    4	    while (lo <= hi) {
    5	      int mid = lo + (hi - lo) / 2;
    6	      int cmp = key.compareTo(keys[mid]);
    7	      if (cmp < 0) {
    8	          hi = mid - 1;
    9	      } else if (cmp > 0) {
   10	          lo = mid + 1;
   11	      } else {
   12	        return mid;
   13	      }
   14	    }
   15	    return lo;
   16	}

Notes:

Binary Search Tree[20] [top]

The BinarySearchST class has good performance for searching: get is O(log(N)).

But, put is still O(N). Why?

Because the keys are in an sorted array. To keep the array sorted on each put operation requires moving N existing items in the worst case.

A binary search tree links nodes together and avoids this moving.

However, to keep good performance for get as well as put, the keys must still be ordered somehow and allow quick access.

A sorted linked list doesn't allow quick access (e.g. to the middle element), but a binary search tree will.

Binary Search Tree Properties[21] [top]

A binary search tree has a root.

Each Node in the tree has a left and a right child link, either of which can be null.

At each Node the key in the Node is greater than all the keys in its left subtree (if not empty) and less than all the keys in its right subtree (if not empty).

BST class[22] [top]

The declaration of a generic binary search tree class should declare the generic parameters, Key and Value, so that elements of type Key are comparable:

Non-static inner Node class

public class BST<Key extends Comparable<Key>, Value>
{

  private Node root;

  private class Node
  {
    private Key key;
    private Value val;
    private Node left;
    private Node right;
  
    public Node(Key k, Value v)
    {
      key = k;
      val = v;
      left = right = null;
    }

    public Node()
    {
      this(null);
    }	  
  }
}

The Node class can be static[23] [top]

A static inner class cannot access the (non-static) members of the containing class. But Node doesn't need to access these members - e.g. root. So Node could be static:

However, that means a static inner Node class has no access to the generic type parameters declared by the containing BST class.

So a static inner Node class must itself declare generic parameters:


public class BST<Key extends Comparable<Key>, Value>
{

  private Node<Key, Value> root;

  private class Node<K, V>
  {
    private K key;
    private V val;
    private Node<K,> left;
    private Node<K,V> right;
  
    public Node(K k, V v)
    {
      key = k;
      val = v;
      left = right = null;
    }

    public Node()
    {
      this(null);
    }	  
  }
}

The get operation on a BST[24] [top]

It is often convenient to write the operations on trees as recursive methods. This is because the tree is itself defined recursively.

These recursive methods generaly need to have a Node as a parameter!

Consequently, these methods can't be public.

So the pattern that arises is to have one public method that doesn't require a Node parameter and to have a private recursive method that is called to do all the work.

For example, here is a pair of get methods, one public and one private:

public Value get(Key k)
{
  // Call the private recursive find 
  // starting at the root of the tree.
  return get( k, root);
}

private Value get(Key k, Node t)
{
  if ( t == null ) {
      return null;
  }
  if ( k.compareTo(t.key) < 0 ) {
      return find(x, t.left);
  }
  else if ( k.compareTo(t.key) > 0 ) {
      return find(x, t.right);
  }
  else  {
      return t.key;
  }
}

findMax() and findMin()[25] [top]

Where is the maximum data element in a BST? the minimum data element?

These methods can easily be implemented either recursively or iteratively (using a loop) to move from the root to the largest (respectively, smallest) data element in the BST.

The put Method[26] [top]

Insertion into a BST is very much like the get method.

public void put(Key k, Value v)

That is, move through the tree as if you were searching for k. When a null reference is reached, attach a new node containing k and v.

A recursive version again uses a pair of methods. The public method return type is void, but the private method returns a reference to the root Node of the modified subtree into which the data was inserted:

  public void put(Key k, Value v)
  {
    root = put(k, v, root);
  }

  private Node put(Key k, Value v, Node t)
  {
    if ( t == null )
	t = new Node(k,v);
    else if ( k.compareTo(t.key) < 0 ) {
	t.left = insert(x, t.left);
    }
    else if ( k.compareTo(t.key) > 0 ) {
	t.right = insert(x, t.right);
    }
    else {
	// k is already in the tree;
        // update k's value
        t.val = v;
    }
    return t;
  }

Iterative version of insert[27] [top]

Here is an iterative version of the publie put method (no private method used):

  public void put(Key k, Value v)
  {
    if (root == null) {
       root = new Node(k,v);
       return;
    }
    Node p = root;
    while(true)
    {
      if (k.compareTo(p.key) < 0 ) {
	if ( p.left == null ) {
	  p.left = new Node(k,v);
	  return;
	}
	else
	  p = p.left;
      }
      else if ( k.compareTo(p.key) > 0 ) {
	if ( p.right == null ) {
	  p.right = new Node(k,v);
	  return;
	}
	else
	  p = p.right;
      }
      else { // k already in the tree, update its value
	p.val = v;
	return;
      }
    }
  }

Deletion from a BST[28] [top]

As the code above reveals, insertion into a BST always occurs at a previously null link.

Deletion from a BST is more complicated. For example, what should happen if the data at the root is deleted?

Deletion is handled by considering three essential cases:

1. The BST is empty. 
	- Easy. Do nothing.

2. The data in Node t to be deleted has a null child (so 1 or 0 children)
	- Make the parent of t point to t's other child

3. The data in Node t to be deleted has two children. 
	- Find t's successor Node s (use private s min(t.right) )
	- Delete Node s from the subtree t.right (s has at most one child)
        - Replace t with s by setting s.right to the modified right
          subtree of t and setting s.left to t's left subtree.

Problems[29] [top]

Look at these problems for next class (not turned in)