DSII Week1Lec2


public class MaxFreq
{
  public static void main(String[] args)
  {
    Scanner in = MyIO.openInput("text.txt");
    ST<String, Integer> st = new ST<String, Integer>();

    while(in.hasNext()) {
      String w = in.next();
      Integer n = st.get(w);
      if (n == null) {
	st.put(w, 1);
      } else {
	st.put(w, n + 1);
      }
    }

    int max = 0;
    String maxWord = "";
    Iterator<String> p = st.keys().iterator();

    while(p.hasNext()) {
      String k = p.next();
      int cnt = st.get(k);
      if (cnt > max) {
	maxWord = k;
	max = cnt;
      }
    }
    System.out.println("Maximum frequency word: %s, frequency = %d\n", maxWord, max);
  }
}

Timing [2] [top]

Java uses Just-In-Time compiling to translate "frequently" executed byte code to native machine code.

If a program uses the doubling experiment to execute the same code for input size N and repeat for size 2N, it could run faster for the larger size because of just-in-time translation.

To turn off just-in-time compiling (jit) the following option can be passed to the java virtual machine (jvm), that is, to the java program.

 -Djava.compiler=none

In Eclipse, you can do this for all projects:

  Windows > Preferences > Java > Installed JREs

Select the Java Run Time Environment (probably only one) and select Edit

In the box for Default VM arguments enter the option:

 -Djava.compiler=none

To turn jit back on, just repeat the steps above and delete this option.

Best Practice for Keys [3] [top]

Best Practices

Make sure the equals method for the Key type tests for equality as you expect.
If possible the Key type should be immutable

equals

Since searching for a key in a symbol table uses equality, problems occur if the equals method for the Key type is too strict.

Not every Java class overrides the equals method inherited from Object. So some classes use Object's equals method which IS too strict.

The equals method in Object is almost always too strict: x.equals(y) is true for Object's equals only if x and y reference the same object; that is, only if x == y.

Immutable Keys

If the Key type has methods that can change a key's state (i.e., Key type is NOT immutable) the key in some (key, value) pair in the symbol table can be changed to another key already in the symbol table. This would violate the rule that keys can't be duplicated.

If Key type is immutable, this can't occur.

See the code examples

A Symbol Table Implementation [4] [top]

SequentialSearchST

SequentialSearchST from the text is a class that implements the symbol table methods by storing (key, value) pairs in an unordered linked list.

The Node class used for building the linked list contain members for the key and for the value in addition to links to the next (and possibly the previous) Node.

BinarySearchST

BinarySearchST is another class in the text that also implements the symbol table methods by storing the (key, value) pairs in two arrays - one for the keys and one for the values.

      Key[] keys;
      Value[] values;

These two arrays are logically related through the array indices: the key at keys[i] has corresponding value values[i].

The keys array is kept in sorted order!

This means the Key type must implement Comparable.

Instead of using equals method to search for keys, the compareTo method is used.

Performance Properties of Symbol Table Methods [5] [top]

For an application that uses a symbol table, the two implementations are interchangeable as far as correctness goes provided the Key type implements Comparable.

But the methods have different performace times for the two implementations.

For each method which class has the faster method?

How does each method do its task?

Method	SequentialSearchST	BinarySearchST
put(k, v)
v = get(k)
delete(k)

Performance: SequentialSearchST [6] [top]

put(k,v): Examine each key in the list until k is found, then update the value or until the entire list has been searched then just insert a new Node with k and v at the beginning of the list. Time: O(N) in worst case.
v = get(k): In worst case, must search through all N Nodes when the symbol table contains N keys. Time: O(N)
delete(k): Same as get in worst case to find a Node with key equal to k. Then deleting the Node is only changing a few links. Time O(N)

Performace: BinarySearchST [7] [top]

put(k,v): Binary search for a key equal to k (time O(log(N))), but then to insert a new key all larger keys (and values) must be copied to the next index. In worst case N keys and values must be copied. Time: O(N)
v = get(k): Binary search. Time: O(log(N))
delete(k): Similar to put. Binary search for k (time O(log(N))), but keys greater than k must be copied to previous index. Time: O(N)

Summary Method Performance Comparison [8] [top]

For Worst Case Performance

Method	SequentialSearchST	BinarySearchST
put(k, v)	O(N)	O(N)
v = get(k)	O(N)	O(log(N))
delete(k)	O(N)	O(N)

Application Performance [9] [top]

Usually O(log(N)) or O(N) is great, but that was the performance for one execution each method.

An application that prints the frequency of occurrence of each word in a document with N total words and M distinct words, must do N get operations, plus N put operations to insert the words into a symbol table.

Then it must iterate through the M distinct keys and do M more get operations.

Comparing the two classes just for building the symbol table:

class	cost for insertion	Total
SequentialSearchST	N gets: N * O(N) = O(N²) N puts: N * O(N) = O(N²) total: O(N²) + O(N²)	= O(N²)
BinarySearchSt	N gets: N * O(log(N)) = O(Nlog(N)) N puts: N * O(N) = O(N²) total: O(Nlog(N)) + O(N²)	=O(N²)

See code examples

Comparable Keys [10] [top]

Often the Key type has a natural order - the type implements the compareTo method. Some applications need to find the "nearest" key in the symbol table when the search key is not actually in the table.

A "Nearest" key Problem [11] [top]

Records exist containing the Dow Jones Industrial Average closing price for each date the stock exchange was open.

A symbol table could be constructed with key = date and value = closing price.

An interactive application could use this table simply to report the closing price for any date input. But if the date (the key) is not in the table, it would be nice for the application to report the "next" date the exchange was open and the closing price for that date instead of simply reporting that the exchange was closed on the original input date.

Orered Symbol Table [12] [top]

The symbol table methods presented above don't provide any way of accomplishing this. In particular the compareTo method can't be used on the generic type Key since it isn't declared to implement Comparable<Key>.

Requiring the Key type to be Comparable is a restriction. So instead a new abstract data type is needed: ordered symbol table that requires Key to be a comparable type.

With this restriction, additional methods make sense and can be used to address problems such as finding the "nearest" key.

Ordered Symbol Table Methods [13] [top]

The ordered symbol table has all the same methods as the symbol table with unordered keys (in gray).

The additions to the unordered symbol table are

The generic Key type is now declared to be Comparable<Key>
The obvious methods managing the minimum and maximum keys (in blue)
Methods floor and ceiling, useful for finding "nearest" key (in red)
Methods rank and select, useful for finding the position of key in the ordering or finding key at a given position.(in green)

    public class ST<Key extends Comparable<Key>, Value> {
       

        public void put( Key key,  Value val)  {...}
        public Value get( Key key) {...}
        public void delete( Key key)  {...}
        public boolean contains(Key key) {...}
        public int size() {...}
        public boolean isEmpty() {...}
        public Iterable<Key> keys() {...}       


        public Key min() {...}
        public Key max() {...}
        public void deleteMin() {...}
        public void deleteMax() {...}     
      

        public Key floor( Key key) {...}
        public Key ceiling( Key key) {...}      
      

        public int rank( Key key) {...}
        public Key select( int k) {...}      
    }

The floor and ceiling Methods [14] [top]

For an ordered symbol table st, the floor and ceiling methods have the following specifications:

Key floor(Key k)

If the key k is in st, then st.floor(k) should return k

If the key k is NOT in st, st.floor(k) should return

null if k is smaller than the minimum key, otherwise
the largest key kf in st that is smaller than k

Key ceiling(Key k)

If the key k is in st, then st.ceiling(k) should return k

If the key k is NOT in st, st.ceiling(k) should return

null if k is larger than the maximum key, otherwise
the smallest key kc in st that is larger than k

The rank and select Methods [15] [top]

For an ordered symbol table st, the rank and select methods provide access through the position of the keys in the natural ordering.

int rank(Key k)

st.rank(k) should return the number of keys in st that are less than k.

If st.rank(k) returns 0, is the key k contained in st or not? What if st.rank(k) returns 1? What if st.rank(k) returns n where n = st.size()?

Key select(int i)

st.select(i) should return

null if i < 0 or i >= st.size(), otherwise
the key k in st such that there are i keys in st smaller than k; i.e., such that rank(k) is i

See code examples

Preview [16] [top]

The SequentialSearchST and BinarySearchST classes are implementations of the symbol table and ordered symbol table, respectively.

Unfortunately, the performance is not good enough to handle clients with a large number of keys.

Additional implementations based on other non-linear structures will effectively overcome this by having methods with improved order of growth properties.

The symbol tables will achieve this better performance by using better private data members.

Preview Symbol Table Implementations [17] [top]

Underlying Data Structure	Implementing Class	Advantages	Disadvantages
linked list	SequentialSearchST	Ok for small number of keys	Unacceptable performace for large number of keys
array	BinarySearchST	Good search performance implements ordered symbol table	Insert (put) is too slow
binary search tree	BST	Very good search and insert provided keys inserted random order relatively easy to implement implements ordered symbol table	Can degenerate and be as bad as linked list
balanced binary search tree	RedBlackST or AvlST	Guarantees good search and insert implements ordered symbol table	methods are more complicated to implement.
hash table	SeparateChainHashST	Provides even better search and insert performance	Does NOT implement ordered symbol table Requires "good" hash function exist for the Key type.

Practice Problems for Quiz 2 [18] [top]

BinarySearchST rank method
Insertion in a bst and tree height

BinarySearchST rank method [19] [top]

    1	
    2	public int rank(final Key key) {
    3	    int lo = 0, hi = N-1;
    4	    while (lo <= hi) {
    5	      int mid = lo + (hi - lo) / 2;
    6	      int cmp = key.compareTo(keys[mid]);
    7	      if (cmp < 0) {
    8	          hi = mid - 1;
    9	      } else if (cmp > 0) {
   10	          lo = mid + 1;
   11	      } else {
   12	        return mid;
   13	      }
   14	    }
   15	    return lo;
   16	}

Notes:

Line 5 is same as: mid = (lo + hi) / 2;
Line 15: lo is returned!

Binary Search Tree [20] [top]

The BinarySearchST class has good performance for searching: get is O(log(N)).

But, put is still O(N). Why?

Because the keys are in an sorted array. To keep the array sorted on each put operation requires moving N existing items in the worst case.

A binary search tree links nodes together and avoids this moving.

However, to keep good performance for get as well as put, the keys must still be ordered somehow and allow quick access.

A sorted linked list doesn't allow quick access (e.g. to the middle element), but a binary search tree will.

Binary Search Tree Properties [21] [top]

A binary search tree has a root.

Each Node in the tree has a left and a right child link, either of which can be null.

At each Node the key in the Node is greater than all the keys in its left subtree (if not empty) and less than all the keys in its right subtree (if not empty).

BST class [22] [top]

The declaration of a generic binary search tree class should declare the generic parameters, Key and Value, so that elements of type Key are comparable:

Non-static inner Node class


public class BST<Key extends Comparable<Key>, Value>
{

  private Node root;

  private class Node
  {
    private Key key;
    private Value val;
    private Node left;
    private Node right;
  
    public Node(Key k, Value v)
    {
      key = k;
      val = v;
      left = right = null;
    }

    public Node()
    {
      this(null);
    }	  
  }
}

The Node class can be static [23] [top]

A static inner class cannot access the (non-static) members of the containing class. But Node doesn't need to access these members - e.g. root. So Node could be static:

However, that means a static inner Node class has no access to the generic type parameters declared by the containing BST class.

So a static inner Node class must itself declare generic parameters:


public class BST<Key extends Comparable<Key>, Value>
{

  private Node<Key, Value> root;

  private class Node<K, V>
  {
    private K key;
    private V val;
    private Node<K,> left;
    private Node<K,V> right;
  
    public Node(K k, V v)
    {
      key = k;
      val = v;
      left = right = null;
    }

    public Node()
    {
      this(null);
    }	  
  }
}

The get operation on a BST [24] [top]

It is often convenient to write the operations on trees as recursive methods. This is because the tree is itself defined recursively.

These recursive methods generaly need to have a Node as a parameter!

Consequently, these methods can't be public.

So the pattern that arises is to have one public method that doesn't require a Node parameter and to have a private recursive method that is called to do all the work.

For example, here is a pair of get methods, one public and one private:

public Value get(Key k)
{
  // Call the private recursive find 
  // starting at the root of the tree.
  return get( k, root);
}

private Value get(Key k, Node t)
{
  if ( t == null ) {
      return null;
  }
  if ( k.compareTo(t.key) < 0 ) {
      return find(x, t.left);
  }
  else if ( k.compareTo(t.key) > 0 ) {
      return find(x, t.right);
  }
  else  {
      return t.key;
  }
}

findMax() and findMin()[25] [top]

Where is the maximum data element in a BST? the minimum data element?

These methods can easily be implemented either recursively or iteratively (using a loop) to move from the root to the largest (respectively, smallest) data element in the BST.

The put Method [26] [top]

Insertion into a BST is very much like the get method.

public void put(Key k, Value v)

That is, move through the tree as if you were searching for k. When a null reference is reached, attach a new node containing k and v.

A recursive version again uses a pair of methods. The public method return type is void, but the private method returns a reference to the root Node of the modified subtree into which the data was inserted:

  public void put(Key k, Value v)
  {
    root = put(k, v, root);
  }

  private Node put(Key k, Value v, Node t)
  {
    if ( t == null )
	t = new Node(k,v);
    else if ( k.compareTo(t.key) < 0 ) {
	t.left = insert(x, t.left);
    }
    else if ( k.compareTo(t.key) > 0 ) {
	t.right = insert(x, t.right);
    }
    else {
	// k is already in the tree;
        // update k's value
        t.val = v;
    }
    return t;
  }

Iterative version of insert [27] [top]

Here is an iterative version of the publie put method (no private method used):

  public void put(Key k, Value v)
  {
    if (root == null) {
       root = new Node(k,v);
       return;
    }
    Node p = root;
    while(true)
    {
      if (k.compareTo(p.key) < 0 ) {
	if ( p.left == null ) {
	  p.left = new Node(k,v);
	  return;
	}
	else
	  p = p.left;
      }
      else if ( k.compareTo(p.key) > 0 ) {
	if ( p.right == null ) {
	  p.right = new Node(k,v);
	  return;
	}
	else
	  p = p.right;
      }
      else { // k already in the tree, update its value
	p.val = v;
	return;
      }
    }
  }

Deletion from a BST [28] [top]

As the code above reveals, insertion into a BST always occurs at a previously null link.

Deletion from a BST is more complicated. For example, what should happen if the data at the root is deleted?

Deletion is handled by considering three essential cases:

1. The BST is empty. 
	- Easy. Do nothing.

2. The data in Node t to be deleted has a null child (so 1 or 0 children)
	- Make the parent of t point to t's other child

3. The data in Node t to be deleted has two children. 
	- Find t's successor Node s (use private s min(t.right) )
	- Delete Node s from the subtree t.right (s has at most one child)
        - Replace t with s by setting s.right to the modified right
          subtree of t and setting s.left to t's left subtree.

CSC301 Apr08

slide version

single file version

Contents

Application in Java[1] [top]

Timing[2] [top]

Best Practice for Keys[3] [top]

Best Practices

equals

Immutable Keys

A Symbol Table Implementation[4] [top]

SequentialSearchST

BinarySearchST

Performance Properties of Symbol Table Methods[5] [top]

Performance: SequentialSearchST[6] [top]

Performace: BinarySearchST[7] [top]

Summary Method Performance Comparison[8] [top]

Application Performance[9] [top]

Comparable Keys[10] [top]

A "Nearest" key Problem[11] [top]

Orered Symbol Table[12] [top]

Ordered Symbol Table Methods[13] [top]

The floor and ceiling Methods[14] [top]

Key floor(Key k)

Key ceiling(Key k)

The rank and select Methods[15] [top]

int rank(Key k)

Key select(int i)

Preview[16] [top]

Preview Symbol Table Implementations[17] [top]

Practice Problems for Quiz 2[18] [top]

BinarySearchST rank method[19] [top]

Binary Search Tree[20] [top]

Binary Search Tree Properties[21] [top]

BST class[22] [top]