public class MaxFreq
{
public static void main(String[] args)
{
Scanner in = MyIO.openInput("text.txt");
ST<String, Integer> st = new ST<String, Integer>();
while(in.hasNext()) {
String w = in.next();
Integer n = st.get(w);
if (n == null) {
st.put(w, 1);
} else {
st.put(w, n + 1);
}
}
int max = 0;
String maxWord = "";
Iterator<String> p = st.keys().iterator();
while(p.hasNext()) {
String k = p.next();
int cnt = st.get(k);
if (cnt > max) {
maxWord = k;
max = cnt;
}
}
System.out.println("Maximum frequency word: %s, frequency = %d\n", maxWord, max);
}
}
Java uses Just-In-Time compiling to translate "frequently"
executed byte code to native machine code.
If a program uses the doubling
experiment to execute the same code for input size N and repeat
for size 2N, it could run faster for the larger size because of
just-in-time translation.
To turn off just-in-time compiling (jit) the following option
can be passed to the java virtual machine (jvm), that is, to the
java program.
-Djava.compiler=none
In Eclipse, you can do this for all projects:
Windows > Preferences > Java > Installed JREs
Select the Java Run Time Environment (probably only one) and
select Edit
In the box for Default VM arguments
enter the option:
-Djava.compiler=none
To turn jit back on, just repeat the steps above and delete
this option.
Best Practices
- Make sure the equals method for the Key type tests
for equality as you expect.
- If possible the Key type should be immutable
equals
Since searching for a key in a
symbol table uses equality, problems occur if the
equals method for the Key type is too strict.
Not every Java class overrides the equals method
inherited from Object. So some classes use Object's
equals method which IS too strict.
The equals method in Object is
almost always too strict: x.equals(y) is true for
Object's equals only if x and y reference the same object;
that is, only if x == y.
Immutable Keys
If the Key type has methods that can change a key's state
(i.e., Key type is
NOT immutable) the key in some (key, value) pair in the symbol
table can be changed to another key already in the symbol table. This would violate the rule that
keys can't be duplicated.
If Key type is immutable, this can't occur.
See the code examples
SequentialSearchST
SequentialSearchST from the text is a class that implements the symbol table
methods by storing (key, value) pairs in an unordered linked
list.
The Node class used for building the linked list contain
members for the key and for the value in addition to links to the
next (and possibly the previous) Node.
BinarySearchST
BinarySearchST is another class in the text that also
implements the symbol table methods by storing the (key, value)
pairs in two arrays - one for the keys and one for the values.
Key[] keys;
Value[] values;
These two arrays are logically related through the array indices: the key at
keys[i] has corresponding value values[i].
The keys array is kept in sorted order!
This means the Key type must implement Comparable.
Instead of using equals method to search for keys, the
compareTo method is used.
For an application that uses a symbol table, the two
implementations are interchangeable as far as correctness goes
provided the Key type implements Comparable.
But the methods have different performace times for the two
implementations.
For each method which class has the faster method?
How does each method do its task?
| Method |
SequentialSearchST |
BinarySearchST |
| put(k, v) |
|
|
| v = get(k) |
|
|
| delete(k) |
|
|
- put(k,v): Examine each key in the list until k is found,
then update the value or
until the entire list has been searched then just insert a new Node with k and v at the
beginning of the list. Time: O(N) in worst case.
- v = get(k): In worst case, must search through all N Nodes
when the symbol table contains N keys. Time: O(N)
- delete(k): Same as get in worst case to find a Node with key equal to
k. Then deleting the Node is only changing a few links. Time O(N)
- put(k,v): Binary search for a key equal to k (time
O(log(N))), but then to insert a new key all larger keys (and
values) must be copied to the next index. In worst case N keys
and values must be copied. Time: O(N)
- v = get(k): Binary search. Time: O(log(N))
- delete(k): Similar to put. Binary search for k (time
O(log(N))), but keys greater than k must be copied to previous
index. Time: O(N)
For Worst Case Performance
| Method |
SequentialSearchST |
BinarySearchST |
| put(k, v) |
O(N) |
O(N) |
| v = get(k) |
O(N) |
O(log(N)) |
| delete(k) |
O(N) |
O(N) |
Usually O(log(N)) or O(N) is great, but that was the
performance for one execution each method.
An application that prints the frequency of occurrence of each
word in a document with N total words and M distinct words, must do N get operations,
plus N put operations to insert the words into a symbol table.
Then it must iterate through the M distinct keys and do M more
get operations.
Comparing the two classes just for building the symbol table:
| class |
cost for insertion |
Total |
| SequentialSearchST |
N gets: N * O(N) = O(N2) N puts: N * O(N)
= O(N2)
total: O(N2) + O(N2) |
= O(N2) |
| BinarySearchSt |
N gets: N * O(log(N)) = O(Nlog(N))
N puts: N * O(N) = O(N2)
total: O(Nlog(N)) + O(N2) |
=O(N2) |
See code examples
Often the Key type has a natural order - the type implements the
compareTo method. Some applications need to find the "nearest" key
in the symbol table when the search key is not actually in the table.
Records exist containing the Dow Jones Industrial Average
closing price for each date the stock exchange was open.
A symbol table could be constructed with key = date and value =
closing price.
An interactive application could use this table simply to
report the closing price for any date input. But if the date (the
key) is not in the table, it would be nice for the application to
report the "next" date the exchange was open and the closing price
for that date instead of simply reporting that the exchange was
closed on the original input date.
The symbol table methods presented above don't provide any way of
accomplishing this. In particular the compareTo method can't be
used on the generic type Key since it isn't declared to implement
Comparable<Key>.
Requiring the Key type to be Comparable is a restriction. So
instead a new abstract data type is needed: ordered symbol table
that requires Key to be a comparable type.
With this restriction, additional methods make sense and can be
used to address problems such as finding the "nearest" key.
The ordered symbol table has all the same methods as the symbol
table with unordered keys (in gray).
The additions to the unordered symbol table are
- The generic Key type is now declared to be
Comparable<Key>
- The obvious methods managing the minimum and maximum keys
(in blue)
- Methods floor and ceiling, useful for finding "nearest" key
(in red)
- Methods rank and select, useful for finding the position of
key in the ordering or finding key at a given
position.(in green)
public class ST<Key extends Comparable<Key>, Value> {
public void put( Key key, Value val) {...}
public Value get( Key key) {...}
public void delete( Key key) {...}
public boolean contains(Key key) {...}
public int size() {...}
public boolean isEmpty() {...}
public Iterable<Key> keys() {...}
public Key min() {...}
public Key max() {...}
public void deleteMin() {...}
public void deleteMax() {...}
public Key floor( Key key) {...}
public Key ceiling( Key key) {...}
public int rank( Key key) {...}
public Key select( int k) {...}
}
For an ordered symbol table st, the floor and ceiling methods
have the following specifications:
Key floor(Key k)
If the key k is in st, then st.floor(k) should return k
If the key k is NOT in st, st.floor(k) should return
- null if k is smaller than the minimum key, otherwise
- the largest key kf in st that is smaller than k
Key ceiling(Key k)
If the key k is in st, then st.ceiling(k) should return k
If the key k is NOT in st, st.ceiling(k) should return
- null if k is larger than the maximum key, otherwise
- the smallest key kc in st that is larger than k
For an ordered symbol table st, the rank and select methods
provide access through the position of the keys in the natural
ordering.
int rank(Key k)
st.rank(k) should return the number of keys in st that are less
than k.
If st.rank(k) returns 0, is the key k contained in st or
not? What if st.rank(k) returns 1? What if st.rank(k) returns n
where n = st.size()?
Key select(int i)
st.select(i) should return
- null if i < 0 or i >= st.size(), otherwise
- the key k in st such that there are i keys in st smaller
than k; i.e., such that rank(k) is i
See code examples
The SequentialSearchST and BinarySearchST classes are
implementations of the symbol table and ordered symbol table,
respectively.
Unfortunately, the performance is not good enough to handle
clients with a large number of keys.
Additional implementations based on other non-linear structures
will effectively overcome this by having methods with improved
order of growth properties.
The symbol tables will achieve this better performance by
using better private data members.
| Underlying Data Structure |
Implementing Class |
Advantages |
Disadvantages |
| linked list |
SequentialSearchST |
Ok for small number of keys |
Unacceptable performace for large number of keys |
| array |
BinarySearchST |
Good search performance implements ordered symbol table |
Insert (put) is too slow |
| binary search tree |
BST |
Very good search and insert provided keys inserted random
order relatively easy to implement implements ordered symbol table |
Can degenerate and be as bad as linked list |
| balanced binary search tree |
RedBlackST or AvlST |
Guarantees good search and insert implements ordered
symbol table |
methods are more complicated to implement. |
| hash table |
SeparateChainHashST |
Provides even better search and insert
performance |
Does NOT implement ordered symbol table Requires
"good" hash function exist for the Key type. |
- BinarySearchST rank method
- Insertion in a bst and tree height
1
2 public int rank(final Key key) {
3 int lo = 0, hi = N-1;
4 while (lo <= hi) {
5 int mid = lo + (hi - lo) / 2;
6 int cmp = key.compareTo(keys[mid]);
7 if (cmp < 0) {
8 hi = mid - 1;
9 } else if (cmp > 0) {
10 lo = mid + 1;
11 } else {
12 return mid;
13 }
14 }
15 return lo;
16 }
Notes:
- Line 5 is same as: mid = (lo + hi) / 2;
- Line 15: lo is returned!
The BinarySearchST class has good performance for searching:
get is O(log(N)).
But, put is still O(N). Why?
Because the keys are in an sorted array. To keep the array
sorted on each put operation requires moving N existing items in
the worst case.
A binary search tree links nodes together and avoids this
moving.
However, to keep good performance for get as well as put, the
keys must still be ordered somehow and allow quick access.
A sorted linked list doesn't allow quick access (e.g. to the
middle element), but a binary search tree will.
A binary search tree has a root.
Each Node in the tree has a left and a right child link, either
of which can be null.
At each Node the key in the Node is greater than all the keys
in its left subtree (if not empty) and less than all the keys in its
right subtree (if not empty).
The declaration of a generic binary search tree class should
declare the generic parameters, Key and Value, so that elements of type Key are
comparable:
Non-static inner Node class
public class BST<Key extends Comparable<Key>, Value>
{
private Node root;
private class Node
{
private Key key;
private Value val;
private Node left;
private Node right;
public Node(Key k, Value v)
{
key = k;
val = v;
left = right = null;
}
public Node()
{
this(null);
}
}
}
A static inner class cannot access the (non-static) members of
the containing class. But Node doesn't need to access these
members - e.g. root. So Node could be static:
However, that means a static inner Node class has no access to
the generic type parameters declared by the containing BST
class.
So a static inner Node class must itself declare generic
parameters:
public class BST<Key extends Comparable<Key>, Value>
{
private Node<Key, Value> root;
private class Node<K, V>
{
private K key;
private V val;
private Node<K,> left;
private Node<K,V> right;
public Node(K k, V v)
{
key = k;
val = v;
left = right = null;
}
public Node()
{
this(null);
}
}
}
It is often convenient to write the operations on trees as
recursive methods. This is because the tree is itself defined
recursively.
These recursive methods generaly need to have a Node as a parameter!
Consequently, these methods can't be public.
So the pattern that arises is to have one public method that doesn't
require a Node parameter and to have a private recursive method that
is called to do all the work.
For example, here is a pair of get methods, one public and
one private:
public Value get(Key k)
{
// Call the private recursive find
// starting at the root of the tree.
return get( k, root);
}
private Value get(Key k, Node t)
{
if ( t == null ) {
return null;
}
if ( k.compareTo(t.key) < 0 ) {
return find(x, t.left);
}
else if ( k.compareTo(t.key) > 0 ) {
return find(x, t.right);
}
else {
return t.key;
}
}
Where is the maximum data element in a BST? the minimum data element?
These methods can easily be implemented either recursively or
iteratively (using a loop) to move from the root to the largest
(respectively, smallest) data element in the BST.
Insertion into a BST is very much like the get method.
public void put(Key k, Value v)
That is, move through the tree as if you were searching for
k. When a null reference is reached, attach a new node
containing k and v.
A recursive version again uses a pair of methods. The public method
return type is void, but the private method returns a reference to the
root Node of the modified subtree into which the data was inserted:
public void put(Key k, Value v)
{
root = put(k, v, root);
}
private Node put(Key k, Value v, Node t)
{
if ( t == null )
t = new Node(k,v);
else if ( k.compareTo(t.key) < 0 ) {
t.left = insert(x, t.left);
}
else if ( k.compareTo(t.key) > 0 ) {
t.right = insert(x, t.right);
}
else {
// k is already in the tree;
// update k's value
t.val = v;
}
return t;
}
Here is an iterative version of the publie put method (no
private method used):
public void put(Key k, Value v)
{
if (root == null) {
root = new Node(k,v);
return;
}
Node p = root;
while(true)
{
if (k.compareTo(p.key) < 0 ) {
if ( p.left == null ) {
p.left = new Node(k,v);
return;
}
else
p = p.left;
}
else if ( k.compareTo(p.key) > 0 ) {
if ( p.right == null ) {
p.right = new Node(k,v);
return;
}
else
p = p.right;
}
else { // k already in the tree, update its value
p.val = v;
return;
}
}
}
As the code above reveals, insertion into a BST always occurs at a
previously null link.
Deletion from a BST is more complicated. For example, what should happen if
the data at the root is deleted?
Deletion is handled by considering three essential cases:
1. The BST is empty.
- Easy. Do nothing.
2. The data in Node t to be deleted has a null child (so 1 or 0 children)
- Make the parent of t point to t's other child
3. The data in Node t to be deleted has two children.
- Find t's successor Node s (use private s min(t.right) )
- Delete Node s from the subtree t.right (s has at most one child)
- Replace t with s by setting s.right to the modified right
subtree of t and setting s.left to t's left subtree.
Look at these problems for next class (not turned in)
- 3.1.8
- 3.2.1
- 3.2.2
- 3.2.3
- 3.2.6
- 3.2.14 (min)