CSC301 Apr01

Contents

  1. Abstract Data Types
  2. Advantages of ADTs
  3. Interfaces
  4. Interfaces versus Abstract Classes
  5. Iterator Interface
  6. Iterable Interface
  7. Mystery Example
  8. Symbol Table Definition
  9. Examples/Alternate Names
  10. Generic Classes
  11. Symbol Table ADT
  12. Symbol Table Conventions
  13. Requiring Keys Not be null
  14. Requiring Values not be null
  15. Application
  16. Application in Java
  17. Best Practice for Keys
  18. A Symbol Table Implementation
  19. Performance Properties of Symbol Table Methods
  20. Performance: SequentialSearchST
  21. Performance: BinarySearchST
  22. Summary Method Performance Comparison
  23. Application Performance

Abstract Data Types

What are abstract data types and what use are they?

Advantages of ADTs

Interfaces

What are Java interfaces and what good are they?

Interfaces versus Abstract Classes

Interfaces
Abstract Classes

A class that has at least one abstract method must be declared abstract. A method declared as abstract has no implementation; implementing abstract methods is the responsibility of subclasses.

Example

 public abstract class AbstractList<E>
 {
   ...
   public void add(int index, E element) { ... } // Not abstract; implemented
   ...
   public abstract E get(int index);  // Abstract; no implementation in this class
   ...
 }
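As a sketch of how a subclass takes over that responsibility, the hypothetical class Range below extends the real java.util.AbstractList and supplies the two methods AbstractList leaves abstract, get(int) and size(). Everything else used in the demo (toString, contains, indexOf) is inherited unchanged.

```java
import java.util.AbstractList;

// Hypothetical example: an unmodifiable list of the integers lo, lo+1, ..., hi-1.
// AbstractList leaves get(int) and size() abstract; this subclass implements them.
class Range extends AbstractList<Integer> {
    private final int lo;
    private final int hi;

    Range(int lo, int hi) {
        if (hi < lo) throw new IllegalArgumentException("hi < lo");
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    public Integer get(int index) {
        if (index < 0 || index >= size()) throw new IndexOutOfBoundsException("index: " + index);
        return lo + index;
    }

    @Override
    public int size() {
        return hi - lo;
    }
}

class RangeDemo {
    public static void main(String[] args) {
        Range r = new Range(3, 7);
        System.out.println(r);              // inherited toString: [3, 4, 5, 6]
        System.out.println(r.contains(5));  // inherited contains: true
        System.out.println(r.indexOf(6));   // inherited indexOf: 3
        // r.add(0, 1) would throw UnsupportedOperationException:
        // AbstractList's add(int, E) is implemented, but only to throw.
    }
}
```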
    

Iterator Interface

For classes implementing abstract data types such as stacks, queues, lists, etc. that store collections of items, the Iterator interface provides a common way to access the items one at a time independent of the underlying implementation of the class.


public interface Iterator<E>
{
  public boolean hasNext();
  public E next();
  public void remove();
}

The remove() method is "optional": if an implementation does not actually remove items, its remove() should throw UnsupportedOperationException.
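A minimal sketch of a class that implements Iterator, assuming the items live in the first n slots of an array (the class name ArrayIterator is hypothetical). Since Java 8, java.util.Iterator declares remove() as a default method that throws UnsupportedOperationException, so this class gets the "optional" behavior without writing remove() itself.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical example: iterate over items[0..n-1] one at a time.
// remove() is not overridden, so the default from java.util.Iterator applies
// and throws UnsupportedOperationException.
class ArrayIterator<E> implements Iterator<E> {
    private final E[] items;
    private final int n;   // number of slots actually in use
    private int next = 0;  // index of the item next() will return

    ArrayIterator(E[] items, int n) {
        this.items = items;
        this.n = n;
    }

    @Override
    public boolean hasNext() {
        return next < n;
    }

    @Override
    public E next() {
        if (!hasNext()) throw new NoSuchElementException();
        return items[next++];
    }
}

class ArrayIteratorDemo {
    public static void main(String[] args) {
        String[] a = {"to", "be", "or", null};   // only the first 3 slots are in use
        Iterator<String> p = new ArrayIterator<>(a, 3);
        while (p.hasNext()) {
            System.out.println(p.next());        // to, be, or
        }
    }
}
```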

But how do you get an Iterator for a Stack, Queue, or LinkedList?

Iterable Interface

A class that implements the Iterable interface must have a method, iterator(), that returns an Iterator object for that class instance.

public interface Iterable<E>
{
  public Iterator<E> iterator();

}
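A minimal sketch of a collection that implements Iterable<E>, assuming a singly linked list of nodes (LinkedBag and its add method are hypothetical names). Its iterator() returns a fresh Iterator each time it is called, which is what lets the for-each loop work on it.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical example: a linked-list "bag" that implements Iterable<E>.
class LinkedBag<E> implements Iterable<E> {
    private class Node {
        E item;
        Node next;
        Node(E item, Node next) { this.item = item; this.next = next; }
    }

    private Node first = null;  // the most recently added item is first

    public void add(E item) {
        first = new Node(item, first);
    }

    // Required by Iterable<E>: return a new iterator over this bag's items.
    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private Node current = first;

            @Override
            public boolean hasNext() { return current != null; }

            @Override
            public E next() {
                if (current == null) throw new NoSuchElementException();
                E item = current.item;
                current = current.next;
                return item;
            }
        };
    }

    public static void main(String[] args) {
        LinkedBag<String> bag = new LinkedBag<>();
        bag.add("a");
        bag.add("b");
        bag.add("c");
        for (String s : bag) {      // for-each works because LinkedBag is Iterable
            System.out.println(s);  // c, b, a (newest first)
        }
    }
}
```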

The Java classes Stack<E> and LinkedList<E> implement Iterable<E>, as does every class that implements the Queue<E> interface.

So code to print the elements in any one of these can be exactly the same:

      Stack<String> x; // or Queue<String> x;  or LinkedList<String> x;
 
      ...
      Iterator<String> p = x.iterator();

      while(p.hasNext()) {
       System.out.println(p.next());
      }
    

Another advantage of a class implementing Iterable is that the for-each style loop can be used with it. The compiler translates it into essentially the while loop above, but it is simpler to write:

      Stack<String> x; // or Queue<String> x;  or LinkedList<String> x;
 
      ...
      for(String s: x) {
       System.out.println(s);
      }
    

Mystery Example


public class Mystery<E> implements Iterable<E>
{

   // ??? (contents of Mystery class not shown)
}

What can you conclude, if anything, about the methods in Mystery?

Symbol Table Definition

An ADT that can be used in many different client applications is the symbol table ADT.

A symbol table (abstractly) stores (key, value) pairs and supports insertion, deletion, and lookup of the value corresponding to a key.

Such a data type makes it easier to write correct solutions to many programming problems.

Examples/Alternate Names

Name          Purpose                                          Key                           Value
Dictionary    Look up word meaning                             word                          meaning
Book index    Find page(s) in a book where a word occurs       word                          list of page numbers
File index    Find list of files that contain a given string   string                        list of file names
Compiler      Look up name usage                               program element names         usage and attributes of the name
                                                               (variables, function names,
                                                               class names, etc.)

Generic Classes

In Java, an identifier can be declared to represent a variable whose possible values are types!

Such identifiers are called generic parameters (the Java Language Specification calls them type parameters).

An identifier must be declared as a generic parameter before it can be used as one.

Below, line 2 declares E to be a generic parameter. The scope of E is from line 2 to the end of the class MyList at line 11.

Line 6 is a use of E, not a declaration of E.

If one or more generic parameters are declared in a class header (line 2), the class is called a generic class.

    1	
    2	public class MyList<E> // declaration of E
    3	{
    4	 
    5	  ...
    6	  public void add(E x) // Use of E
    7	  {
    8	
    9	  }
   10	
   11	}
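A sketch of a completed MyList<E>, assuming it simply wraps a java.util.ArrayList (the get and size methods are hypothetical additions). E is declared once in the class header and then used like any other type; each client declaration such as MyList<String> supplies the actual type for E.

```java
import java.util.ArrayList;

// Hypothetical completion of MyList<E>: E is declared in the class header
// and used as an ordinary type inside the class body.
class MyList<E> {
    private final ArrayList<E> items = new ArrayList<>();

    public void add(E x) {      // use of E as a parameter type
        items.add(x);
    }

    public E get(int i) {       // use of E as a return type
        return items.get(i);
    }

    public int size() {
        return items.size();
    }

    public static void main(String[] args) {
        MyList<String> names = new MyList<>();    // here E is String
        names.add("Ada");
        MyList<Integer> counts = new MyList<>();  // here E is Integer
        counts.add(42);
        // names.add(42);  // would not compile: E is String for names
        System.out.println(names.get(0) + " " + counts.get(0));  // Ada 42
    }
}
```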

Symbol Table ADT

public class ST<Key, Value>
{
  public void put(Key k, Value v) {...}
  public Value get(Key k) {...}
  public void delete(Key k) {...}
  public Iterable<Key> keys() {...}
  public boolean contains(Key key) {...}
  public int size() {...}
  public boolean isEmpty() {...}
}

Question: How can you print all the key value pairs?
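One possible answer, sketched: iterate over the keys and look up each key's value. With the ST API above that is `for (Key k : st.keys()) System.out.println(k + " " + st.get(k));`. Because ST is not part of the Java API, the runnable sketch below uses java.util.TreeMap as a stand-in: its keySet() is Iterable just as keys() is, and its get(k) plays the same role. The helper name pairsToString is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class PrintPairs {
    // Build one "key value" line per pair: iterate over the keys,
    // then look up each key's value.
    static <K, V> String pairsToString(Map<K, V> st) {
        StringBuilder sb = new StringBuilder();
        for (K k : st.keySet()) {
            sb.append(k).append(' ').append(st.get(k)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> st = new TreeMap<>();
        st.put("the", 3);
        st.put("cat", 1);
        System.out.print(pairsToString(st));  // TreeMap iterates in key order: cat 1, the 3
    }
}
```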

Symbol Table Conventions

Requiring Keys Not be null

All implementations of the symbol table methods put(k, v), get(k), and delete(k) must compare k with at least some of the keys stored in the symbol table.

How?

The obvious way to compare k with a key k1 stored in the symbol table:

      k.equals(k1) 
    

But if k == null this will throw a NullPointerException.

Requiring that keys in the symbol table not be null means that put, get, and delete can require k to be non-null, so they can always safely use the equals method for comparison.
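A minimal sketch of the convention, assuming the stored keys sit in the first n slots of an array (the class NullKeyGuard and its indexOf method are hypothetical). The null check happens once, up front, and after that every comparison can call key.equals(...) without risking a NullPointerException.

```java
// Hypothetical helper: sequential search that rejects a null key up front.
class NullKeyGuard {
    // Returns the index of key among keys[0..n-1], or -1 if it is absent.
    static <Key> int indexOf(Key[] keys, int n, Key key) {
        if (key == null) throw new IllegalArgumentException("key must not be null");
        for (int i = 0; i < n; i++) {
            if (key.equals(keys[i])) return i;  // safe: key is known to be non-null
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "c"};
        System.out.println(indexOf(keys, 3, "b"));  // 1
        try {
            indexOf(keys, 3, null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```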

Requiring Values not be null

The get(k) method returns null if k is not in the symbol table. If k is in the symbol table, get(k) should return the value associated with k in the symbol table.

If null values were allowed and get(k) returned null, it would mean either

  1. k is not in the symbol table or
  2. k is in the symbol table and its associated value is null

It would be necessary to use the contains(k) method to distinguish these two cases.

In many cases, calling get(k) after contains(k) repeats all the work of searching for k that contains(k) already did.

Requiring that values not be null means that if get(k) returns null, there is only one possibility: k is NOT in the symbol table.

Java API classes that implement symbol table methods differ here: for example, HashMap allows a null key and null values, while Hashtable allows neither.
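A short demonstration with two real Java API classes: java.util.HashMap permits null values, so a null result from get(k) is ambiguous and a second search with containsKey(k) is needed to tell the cases apart, while java.util.Hashtable rejects a null value with a NullPointerException.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullValueDemo {
    public static void main(String[] args) {
        // HashMap permits null values, so a null from get(k) is ambiguous.
        Map<String, Integer> m = new HashMap<>();
        m.put("present", null);                        // key stored with a null value
        System.out.println(m.get("present"));          // null
        System.out.println(m.get("absent"));           // null: indistinguishable
        System.out.println(m.containsKey("present"));  // true: a second search settles it
        System.out.println(m.containsKey("absent"));   // false

        // Hashtable follows a "no nulls" convention and rejects a null value.
        Map<String, Integer> t = new Hashtable<>();
        try {
            t.put("k", null);
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejected the null value");
        }
    }
}
```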

Application

Problem: For a text file, find the word that occurs most frequently.

Use a symbol table whose keys are words (String type) and whose value is the frequency of occurrence of each word (Integer type).

  1. Read the file and extract one word k at a time.
  2. Use the get(k) method to determine whether k is in the symbol table.
  3. If the word k is not in the symbol table, insert k with the value 1.
  4. If the word k was already in the symbol table, get its value, increment it, and put (k, updated value) back in the symbol table.

Application in Java


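import java.util.Iterator;
import java.util.Scanner;

// ST is the symbol table class described above (not part of the Java API).
// MyIO.openInput is a helper (also not part of the Java API) that returns
// a Scanner for the named file.
public class MaxFreq
{
  public static void main(String[] args)
  {
    Scanner in = MyIO.openInput("text.txt");
    ST<String, Integer> st = new ST<String, Integer>();

    // Count the occurrences of each word.
    while(in.hasNext()) {
      String w = in.next();
      Integer n = st.get(w);   // null means w is not yet in st
      if (n == null) {
        st.put(w, 1);
      } else {
        st.put(w, n + 1);
      }
    }

    // Find the word with the largest count.
    int max = 0;
    String maxWord = "";
    Iterator<String> p = st.keys().iterator();

    while(p.hasNext()) {
      String k = p.next();
      int cnt = st.get(k);
      if (cnt > max) {
        maxWord = k;
        max = cnt;
      }
    }
    System.out.printf("Maximum frequency word: %s, frequency = %d%n", maxWord, max);
  }
}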
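To try the algorithm without the course classes, here is a self-contained variant under two assumptions: java.util.HashMap stands in for ST, and the text comes from a String instead of a file. The method name maxFreqWord is hypothetical. When several words tie for the maximum, the one returned depends on HashMap's iteration order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class MaxFreqDemo {
    // Returns the most frequent whitespace-separated word in text ("" if there are none).
    static String maxFreqWord(String text) {
        Map<String, Integer> st = new HashMap<>();  // stand-in for ST<String, Integer>
        Scanner in = new Scanner(text);
        while (in.hasNext()) {
            String w = in.next();
            Integer n = st.get(w);                  // null means w is not yet a key
            st.put(w, n == null ? 1 : n + 1);
        }
        in.close();

        int max = 0;
        String maxWord = "";
        for (String k : st.keySet()) {
            int cnt = st.get(k);
            if (cnt > max) {
                maxWord = k;
                max = cnt;
            }
        }
        return maxWord;
    }

    public static void main(String[] args) {
        // "it" occurs 3 times; every other word occurs at most twice.
        System.out.println(maxFreqWord("it was the best of times it was the worst of times it"));
    }
}
```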

Best Practice for Keys

Best Practices
  1. Make sure the equals method for the Key type tests for equality as you expect.
  2. If possible, the Key type should be immutable.
equals

Since searching for a key in a symbol table uses equality, problems occur if the equals method for the Key type is too strict.

Not every Java class overrides the equals method inherited from Object, so such classes use Object's equals method, which IS too strict.

The equals method in Object is almost always too strict: x.equals(y) is true for Object's equals only if x and y reference the same object; that is, only if x == y.
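A small demonstration, using two hypothetical key classes: one keeps Object's equals, the other overrides equals (and hashCode, as the Object contract requires) to compare coordinates. ArrayList.contains is used because, like a symbol table search, it finds an item by calling equals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical key class that inherits Object's equals (identity comparison).
class PointKeyNoEquals {
    final int x, y;
    PointKeyNoEquals(int x, int y) { this.x = x; this.y = y; }
}

// Hypothetical key class that overrides equals to compare coordinates.
class PointKey {
    final int x, y;
    PointKey(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PointKey)) return false;
        PointKey p = (PointKey) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);  // equal points must have equal hash codes
    }
}

class EqualsDemo {
    public static void main(String[] args) {
        List<PointKeyNoEquals> a = new ArrayList<>();
        a.add(new PointKeyNoEquals(1, 2));
        System.out.println(a.contains(new PointKeyNoEquals(1, 2)));  // false: different objects

        List<PointKey> b = new ArrayList<>();
        b.add(new PointKey(1, 2));
        System.out.println(b.contains(new PointKey(1, 2)));  // true: same coordinates
    }
}
```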

Immutable Keys

If the Key type has methods that can change a key's state (i.e., the Key type is NOT immutable), a client can change the key in some (key, value) pair already in the symbol table so that it equals another key in the table. This would violate the rule that keys can't be duplicated.

If the Key type is immutable, this can't occur.
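A sketch of the problem, using a hypothetical mutable key class and a plain list standing in for the keys stored in a symbol table. Two distinct keys are stored; then the client mutates one of them through a reference it kept, and the "table" now holds two equal keys.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable key: equals compares names, but setName changes the name.
class MutableKey {
    private String name;
    MutableKey(String name) { this.name = name; }
    void setName(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).name.equals(name);
    }

    @Override
    public int hashCode() { return name.hashCode(); }
}

class MutableKeyDemo {
    // Count stored keys equal to k; a correct symbol table never has more than one.
    static int countEqual(List<MutableKey> keys, MutableKey k) {
        int count = 0;
        for (MutableKey stored : keys) {
            if (stored.equals(k)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<MutableKey> keys = new ArrayList<>();  // keys of a hypothetical symbol table
        MutableKey a = new MutableKey("a");
        MutableKey b = new MutableKey("b");
        keys.add(a);
        keys.add(b);       // two distinct keys: "a" and "b"

        b.setName("a");    // the client mutates a key that is already stored
        System.out.println(countEqual(keys, new MutableKey("a")));  // 2: duplicate keys
    }
}
```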

See the code examples

A Symbol Table Implementation

SequentialSearchST

SequentialSearchST from the text is a class that implements the symbol table methods by storing (key, value) pairs in an unordered linked list.

The Node class used to build the linked list contains fields for the key and the value, in addition to links to the next (and possibly the previous) Node.
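A minimal sketch in the spirit of SequentialSearchST, not the textbook's code: the class name SimpleSequentialST is hypothetical, keys() is omitted, and put rejects null keys and values per the conventions above. Every operation may walk the whole list, which is where its O(N) costs come from.

```java
// Hypothetical sketch: (key, value) pairs in an unordered singly linked list.
class SimpleSequentialST<Key, Value> {
    private class Node {
        Key key;
        Value value;
        Node next;
        Node(Key key, Value value, Node next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    private Node first;  // head of the list
    private int n;       // number of pairs

    public Value get(Key key) {
        if (key == null) throw new IllegalArgumentException("null key");
        for (Node x = first; x != null; x = x.next) {
            if (key.equals(x.key)) return x.value;  // search hit
        }
        return null;                                // search miss
    }

    public void put(Key key, Value value) {
        if (key == null || value == null) throw new IllegalArgumentException("null key or value");
        for (Node x = first; x != null; x = x.next) {
            if (key.equals(x.key)) { x.value = value; return; }  // update existing key
        }
        first = new Node(key, value, first);        // new key: add at the front
        n++;
    }

    public void delete(Key key) {
        if (key == null) throw new IllegalArgumentException("null key");
        Node prev = null;
        for (Node x = first; x != null; prev = x, x = x.next) {
            if (key.equals(x.key)) {
                if (prev == null) first = x.next; else prev.next = x.next;  // unlink x
                n--;
                return;
            }
        }
    }

    public boolean contains(Key key) { return get(key) != null; }
    public int size() { return n; }
    public boolean isEmpty() { return n == 0; }
}
```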

BinarySearchST

BinarySearchST is another class in the text that implements the symbol table methods, this time by storing the (key, value) pairs in two arrays: one for the keys and one for the values.

      Key[] keys;
      Value[] values;
    

These two arrays are logically related through the array indices: the key at keys[i] has corresponding value values[i].

The keys array is kept in sorted order!

This means the Key type must implement Comparable.

Instead of the equals method, the compareTo method is used to search for keys.
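A minimal sketch in the spirit of BinarySearchST, not the textbook's code: the class name SimpleBinarySearchST and the helper rank are assumptions, the arrays have a fixed capacity (no resizing), and keys() is omitted. get needs only a binary search, O(log N), but put and delete may shift up to N entries to keep keys[] sorted, which is why they stay O(N).

```java
// Hypothetical sketch: parallel arrays with keys[] kept in sorted order.
class SimpleBinarySearchST<Key extends Comparable<Key>, Value> {
    private final Key[] keys;
    private final Value[] values;
    private int n = 0;

    @SuppressWarnings("unchecked")
    SimpleBinarySearchST(int capacity) {
        keys = (Key[]) new Comparable[capacity];  // generic arrays need a cast
        values = (Value[]) new Object[capacity];
    }

    // Number of stored keys strictly less than key (binary search, O(log N)).
    private int rank(Key key) {
        if (key == null) throw new IllegalArgumentException("null key");
        int lo = 0, hi = n - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int cmp = key.compareTo(keys[mid]);
            if (cmp < 0) hi = mid - 1;
            else if (cmp > 0) lo = mid + 1;
            else return mid;
        }
        return lo;
    }

    public Value get(Key key) {
        int i = rank(key);
        if (i < n && key.compareTo(keys[i]) == 0) return values[i];
        return null;
    }

    public void put(Key key, Value value) {
        int i = rank(key);
        if (i < n && key.compareTo(keys[i]) == 0) { values[i] = value; return; }
        if (n == keys.length) throw new IllegalStateException("full");
        for (int j = n; j > i; j--) {      // shift larger keys right: O(N) worst case
            keys[j] = keys[j - 1];
            values[j] = values[j - 1];
        }
        keys[i] = key;
        values[i] = value;
        n++;
    }

    public void delete(Key key) {
        int i = rank(key);
        if (i == n || key.compareTo(keys[i]) != 0) return;  // key not present
        for (int j = i; j < n - 1; j++) {  // shift larger keys left: O(N) worst case
            keys[j] = keys[j + 1];
            values[j] = values[j + 1];
        }
        n--;
        keys[n] = null;                    // drop stale references
        values[n] = null;
    }

    public int size() { return n; }
}
```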

Performance Properties of Symbol Table Methods

For an application that uses a symbol table, the two implementations are interchangeable as far as correctness goes provided the Key type implements Comparable.

But the methods have different running times in the two implementations.

For each method which class has the faster method?

How does each method do its task?

Method        SequentialSearchST    BinarySearchST
put(k, v)
v = get(k)
delete(k)

Performance: SequentialSearchST

Performance: BinarySearchST

Summary Method Performance Comparison

Worst-case performance (N = number of keys in the symbol table):

Method        SequentialSearchST    BinarySearchST
put(k, v)     O(N)                  O(N)
v = get(k)    O(N)                  O(log(N))
delete(k)     O(N)                  O(N)

Application Performance

Usually O(log(N)) or O(N) is great, but that is the performance of a single call to each method.

An application that prints the frequency of occurrence of each word in a document with N total words and M distinct words must do N get operations, plus N put operations, to insert the words into a symbol table.

Then it must iterate through the M distinct keys and do M more get operations.

Comparing the two classes just for building the symbol table (the table never holds more than M <= N keys, so N is an upper bound on its size):

Class                 Cost of building the symbol table
SequentialSearchST    N gets: N * O(N) = O(N^2)
                      N puts: N * O(N) = O(N^2)
                      total:  O(N^2) + O(N^2) = O(N^2)
BinarySearchST        N gets: N * O(log(N)) = O(N log(N))
                      N puts: N * O(N) = O(N^2)
                      total:  O(N log(N)) + O(N^2) = O(N^2)
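To see that the O(N^2) bound for SequentialSearchST is actually reached, consider the worst case in which all N words are distinct (so M = N). When the i-th word arrives the list holds i - 1 keys, and both its get (a miss) and its put (which must confirm the key is new) scan all of them:

```latex
\text{compares} \;=\; \sum_{i=1}^{N} \underbrace{(i-1)}_{\texttt{get}}
  \;+\; \sum_{i=1}^{N} \underbrace{(i-1)}_{\texttt{put}}
  \;=\; 2 \cdot \frac{N(N-1)}{2} \;=\; N(N-1) \;=\; O(N^2)
```

For BinarySearchST the gets stay at O(log(N)) each, but if every new word is smaller than all words already stored (for example, distinct words arriving in decreasing order), the i-th put shifts i - 1 entries, and the same sum gives N(N-1)/2 = O(N^2) array moves.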

See code examples