 # COMP 202: Principles of Object-Oriented Programming II

Dictionaries &amp; Hashing

## Dictionary

A major theme in computing is the theme of storage/retrieval/removal: store data somewhere so that it can later be retrieved and discarded if no longer needed, all of this in the most efficient manner.  The abstraction of these computing activities is embodied in the notion of what is called a dictionary, expressed in Java as an interface as follows.

IDictionay is an example what we call an unrestricted  access container (as opposed to restricted access container).  A simple way to implement IDictionary is to use an LRStruct.

The problem with such an implementation is that each of the operations, insert, lookup and remove takes O(N) time, where N is the total number of elements in the dictionary.  Using a self-balanced tree will guarantee O(logN) time.

Can we do better than that?  The answer is yes and no.  With an data structure called "hash table" coupled with an appropriate "hash function", we can achieve an amortized performance of O(1), that is constant time!

## Hash Tables and Hash Functions

### Hash Tables

• A hash table is a generalization of an ordinary array.
• When the number of keys actually stored is small relative to the total number of possible keys, hash tables become an effective alternative to directly addressing an array, since a hash table typically uses an array of size proportional to the number of keys actually stored.
• Instead of using the key as an array index directly, the array index is computed from the key.
• With hashing, an element with key k is stored in slot h(k); i.e., a hash function h is used to compute the slot from the key k.
• h maps the set U of keys into the slots of a hash table T[0..m-1]:
• h:U -> {0, 1, ..., m - 1}

### The Problem: Collisions

• Two keys may hash to the same slot.  This is called a collision.  Because |U|>m, collisions are unavoidable.
• To avoid collisions, h should appear ``random'', i.e., adjacent keys should not hash to adjacent slots.
• To cope with collisions, the simplest method is chaining.

### Chaining

• In chaining, we put all the elements that hash to the same slot in a linked list, i.e., slot j contains a reference to the head of the list of all stored elements that hash to j; if there are no such elements, slot j contains an empty list.
• To insert an element, we simply put it at the front of the list.  So, the worst case running time is O(1).
• To lookup an element, we search the list belonging to the slot for the corresponding key.  So, the worst case running time is proportional to the length of the list.
• Removal is identical to lookup.

### Performance

• Given a hash table with m slots that stores n elements, we define the load factor alpha as n/m, i.e., the average number of elements in a chain.
• The worst case behavior of hashing with chaining is O(n): All n keys hash to the same slot, creating a list of length n.
• The expected case behavior depends on how well the hash function distributes the set of keys to be stored among the m slots, on average.  We will assume that
1. any given element is equally likely to hash into any of the m slots and
2. the hash value can be computed in O(1) time.  Then the expected case search time is O(1+alpha).
• If the number of hash table slots is at least proportional to the number of elements in the table, we have n=O(m) and consequently, alpha = n/m = O(m)/m = O(1).  Thus, searching takes constant time on average.

### Hash Table Implementations

•  The Java Collection Framework provided by Java SDK, includes an "industrial strength" implementation of the hash table data structure called HashTable.
• Below is our own implementation of hash tables.  One of the main differences between the Sun's version and ours is the our elements() method returns an IList, while the Sun's hash table returns an Enumeration.  By returning an IList, we can process our hash table using visitors, while with an Enumeration, one will have to write loops to do the processing.  The choice is yours!
Dictionaries &amp; Hashing

URL: http://www.cs.rice.edu/teaching/202/08-fall/lectures/hash/index.shtml