
Clusters

The mod operator does a reasonably good job of distributing random keys across the table, and this helps avoid collisions.

When collisions do occur, clusters can form: instead of the key going into the next position, many positions must be skipped over because they are already occupied by keys with different hash values.

Several techniques help to avoid clustering to some extent:

        If a collision occurs for a key with hash(key) = k, the positions to try are
        (mod table size, of course):

        Method                  Next Position
        linear probing          k + 1, k + 2, k + 3, ...
        quadratic probing       k + 1, k + 4, k + 9, ...
        double hashing          k + j, k + 2j, k + 3j, ...

        (For double hashing, a second hash function, hash2, is used
        to compute the skip value: j = hash2(key).)
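
The three probe sequences in the table can be sketched in a few lines of Python. This is only an illustration: the function name probe_sequence, the table size of 11, and the skip value j = 4 (standing in for some hash2(key)) are assumptions made for the example, not part of the original notes.

```python
def probe_sequence(k, table_size, method, j=1, count=5):
    """Return the first `count` positions tried after a collision at position k.

    k          -- the home position, hash(key)
    table_size -- number of slots; every position is taken mod table_size
    method     -- "linear", "quadratic", or "double"
    j          -- skip value for double hashing (assumed to come from hash2(key))
    """
    positions = []
    for i in range(1, count + 1):
        if method == "linear":
            offset = i          # k + 1, k + 2, k + 3, ...
        elif method == "quadratic":
            offset = i * i      # k + 1, k + 4, k + 9, ...
        elif method == "double":
            offset = i * j      # k + j, k + 2j, k + 3j, ...
        else:
            raise ValueError("unknown method: " + method)
        positions.append((k + offset) % table_size)
    return positions


# Hypothetical example values: home position 3, table of 11 slots, skip value 4.
linear = probe_sequence(3, 11, "linear")
quadratic = probe_sequence(3, 11, "quadratic")
double = probe_sequence(3, 11, "double", j=4)
print(linear)     # [4, 5, 6, 7, 8]
print(quadratic)  # [4, 7, 1, 8, 6]
print(double)     # [7, 0, 4, 8, 1]
```

Note how the quadratic and double hashing sequences wrap around the end of the table because every position is reduced mod the table size.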
     

Each of these techniques has problems.

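Linear probing can cause clusters: every key that collides with a run of occupied positions is placed at the end of the run, which makes the run longer for the next key.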
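
A small sketch of how a cluster forces extra skips under linear probing. The helper linear_insert, the table size of 11, and the use of key mod table size as the hash are assumptions chosen for the example.

```python
def linear_insert(table, key):
    """Insert key with linear probing; return how many occupied slots were skipped."""
    size = len(table)
    k = key % size                      # assumed hash: key mod table size
    for i in range(size):
        pos = (k + i) % size            # k, k + 1, k + 2, ... (mod table size)
        if table[pos] is None:
            table[pos] = key
            return i
    raise RuntimeError("table is full")


table = [None] * 11
# Keys 0, 1, 2, 3 each land in their own home position and form a cluster.
# Key 11 hashes to 0 and must skip all four, although only key 0 shares its hash.
skips = [linear_insert(table, key) for key in (0, 1, 2, 3, 11)]
print(skips)      # [0, 0, 0, 0, 4]
print(table[:5])  # [0, 1, 2, 3, 11]
```

Three of the four positions skipped by key 11 hold keys with a different hash value, which is exactly the situation described at the top of this page.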

Quadratic probing avoids primary clustering, but it can fail to find a free position if the table is more than 50% full, even though free positions remain.
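
The failure case can be checked directly on a small table. The sketch below assumes a table of 7 slots and a home position of 0 (both picked only for illustration) and lists every position the quadratic sequence can ever reach.

```python
def quadratic_positions(k, table_size):
    """All distinct positions quadratic probing can reach starting from k.

    The offsets i * i (mod table_size) repeat once i reaches table_size,
    so checking i = 0 .. table_size - 1 covers every reachable position.
    """
    return sorted({(k + i * i) % table_size for i in range(table_size)})


TABLE_SIZE = 7                 # assumed example size
occupied = {0, 1, 2, 4}        # 4 of 7 slots in use: more than 50% full
reachable = quadratic_positions(0, TABLE_SIZE)
free_reachable = [p for p in reachable if p not in occupied]
print(reachable)       # [0, 1, 2, 4]
print(free_reachable)  # []
```

Positions 3, 5 and 6 are free, but a key whose home position is 0 never probes them, so the insert fails.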

Double hashing requires a second hash function, so it is likely to be a bit slower to compute than quadratic probing. The skip value j must also never be 0, or the probe would never leave position k.
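
A minimal sketch of double hashing. The second hash function used here, 1 + key mod (table size - 1), is one common choice and is an assumption of this example, as are the names hash1, hash2 and double_hash_insert; the notes do not say which hash2 is intended. It always gives a skip value between 1 and table size - 1, and with a prime table size such as 11 the probe sequence visits every position.

```python
def hash1(key, size):
    """Home position: key mod table size."""
    return key % size


def hash2(key, size):
    """Skip value j in 1 .. size - 1 (assumed choice; never 0)."""
    return 1 + key % (size - 1)


def double_hash_insert(table, key):
    """Insert key with double hashing; return the position where it was stored."""
    size = len(table)
    k = hash1(key, size)
    j = hash2(key, size)
    for i in range(size):
        pos = (k + i * j) % size        # k, k + j, k + 2j, ... (mod table size)
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("no free position found")


table = [None] * 11
# Keys 0, 11 and 22 all have home position 0 but different skip values (1, 2, 3).
positions = [double_hash_insert(table, key) for key in (0, 11, 22)]
print(positions)  # [0, 2, 3]
```

Because colliding keys usually get different skip values, they follow different probe sequences instead of piling up behind one another.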


