What does "bucket entries" mean in the context of a hashtable?


Solution

A bucket is simply a fast-access location (like an array slot) identified by the result of the hash function. The bucket entries are the data items stored at that location.

The idea behind hashing is to turn a complex input value into a small number that can be used to store or retrieve data quickly.

Consider the following hash function, used in a table that maps people's names to street addresses.

First take the initials of the first and last names and turn each into a numeric value (0 through 25, for A through Z). Multiply the first by 26 and add the second. This gives a value from 0 to 675 (26 * 26 = 676 distinct values, or bucket IDs). The bucket ID is then used to store or retrieve the information.
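As a rough sketch, the initials hash described above could look like this in Python. The function name `initials_hash` is hypothetical, and it assumes each name has the form "First Last" and uses only the ASCII letters A to Z.

```python
def initials_hash(full_name: str) -> int:
    """Hypothetical initials hash: map a "First Last" name to a bucket ID in 0..675.

    Assumes the first and last words start with ASCII letters A-Z (any case).
    """
    parts = full_name.split()
    first, last = parts[0], parts[-1]
    # Convert each initial to 0..25 (A=0, B=1, ..., Z=25).
    a = ord(first[0].upper()) - ord("A")
    b = ord(last[0].upper()) - ord("A")
    return a * 26 + b


print(initials_hash("George Washington"))  # G=6, W=22 -> 6*26 + 22 = 178
print(initials_hash("Abraham Lincoln"))    # A=0, L=11 -> 0*26 + 11 = 11
```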


You can have a perfect hash (where each allowable input value maps to a distinct bucket ID), in which case a simple array suffices for the buckets: just maintain an array of 676 street addresses and use the bucket ID to find the one you want:

 ------------------- 
| George Washington | -> hash(GW)
 -------------------       |
                            -> GwBucket[George's address]
 ------------------- 
|  Abraham Lincoln  | -> hash(AL)
 -------------------       |
                            -> AlBucket[Abe's address]

However, George Wendt and Allan Langer share initials with George Washington and Abraham Lincoln, so they hash to the same bucket IDs and will cause problems if they are ever added.
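A minimal sketch of the single-array approach, assuming the same hypothetical `initials_hash` as the scheme above, one address per slot, and placeholder address strings. It shows why a hash that is only "perfect" for the current inputs breaks when a new name shares initials with an existing one: the new address silently overwrites the old one.

```python
def initials_hash(full_name):
    # Hypothetical initials hash: 26 * (first initial) + (last initial), A=0..Z=25.
    parts = full_name.split()
    return (ord(parts[0][0].upper()) - ord("A")) * 26 + (ord(parts[-1][0].upper()) - ord("A"))


# One slot per bucket ID; each slot holds at most one address (or None).
buckets = [None] * 676


def store(name, address):
    buckets[initials_hash(name)] = address


def lookup(name):
    return buckets[initials_hash(name)]


store("George Washington", "addr of Washington")
store("Abraham Lincoln", "addr of Lincoln")
print(lookup("George Washington"))  # addr of Washington

# George Wendt also hashes to GW, so his address replaces Washington's.
store("George Wendt", "addr of Wendt")
print(lookup("George Washington"))  # addr of Wendt  (wrong person)
```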


Or you can have an imperfect hash (such as one where John Smith and Jane Seymour would end up with the same bucket ID).

In that case, each bucket needs a more complex data structure than a single array slot, so that it can hold a collection of addresses. This could be as simple as a linked list, or as complex as yet another hash:

 ------------         -------------- 
| John Smith |       | Jane Seymour |
 ------------         -------------- 
      |                     |
      V                     V
   hash(JS)              hash(JS)
      |                     |
       -----> JsBucket <---- 
                 \/
 ----------------------------------- 
| John Smith   ->  [John's address] |
| Jane Seymour ->  [Jane's address] |
 ----------------------------------- 

Then, in addition to the initial hash lookup, an extra level of searching must be carried out within the bucket itself to find the specific entry.
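A minimal sketch of this two-step lookup (often called separate chaining), again assuming the hypothetical `initials_hash` and placeholder addresses. It uses a Python list of `(name, address)` entries per bucket in place of a linked list; the class name `ChainedTable` is also hypothetical.

```python
def initials_hash(full_name):
    # Hypothetical initials hash: 26 * (first initial) + (last initial), A=0..Z=25.
    parts = full_name.split()
    return (ord(parts[0][0].upper()) - ord("A")) * 26 + (ord(parts[-1][0].upper()) - ord("A"))


class ChainedTable:
    """Hash table whose 676 buckets each hold a list of (name, address) entries."""

    def __init__(self):
        self.buckets = [[] for _ in range(676)]

    def put(self, name, address):
        bucket = self.buckets[initials_hash(name)]
        for i, (key, _) in enumerate(bucket):
            if key == name:  # name already present: replace its address
                bucket[i] = (name, address)
                return
        bucket.append((name, address))  # otherwise add a new entry to the bucket

    def get(self, name):
        # Step 1: the hash selects the bucket.
        bucket = self.buckets[initials_hash(name)]
        # Step 2: search the bucket's entries for the exact name.
        for key, address in bucket:
            if key == name:
                return address
        raise KeyError(name)


table = ChainedTable()
table.put("John Smith", "John's address")
table.put("Jane Seymour", "Jane's address")
print(table.get("Jane Seymour"))                        # Jane's address
print(len(table.buckets[initials_hash("John Smith")]))  # 2 (both names share bucket JS)
```

Swapping the per-bucket list for a dict would give the "yet another hash" variant; the list keeps the in-bucket search explicit.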