Hash Tables#
A hash table is a data structure that provides efficient insertion, deletion, and lookup operations on key-value pairs. It works by using a hash function to map each key to a position in an array, called the hash table, where the corresponding value is stored.
The hash function takes a key as input and generates a hash code, which is used to determine the position in the array where the value should be stored. Ideally, the hash function should distribute the keys uniformly across the hash table, to minimize collisions (i.e., when two or more keys map to the same position in the array).
To handle collisions, a hash table typically uses a collision resolution strategy, such as chaining or open addressing. Chaining involves storing all the values that hash to the same position in a linked list, while open addressing involves finding the next available position in the array to store the value.
One of the advantages of hash tables is their speed. In the average case, insertion, deletion, and lookup on a hash table each take constant time, O(1).
Overall, hash tables are an important data structure in computer science, and are widely used in applications such as databases, compilers, and web servers. However, the efficiency of hash tables depends on the quality of the hash function used, and collisions can still occur, which can degrade performance.
Storing data#
Hash Tables#
implements an associative array or dictionary
an abstract data type that maps keys to values
uses a hash function to compute an index, also called a hash code
at lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored
Why not…#
Search
Insert/Delete, much more costly
Guaranteed
Best-case
Practical limitations
Extra space
A given integer in a programming language may not store enough digits
Therefore, not always a viable option
Hash Functions#
a function converting a piece of data into a smaller, more practical integer
the integer value is used as the index, between 0 and $m - 1$ for a table of size $m$, for the data in the hash table
ideally, maps all keys to a unique slot index in the table
perfect hash functions may be difficult, but not impossible, to create
Efficiently computable
Should uniformly distribute the keys (each table position equally likely for each)
Should minimize collisions
Should have a low load factor
Modular Hashing#
Suppose there are six students:
Suppose
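As a minimal sketch of modular hashing, the snippet below maps integer keys to slots with `key % m`. The six student IDs and the table size `m = 7` are hypothetical values chosen for illustration; they are not the ones from the original example.

```python
def modular_hash(key: int, m: int) -> int:
    """Map a non-negative integer key to a slot index in the range 0..m-1."""
    return key % m


# Hypothetical student IDs and table size (assumptions, not from the notes).
student_ids = [1001, 1002, 1010, 1023, 1045, 1088]
m = 7  # a small prime table size

slots = {sid: modular_hash(sid, m) for sid in student_ids}

for sid, slot in slots.items():
    print(f"student {sid} -> slot {slot}")
```

With these assumed values, several IDs land in the same slot (for example 1002 and 1023), which is exactly the kind of collision the later sections deal with.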
Uniform Hashing#
- Assumption#
Any key is equally likely (and independent of other keys) to hash to one of $m$ possible indices
- Bins and Balls#
Toss $n$ balls uniformly at random into $m$ bins
- Bad News [birthday problem]#
In a random group of 23 people, it is more likely than not that two people share the same birthday
Expect two balls in the same bin after $\sim \sqrt{\pi m / 2}$ tosses
- Good News#
When $n \gg m$, expect most bins to have approximately $n/m$ balls
When $n = m$, expect the most loaded bin to have $\sim \ln n / \ln \ln n$ balls
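The birthday claim can be checked with a short calculation. The sketch below computes the exact probability that at least two of `n` balls share a bin among `m` equally likely bins, and the standard asymptotic estimate $\sqrt{\pi m / 2}$ for the expected number of tosses before the first collision. The function names are illustrative only.

```python
import math


def prob_shared_bin(n: int, m: int) -> float:
    """Exact probability that at least two of n balls land in the same bin,
    when each ball picks one of m bins uniformly and independently."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (m - i) / m
    return 1.0 - p_all_distinct


def expected_tosses_until_collision(m: int) -> float:
    """Asymptotic estimate sqrt(pi * m / 2) for the first repeated bin."""
    return math.sqrt(math.pi * m / 2)


p22 = prob_shared_bin(22, 365)
p23 = prob_shared_bin(23, 365)
estimate = expected_tosses_until_collision(365)
print(f"22 people: {p22:.4f}, 23 people: {p23:.4f}, estimate: {estimate:.2f}")
```

With 365 bins the probability first exceeds one half at 23 balls, and the asymptotic estimate comes out just under 24.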
Collisions#
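Two distinct keys that hash to the same index
birthday problem: collisions cannot be avoided unless the table is very large
load balancing: no index receives too many collisions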
Separate Chaining#
keeps a list of all elements that hash to the same value
Performance
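Load factor $\alpha = n/m$, for $n$ keys stored in $m$ slots
Expected time to search or delete = $O(1 + \alpha)$
Time to insert = $O(1)$
Time complexity of search, insert, and delete is $O(1)$ if $\alpha$ is $O(1)$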
Example
Advantages
Simple to implement.
The hash table never fills up; more elements can always be added to a chain.
Less sensitive to the hash function or load factor.
Mostly used when it is unknown how many keys will be inserted or deleted, or how frequently.
Disadvantages
Poor cache performance, because keys are stored in linked lists. Open addressing provides better cache performance because everything is stored in the same table.
Wasted space (some parts of the hash table are never used).
If a chain becomes long, search time can become O(n) in the worst case.
Uses extra space for links.
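The following is a minimal sketch of separate chaining, not a reference implementation. The class name `ChainedHashTable` and its methods are hypothetical, Python's built-in `hash()` stands in for the hash function, and each chain is a Python list rather than a linked list.

```python
class ChainedHashTable:
    """Hash table with separate chaining: each slot holds a list (chain)
    of (key, value) pairs whose keys hash to that slot."""

    def __init__(self, m: int = 8) -> None:
        self.m = m                      # number of slots
        self.n = 0                      # number of stored keys
        self.buckets = [[] for _ in range(m)]

    def _index(self, key) -> int:
        return hash(key) % self.m

    def insert(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.n += 1

    def search(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None                     # key not found

    def delete(self, key) -> bool:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self.n -= 1
                return True
        return False

    def load_factor(self) -> float:
        return self.n / self.m


table = ChainedHashTable(m=8)
for key, value in [(1, "a"), (9, "b"), (17, "c")]:
    table.insert(key, value)            # 1, 9, 17 all hash to slot 1
print(table.search(9), table.load_factor())
```

Because this `insert` scans the chain to avoid duplicate keys, it costs time proportional to the chain length; appending without the duplicate check is what gives constant-time insertion.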
Open Addressing#
stores all elements in the hash table array itself; on a collision, other slots are probed until an empty one is found
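Linear probing, with hash function $h$ and table size $m$:
If slot $h(x) \bmod m$ is full, try $(h(x) + 1) \bmod m$
If $(h(x) + 1) \bmod m$ is also full, try $(h(x) + 2) \bmod m$
If $(h(x) + 2) \bmod m$ is also full, try $(h(x) + 3) \bmod m$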
… and so on
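One way to sketch open addressing is with linear probing: start at the hashed slot and step to the next slot, wrapping around, until an empty slot or the key is found. The class name `LinearProbingTable` is hypothetical, Python's built-in `hash()` is assumed as the hash function, and deletion is left out because it needs extra care (such as tombstone markers).

```python
class LinearProbingTable:
    """Open-addressing hash table using linear probing (no deletion)."""

    _EMPTY = object()                   # sentinel marking an unused slot

    def __init__(self, m: int = 8) -> None:
        self.m = m
        self.keys = [self._EMPTY] * m
        self.values = [None] * m

    def _probe(self, key):
        """Yield slot indices start, start+1, ... (mod m), each slot once."""
        start = hash(key) % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key, value) -> int:
        for idx in self._probe(key):
            if self.keys[idx] is self._EMPTY or self.keys[idx] == key:
                self.keys[idx] = key
                self.values[idx] = value
                return idx              # slot where the key ended up
        raise RuntimeError("hash table is full")

    def search(self, key):
        for idx in self._probe(key):
            if self.keys[idx] is self._EMPTY:
                return None             # reached a gap: key is absent
            if self.keys[idx] == key:
                return self.values[idx]
        return None


probing_table = LinearProbingTable(m=8)
placed = [probing_table.insert(k, v) for k, v in [(1, "a"), (9, "b"), (17, "c")]]
print(placed)                           # keys 1, 9, 17 all start at slot 1
```

The three colliding keys occupy consecutive slots, a small instance of the clustering that linear probing is prone to.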
Quadratic Probing#
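The notes give no rule for quadratic probing here, so the sketch below assumes the common textbook form, where the i-th probe is $(h + i^2) \bmod m$. The function name is hypothetical.

```python
def quadratic_probe_sequence(h: int, m: int, count: int) -> list[int]:
    """Return the first `count` slots probed from home slot h in a table
    of size m, assuming the i-th probe is (h + i*i) % m."""
    return [(h + i * i) % m for i in range(count)]


print(quadratic_probe_sequence(3, 7, 4))
```

The gaps between successive probes grow (1, 3, 5, ...), so colliding keys spread out more than with linear probing. With this form, not every slot is guaranteed to be reached; for a prime table size, roughly the first half of the probes are distinct.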
Double Hashing#
Rule
Example
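Since the rule and example are not reproduced above, here is a hedged sketch of double hashing using one common textbook choice: the i-th probe is $(h_1(k) + i \cdot h_2(k)) \bmod m$, with $h_1(k) = k \bmod m$ and $h_2(k) = 1 + (k \bmod (m - 1))$. Both hash functions and the function name are assumptions made for illustration.

```python
def double_hash_sequence(key: int, m: int) -> list[int]:
    """Return the full probe sequence for an integer key in a table of
    prime size m, assuming h1(k) = k % m and h2(k) = 1 + k % (m - 1)."""
    h1 = key % m
    h2 = 1 + key % (m - 1)              # never zero, so probing always advances
    return [(h1 + i * h2) % m for i in range(m)]


seq_10 = double_hash_sequence(10, 7)
seq_17 = double_hash_sequence(17, 7)
print(seq_10)
print(seq_17)
```

Keys 10 and 17 share the same home slot (3) but use different step sizes, so their probe sequences diverge immediately. Because the table size is prime and the step is between 1 and m - 1, each sequence visits every slot exactly once.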
Comparison#
Linear probing
Easy to implement
Best cache performance
Suffers from clustering
Quadratic probing
Average cache performance
Suffers less from clustering
Double hashing
Poor cache performance
No clustering
Requires more computation time