Hash Tables

Storing data#

../../_images/18_00.png — Fig. 66 Summary Table#

Hash Tables #

implements an associative array or dictionary
an abstract data type that maps keys to values
uses a hash function to compute an \(index\), also called a \(hash code\)
at lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.

https://s3.ap-south-1.amazonaws.com/s3.studytonight.com/tutorials/uploads/pictures/1604593128-76844.png — Fig. 67 Hash Table#

Why not…#

Arrays & Linked Lists

https://miro.medium.com/max/970/1*f2oDQ0cdY54olxCFOIMIdQ.png

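Search: \(O(log\ n)\) on a sorted array (binary search), \(O(n)\) on a linked list
Insert/Delete: much more costly on a sorted array, since elements must be shifted to keep the order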

Balanced BST

https://media.geeksforgeeks.org/wp-content/cdn-uploads/BinaryTree3-300x188.png

Guaranteed \(O(log\ n)\) for search, insert, and delete

Direct Access Table

https://www.kindsonthegenius.com/wp-content/uploads/2020/09/Direct-Address-Table-1.jpg

Best-case \(O(1)\)
Practical limitations
- Extra space
- An integer type in a programming language may not be able to hold an \(n\)-digit key
- Therefore, not always a viable option
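To make the space limitation concrete, here is a minimal sketch of a direct access table in Python. The class name `DirectAccessTable` and the `universe_size` parameter are illustrative assumptions and do not come from the notes.

```python
class DirectAccessTable:
    """Direct access table: the key itself is used as the array index."""

    def __init__(self, universe_size):
        # One slot per possible key, whether or not that key is ever used.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        self.slots[key] = value   # O(1)

    def search(self, key):
        return self.slots[key]    # O(1); None means "not present"

    def delete(self, key):
        self.slots[key] = None    # O(1)


table = DirectAccessTable(universe_size=1000)
table.insert(42, "answer")
found = table.search(42)     # "answer"
missing = table.search(7)    # None
```

Every operation is a single array access, but the array must cover the whole key universe: nine-digit keys would need \(10^9\) slots even if only a handful of records are stored.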

Hash Functions #

a function converting a piece of data into a smaller, more practical integer
the integer value is used as the \(index\) between 0 and \(m-1\) for the data in the hash table
ideally, maps all keys to a unique slot \(index\) in the table
perfect hash functions may be difficult, but not impossible to create

https://www.vladimircicovic.com/content/images/20200502181417-hash_function.jpg

Properties of good hash functions

Efficiently computable
Should uniformly distribute the keys (each table position equally likely for each key)
Should minimize collisions
Should have a low load factor \(\frac{\#\ items\ in\ table}{table\ size}\)

Modular Hashing #

To uniformly create hashes, hash functions may use heuristic techniques of division or multiplication

Legend

\(h\) = hash function
\(x\) = key
\(HT\) = hash table
\(m\) = table size
\(b\) = buckets
\(r\) = items per bucket

Rules

\[\begin{split}\begin{align} 0 \le h(x) \lt m\ , or \\ 0 \le h(x) \le b-1 \end{align}\end{split}\]

\[Entry\ lookup \Rightarrow HT[h(x)]\]

Syntax

\[\begin{split}\begin{align} h(x) & = x\ mod\ m \\ & = x\ \%\ m \end{align}\end{split}\]

Example

Suppose there are six students:

\(a1, a2, a3, a4, a5, a6\) in the Data Structures class and their IDs are:

\[\begin{split}\begin{align} a1: & 197354863; \\ a2: & 933185952; \\ a3: & 132489973; \\ a4: & 134152056; \\ a5: & 216500306; \\ a6: & 106500306. \\ \end{align}\end{split}\]

Hashing

\[h: \{k_1,k_2,k_3,k_4,k_5,k_6 \} \rightarrow \{0,1,2,...,12\}\ by\ h(k_i) = k_i\ \%\ 13\]

\[\begin{split}\begin{align} h(k_1) & = 197354863\ \ \%\ 13 = 4 \\ h(k_2) & = 933185952\ \ \%\ 13 = 10 \\ h(k_3) & = 132489973\ \ \%\ 13 = 5 \\ h(k_4) & = 134152056\ \ \%\ 13 = 12 \\ h(k_5) & = 216500306\ \ \%\ 13 = 9 \\ h(k_6) & = 106500306\ \ \%\ 13 = 3 \\ \end{align}\end{split}\]

Outcome

Suppose \(HT[b] \leftarrow a\)…

\[\begin{split}\begin{align} HT[4] \leftarrow 197354863 \\ HT[5] \leftarrow 132489973 \\ HT[9] \leftarrow 216500306 \\ HT[10] \leftarrow 933185952 \\ HT[12] \leftarrow 134152056 \\ HT[3] \leftarrow 106500306 \\ \end{align}\end{split}\]
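The student ID example can be reproduced with a few lines of Python. This is only a sketch: the function name `h` follows the legend above, while `student_ids` and `indices` are names chosen here for illustration.

```python
def h(key, m):
    """Modular hash: map an integer key to an index in 0..m-1."""
    return key % m


student_ids = [197354863, 933185952, 132489973,
               134152056, 216500306, 106500306]
m = 13  # table size used in the example

HT = [None] * m
for student_id in student_ids:
    # These six keys happen to land in six different slots.
    HT[h(student_id, m)] = student_id

indices = [h(student_id, m) for student_id in student_ids]
print(indices)  # [4, 10, 5, 12, 9, 3]
```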

Uniform Hashing#

Assumption: Any key is equally likely (and independent of other keys) to hash to one of \(m\) possible indices
Bins and Balls: Toss \(n\) balls uniformly at random into \(m\) bins
Bad News [birthday problem]: In a random group of 23 people, it is more likely than not that two people share the same birthday. Expect two balls in the same bin after \(\sim \sqrt{\pi m / 2}\) tosses (\(\approx 23.9\) when \(m = 365\))
Good News: When \(n \gg m\), expect most bins to have \(\approx \frac{n}{m}\) balls. When \(n = m\), expect the most loaded bin to have \(\sim \frac{\ln n}{\ln \ln n}\) balls
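The birthday numbers can be checked directly. The sketch below assumes perfectly uniform, independent tosses; the function names are hypothetical helpers introduced for illustration.

```python
import math


def first_collision_estimate(m):
    """Approximate number of tosses until two balls share a bin."""
    return math.sqrt(math.pi * m / 2)


def prob_shared_bin(n, m):
    """Exact probability that at least two of n balls land in the same bin."""
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th ball must avoid the i bins already occupied.
        p_all_distinct *= (m - i) / m
    return 1.0 - p_all_distinct


estimate = first_collision_estimate(365)  # about 23.9
p22 = prob_shared_bin(22, 365)            # just under 0.5
p23 = prob_shared_bin(23, 365)            # just over 0.5
```

With 365 bins, 23 balls is the first point at which a shared bin becomes more likely than not, which is why collisions have to be handled rather than avoided.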

Collisions#

Two distinct keys that hash to the same index

birthday problem

\(\Rightarrow\) can’t avoid collisions

load balancing

\(\Rightarrow\) no index gets too many collisions
\(\Rightarrow\) ok to scan through all colliding keys

https://www.log2base2.com/images/algo/hash-collision.png — Fig. 68 collision#

Separate Chaining#

Simple Uniform Hashing

keeps a list of all elements that hash to the same value

Performance

\(m\) = Number of slots in hash table
\(n\) = Number of keys to be inserted in hash table

Load factor \(α = n/m\)
Expected time to search or delete = \(O(1 + α)\)

Time to insert = \(O(1)\)
Time complexity of search, insert, and delete is \(O(1)\ if\ α\ is\ O(1)\)

Example

\[h : \{0,81,64,25,36,49,1,4,16,9\}\]

\(h(k_1) = 0\ \ \ \ \%\ 10 = 0\ \ \ \ \ \ \ \Rightarrow\ \ \ \ \ HT[0] \leftarrow 0\)
\(h(k_2) = 81\ \ \%\ 10 = 1\ \ \ \ \ \ \ \Rightarrow\ \ \ \ \ HT[1] \leftarrow 81\)
\(h(k_3) = 64\ \ \%\ 10 = 4\ \ \ \ \ \ \ \Rightarrow\ \ \ \ \ HT[4] \leftarrow 64\)
\(h(k_4) = 25\ \ \%\ 10 = 5\ \ \ \ \ \ \ \Rightarrow\ \ \ \ \ HT[5] \leftarrow 25\)
\(h(k_5) = 36\ \ \%\ 10 = 6\ \ \ \ \ \ \ \Rightarrow\ \ \ \ \ HT[6] \leftarrow 36\)
\(h(k_6) = 49\ \ \%\ 10 = 9\ \ \ \ \ \ \ \Rightarrow\ \ \ \ \ HT[9] \leftarrow 49\)
\(h(k_7) = 1\ \ \ \ \%\ 10 = 1\ \ \ \ \ \ \ \Rightarrow\ \ \ \ \ HT[1] \leftarrow 1\)
\(h(k_8) = 4\ \ \ \ \%\ 10 = 4\ \ \ \ \ \ \ \Rightarrow\ \ \ \ \ HT[4] \leftarrow 4\)
\(h(k_9) = 16\ \ \%\ 10 = 6\ \ \ \ \ \ \ \Rightarrow\ \ \ \ \ HT[6] \leftarrow 16\)
\(h(k_{10}) = 9\ \ \ \%\ 10 = 9\ \ \ \ \ \ \ \Rightarrow\ \ \ \ \ HT[9] \leftarrow 9\)

../../_images/18_06.png — Fig. 69 A separate chaining hash table#
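A minimal separate chaining table for the keys above might look as follows. The class `ChainedHashTable` and its method names are assumptions made for this sketch. New keys are appended to the end of each chain, so the order inside a chain may differ from the figure.

```python
class ChainedHashTable:
    """Hash table with separate chaining: each slot holds a list of keys."""

    def __init__(self, m):
        self.m = m                            # number of slots
        self.n = 0                            # number of stored keys
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        return key % self.m

    def insert(self, key):
        self.slots[self._h(key)].append(key)  # O(1) append to the chain
        self.n += 1

    def search(self, key):
        # Scans a single chain: O(1 + alpha) expected.
        return key in self.slots[self._h(key)]

    def delete(self, key):
        chain = self.slots[self._h(key)]
        if key in chain:
            chain.remove(key)
            self.n -= 1

    def load_factor(self):
        return self.n / self.m                # alpha = n / m


table = ChainedHashTable(10)
for key in [0, 81, 64, 25, 36, 49, 1, 4, 16, 9]:
    table.insert(key)

print(table.slots[1])       # [81, 1]
print(table.slots[6])       # [36, 16]
print(table.load_factor())  # 1.0
```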

Open Addressing#

Linear Probing

stores every element in the table itself; on a collision, probes the following slots one by one until an empty slot is found

Rule

\(h_i(x) = (Hash(x) + i) \ \% \ HashTableSize\)

Try \(h_0(x) = (Hash(x) + 0) \ \% \ HashTableSize\) first
If \(h_0(x)\) is full, try \(h_1(x) = (Hash(x) + 1) \ \% \ HashTableSize\)
If \(h_1(x)\) is full, try \(h_2(x) = (Hash(x) + 2) \ \% \ HashTableSize\)
… and so on

https://media.geeksforgeeks.org/wp-content/cdn-uploads/gq/2015/08/openAddressing1.png

Example

\[h : \{50, 700, 76, 85, 92, 73, 101\}\]

\[\begin{split}\begin{align} h_0(50) &= 50\ \ \ \ \% \ 7 = 1 \\ \\ h_0(700) & = 700\ \ \% \ 7 = 0 \\ \\ h_0(76) &= 76\ \ \ \% \ 7 \ = 6 \\ \\ \end{align}\end{split}\]

\[\begin{split}\begin{align} h_0(85) &= 85\ \ \ \% \ 7 \ = \color{red}{1} \\ & \Rightarrow h_1(85) = (85+1)\ \ \%\ 7 = 2 \\ h_0(92) &= 92\ \ \%\ 7 = \color{red}{1} \\ & \Rightarrow h_1(92) = (92+1)\ \ \%\ 7 = \color{red}{2} \\ & \Rightarrow h_2(92) = (92+2)\ \ \%\ 7 = 3 \\ \\ h_0(73) &= 73\ \ \%\ 7 = \color{red}{3} \\ & \Rightarrow h_1(73) = (73+1)\ \ \%\ 7 = 4 \\ \\ h_0(101) &= 101\ \ \%\ 7 = \color{red}{3} \\ & \Rightarrow h_1(101) = (101+1)\ \ \%\ 7 = \color{red}{4} \\ & \Rightarrow h_2(101) = (101+2)\ \ \%\ 7 = 5 \\ \end{align}\end{split}\]
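The linear probing example can be traced in code. This sketch assumes \(Hash(x) = x\) and uses `None` to mark an empty slot; the function name `linear_probe_insert` is illustrative.

```python
def linear_probe_insert(table, key):
    """Insert key with linear probing; return the index where it landed."""
    size = len(table)
    for i in range(size):
        index = (key + i) % size   # h_i(x) = (Hash(x) + i) % HashTableSize
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    linear_probe_insert(table, key)

print(table)  # [700, 50, 85, 92, 73, 101, 76]
```

Slots 1 through 5 end up as one contiguous run of occupied cells, a small instance of the clustering that linear probing tends to produce.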

Quadratic Probing#

Rule

\[\begin{split}\begin{align} h_i(x) & = (Hash(x) + i^2)\ \%\ \ HashTableSize \\ & \Rightarrow (Hash(x) + i*i)\ \%\ \ HashTableSize \\ \\ If\ h_0(x) & = (Hash(x) + 0^2)\ \%\ \ HashTableSize \\ If\ h_1(x) & = (Hash(x) + 1^2)\ \%\ \ HashTableSize \\ If\ h_2(x) & = (Hash(x) + 2^2)\ \%\ \ HashTableSize \\ & ... and\ so\ on\ if\ h_i\ is\ already\ full... \end{align}\end{split}\]

Example

\[h : \{50, 700, 76, 85, 92, 73, 101\}\]

\[\begin{split}\begin{align} h_0(50) &= 50\ \ \ \ \% \ 7 = 1 \\ h_0(700) & = 700\ \ \%\ 7 = 0 \\ h_0(76) &= 76\ \ \%\ 7 = 6 \\ h_0(85) &= 85\ \ \%\ 7 = 1 \\ & \Rightarrow h_1(85) = (85+(1*1))\ \ \%\ 7 = 2 \\ h_0(92) &= 92\ \ \%\ 7 = 1 \\ & \Rightarrow h_1(92) = (92+(1*1))\ \ \%\ 7 = 2 \\ & \Rightarrow h_2(92) = (92+(2*2))\ \ \%\ 7 = 5 \\ h_0(73) &= 73\ \ \%\ 7 = 3 \\ h_0(101) &= 101\ \ \%\ 7 = 3 \\ & \Rightarrow h_1(101) = (101+(1*1))\ \ \%\ 7 = 4 \\ \end{align}\end{split}\]
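The same keys can be inserted with quadratic probing in code. As before, this is a sketch that assumes \(Hash(x) = x\) and `None` for an empty slot; `quadratic_probe_insert` is a name chosen for illustration.

```python
def quadratic_probe_insert(table, key):
    """Insert key with quadratic probing; return the index where it landed."""
    size = len(table)
    for i in range(size):
        index = (key + i * i) % size   # h_i(x) = (Hash(x) + i^2) % HashTableSize
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    quadratic_probe_insert(table, key)

print(table)  # [700, 50, 85, 73, 101, 92, 76]
```

Unlike linear probing, the quadratic sequence does not visit every slot, so an insert can fail even when the table still has free cells. The sketch reports that case with an exception instead of looping forever.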

Double Hashing#

Rule

\[\begin{split}\begin{align} h_{a}(k) &= Hash1(k)\ \%\ \ HashTableSize \\ h_{b}(k) &= Hash2(k)\ \%\ \ HashTableSize \\ \\ h_i(k) &= \bigg[h_{a}(k) + i * h_{b}(k) \bigg] \ \% \ HashTableSize \\ \end{align}\end{split}\]

\(h_{b}(k)\) must not be 0, otherwise every probe lands on the same slot.

Example

\[h : \{50, 700, 76, 85, 92, 73, 101\}\]

\[size : 7\]

50, 700, 76

\[\begin{split}\begin{align} h_0(50) &= 50\ \ \ \ \% \ 7 = 1 \\ \\ h_0(700) & = 700\ \ \%\ 7 = 0 \\ \\ h_0(76) &= 76\ \ \%\ 7 = 6 \\ \\ \end{align}\end{split}\]

\(Arr[0]\)	\(700\)
\(Arr[1]\)	\(50\)
\(Arr[2]\)
\(Arr[3]\)
\(Arr[4]\)
\(Arr[5]\)
\(Arr[6]\)	\(76\)

85

\[\begin{split}\begin{align} h_0(85) &= 85\ \ \%\ 7 = 1 \\ h_1(85) &= \bigg[ h_{a}(85) + i * h_{b}(85) \bigg] \ \% \ 7 \\ &= \bigg[1 + 1 * (85\ \% \ 7) \bigg] \ \% \ 7 \\ &= \bigg[1 + 1 * 1 \bigg] \ \% \ 7 \\ &= 2 \\ \end{align}\end{split}\]

\(Arr[0]\)	\(700\)
\(Arr[1]\)	\(50\)
\(Arr[2]\)	\(85\)
\(Arr[3]\)
\(Arr[4]\)
\(Arr[5]\)
\(Arr[6]\)	\(76\)

92

\[\begin{split}\begin{align} h_0(92) &= 92\ \ \%\ 7 = 1 \\ h_1(92) &= \bigg[ h_{a}(92) + i * h_{b}(92) \bigg] \ \% \ 7 \\ &= \bigg[1 + 1 * (92\ \% \ 7) \bigg] \ \% \ 7 \\ &= \bigg[1 + 1 * 1 \bigg] \ \% \ 7 \\ &= 2 \\ h_2(92) &= \bigg[ h_{a}(92) + i * h_{b}(92) \bigg] \ \% \ 7 \\ &= \bigg[1 + 2 * (92\ \% \ 7) \bigg] \ \% \ 7 \\ &= \bigg[1 + 2 * 1 \bigg] \ \% \ 7 \\ &= 3 \\ \end{align}\end{split}\]

\(Arr[0]\)	\(700\)
\(Arr[1]\)	\(50\)
\(Arr[2]\)	\(85\)
\(Arr[3]\)	\(92\)
\(Arr[4]\)
\(Arr[5]\)
\(Arr[6]\)	\(76\)

73

\[\begin{split}\begin{align} h_0(73) &= 73\ \ \% \ 7 = 3 \\ h_1(73) &= \bigg[ h_{a}(73) + i * h_{b}(73) \bigg] \ \% \ 7 \\ &= \bigg[3 + 1 * (73\ \% \ 7) \bigg] \ \% \ 7 \\ &= \bigg[3 + 1 * 3 \bigg] \ \% \ 7 \\ &= 6 \\ h_2(73) &= \bigg[ h_{a}(73) + i * h_{b}(73) \bigg] \ \% \ 7 \\ &= \bigg[3 + 2 * (73\ \% \ 7) \bigg] \ \% \ 7 \\ &= \bigg[3 + 2 * 3 \bigg] \ \% \ 7 \\ &= 2 \\ h_3(73) &= \bigg[ h_{a}(73) + i * h_{b}(73) \bigg] \ \% \ 7 \\ &= \bigg[3 + 3 * (73\ \% \ 7) \bigg] \ \% \ 7 \\ &= \bigg[3 + 3 * 3 \bigg] \ \% \ 7 \\ &= 5 \\ \end{align}\end{split}\]

\(Arr[0]\)	\(700\)
\(Arr[1]\)	\(50\)
\(Arr[2]\)	\(85\)
\(Arr[3]\)	\(92\)
\(Arr[4]\)
\(Arr[5]\)	\(73\)
\(Arr[6]\)	\(76\)

101

\[\begin{split}\begin{align} h_0(101) &= 101\ \ \%\ 7 = 3 \\ h_1(101) &= \bigg[ h_{a}(101) + i * h_{b}(101) \bigg] \ \% \ 7 \\ &= \bigg[3 + 1 * (101\ \% \ 7) \bigg] \ \% \ 7 \\ &= \bigg[3 + 1 * 3 \bigg] \ \% \ 7 \\ &= 6 \\ h_2(101) &= \bigg[ h_{a}(101) + i * h_{b}(101) \bigg] \ \% \ 7 \\ &= \bigg[3 + 2 * (101\ \% \ 7) \bigg] \ \% \ 7 \\ &= \bigg[3 + 2 * 3 \bigg] \ \% \ 7 \\ &= 2 \\ h_3(101) &= \bigg[ h_{a}(101) + i * h_{b}(101) \bigg] \ \% \ 7 \\ &= \bigg[3 + 3 * (101\ \% \ 7) \bigg] \ \% \ 7 \\ &= \bigg[3 + 3 * 3 \bigg] \ \% \ 7 \\ &= 5 \\ h_4(101) &= \bigg[ h_{a}(101) + i * h_{b}(101) \bigg] \ \% \ 7 \\ &= \bigg[3 + 4 * (101\ \% \ 7) \bigg] \ \% \ 7 \\ &= \bigg[3 + 4 * 3 \bigg] \ \% \ 7 \\ &= 1 \\ h_5(101) &= \bigg[ h_{a}(101) + i * h_{b}(101) \bigg] \ \% \ 7 \\ &= \bigg[3 + 5 * (101\ \% \ 7) \bigg] \ \% \ 7 \\ &= \bigg[3 + 5 * 3 \bigg] \ \% \ 7 \\ &= 4 \\ \end{align}\end{split}\]

\(Arr[0]\)	\(700\)
\(Arr[1]\)	\(50\)
\(Arr[2]\)	\(85\)
\(Arr[3]\)	\(92\)
\(Arr[4]\)	\(101\)
\(Arr[5]\)	\(73\)
\(Arr[6]\)	\(76\)
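The trace above can be reproduced with the sketch below. It mirrors the worked example, in which both \(h_a\) and \(h_b\) are computed as `key % 7`. That choice is taken from the arithmetic shown here and is not a general recommendation. The function name `double_hash_insert` is illustrative.

```python
def double_hash_insert(table, key):
    """Insert key with double hashing; return the index where it landed."""
    size = len(table)
    start = key % size   # h_a(k)
    step = key % size    # h_b(k): same function as in the worked example
    for i in range(size):
        index = (start + i * step) % size   # h_i(k) = (h_a(k) + i * h_b(k)) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    double_hash_insert(table, key)

print(table)  # [700, 50, 85, 92, 101, 73, 76]
```

With this second hash, a key divisible by 7 has a step of 0 and can never move off its first slot. Key 700 only works here because slot 0 was free. A commonly used alternative is a second hash of the form `R - (key % R)` for a prime `R` smaller than the table size, which is never zero.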

Comparison#

Linear Probing

Easy to implement
Best cache performance
Suffers from clustering

Quadratic Probing

Average cache performance
Suffers less from clustering

Double Hashing

Poor cache performance
No clustering
Requires more computation time
