Sets#

https://icrontic.com/uploads/features/tech/2011/09/nothing_to_see_here.jpg

Define#

A set is a fundamental data structure in computer science that stores a collection of unique elements: a value is either in the set or not, and duplicates are never stored. The abstract set imposes no order on its elements; in C++, std::set is the ordered variant and keeps its elements sorted by key.

An unordered_set (std::unordered_set in C++) also represents a collection of unique elements, similar to a mathematical set. It is implemented as a hash table, which provides fast average-case access while ensuring uniqueness of its elements. Unlike std::set, it does not maintain any specific order of the elements.

Use Cases#

Graph Algorithms {both}
https://raw.githubusercontent.com/kdn251/interviews/master/images/dijkstra.gif

Sets can be used to track visited nodes in graph traversal algorithms.
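
As a minimal sketch of the idea, the breadth-first traversal below keeps discovered nodes in a std::unordered_set so that each node is enqueued only once. The function name bfsOrder and the adjacency-list representation are assumptions made for illustration, not part of the original text.

```cpp
#include <queue>
#include <unordered_set>
#include <vector>

// Breadth-first traversal over an adjacency list (hypothetical helper).
// Returns the nodes reachable from `start` in the order they are visited.
// Assumes every node id is a valid index into `adjacency`.
std::vector<int> bfsOrder(const std::vector<std::vector<int>>& adjacency, int start) {
    std::vector<int> order;
    std::unordered_set<int> visited;  // nodes that have already been discovered
    std::queue<int> frontier;

    visited.insert(start);
    frontier.push(start);

    while (!frontier.empty()) {
        int node = frontier.front();
        frontier.pop();
        order.push_back(node);

        for (int neighbor : adjacency[node]) {
            // insert() returns a pair; its bool member is false when the
            // element was already present, so each node is enqueued once.
            if (visited.insert(neighbor).second) {
                frontier.push(neighbor);
            }
        }
    }
    return order;
}
```

A std::set would work the same way here; the unordered variant is chosen only because the traversal never needs the visited nodes in sorted order.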

Database Indexing {set}
https://cdn-media-1.freecodecamp.org/images/0eg06hWYJWhXPt1QNuaDlETYrmnSKAo6Nf44

Sets are used to maintain unique values in database indexes, ensuring fast lookups.

Data Deduplication {unordered_set}
http://3.bp.blogspot.com/-47SCyzU4tMM/UwK23slYgJI/AAAAAAAAUS4/8XZ52p1D044/s1600/deduplication3.gif

Removing duplicates from a list of records, such as emails or customer IDs.
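
One possible way to do this is sketched below: a std::unordered_set remembers which records have been seen, and only the first occurrence of each record is kept. The function name deduplicate and the use of strings as records are illustrative assumptions.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Returns the input records with duplicates removed (hypothetical helper).
// The first occurrence of each record is kept and the input order is preserved.
std::vector<std::string> deduplicate(const std::vector<std::string>& records) {
    std::unordered_set<std::string> seen;  // records encountered so far
    std::vector<std::string> unique;
    for (const std::string& record : records) {
        // The bool returned by insert() is true only for a new element.
        if (seen.insert(record).second) {
            unique.push_back(record);
        }
    }
    return unique;
}
```

If the original order does not matter, constructing the set directly from the input range would be enough; the extra vector is only there to keep the records in their first-seen order.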

Membership Testing {both}
https://www.researchgate.net/profile/Vassilios-Vassilakis/publication/318440316/figure/fig3/AS:608741581934592@1522146711030/Illustrating-the-false-positives-during-a-membership-test.png

Sets are efficient for checking whether an element is part of a specific group or category.

Spell Checking {set}
https://helpcenter.onlyoffice.com/OfficeWeb/apps/documenteditor/main/resources/help/en/images/spellchecking.png

In word processing applications, a set can be used to maintain a dictionary of correctly spelled words.

Counting Occurrences {unordered_set}
https://www.w3resource.com/w3r_images/cpp-array-image-exercise-20.png

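Counting the number of distinct elements in a dataset. A set records only whether a value is present, so counting how often each value occurs requires a map from value to count instead.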
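
A set on its own only records whether a value is present, so what it can count directly is the number of distinct values; per-value frequencies need a map. The sketch below shows both under that assumption. The helper names countDistinct and countFrequencies are hypothetical.

```cpp
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Number of distinct values: build a set from the range and take its size.
std::size_t countDistinct(const std::vector<int>& values) {
    std::unordered_set<int> distinct(values.begin(), values.end());
    return distinct.size();
}

// Occurrences of each value: a set cannot hold counts, so a map is used.
std::unordered_map<int, std::size_t> countFrequencies(const std::vector<int>& values) {
    std::unordered_map<int, std::size_t> frequency;
    for (int value : values) {
        ++frequency[value];  // operator[] value-initializes a missing count to 0
    }
    return frequency;
}
```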

Advantages & Disadvantages#

Advantages
Uniqueness {both}

Sets enforce uniqueness, ensuring no duplicate elements.

Fast Lookup {both}

Efficient for searching and checking if an element exists: \(O(\log n)\) for std::set and \(O(1)\) on average for std::unordered_set.

Simple Interface {set}

Typically provides simple and intuitive methods such as insert, contains, and remove. In C++ these are insert, find (or contains since C++20), and erase.

Flexible Data Storage {unordered_set}

Suitable for scenarios where element order is not important.

Disadvantages
No Ordering {unordered_set}

Elements are not stored in a specific order, which may be a disadvantage in some use cases. (std::set, by contrast, keeps its elements sorted.)

Overhead {set}

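May require more memory than a plain array: in typical implementations each element lives in a separately allocated tree node, and keeping the tree balanced adds some overhead on top of the uniqueness check.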

Slower Insertions {set}

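Inserting an element takes \(O(\log n)\) time because the tree must be searched and kept balanced, which can be slower than in data structures optimized for insertion, such as a hash table with \(O(1)\) average insertion.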

Hash Collisions {unordered_set}

When many elements hash to the same bucket, operations degrade toward \(O(n)\). This is rare with a well-distributed hash function.

Programming#

Set Data Structure:
  - Initialize an empty set
  - Implement functions for insert, delete, search, and traverse

We use the std::set container from the C++ Standard Library, which is an ordered set typically implemented as a balanced binary search tree.
We insert, check for existence, and remove elements using the insert, find, and erase methods.
Finally, we display the elements in the set, which are visited in ascending order.

#include <iostream>
#include <set>

int main() {
    std::set<int> mySet;

    // Insert elements
    mySet.insert(10);
    mySet.insert(5);
    mySet.insert(20);

    // Search for an element
    auto it = mySet.find(5);
    if (it != mySet.end()) {
        std::cout << "Element 5 found in the set.\n";
    }

    // Delete an element
    mySet.erase(10);

    // Traverse the set (std::set iterates in ascending order)
    std::cout << "Set elements: ";
    for (const int& element : mySet) {
        std::cout << element << " ";
    }
    std::cout << "\n";

    return 0;
}

Output:
Element 5 found in the set.
Set elements: 5 20

Unordered Set Data Structure:

Data:
- Initialize an array (buckets) of a fixed size for storing elements.
- Each bucket is a linked list to handle collisions.

Functions:
- Insert(value):
    1. Calculate the hash of the value.
    2. Find the bucket using the hash.
    3. Search the bucket for the value; if not found, append the value to the bucket.

- Contains(value):
    1. Calculate the hash of the value.
    2. Find the bucket using the hash.
    3. Search the bucket for the value; return true if found, false otherwise.

- Remove(value):
    1. Calculate the hash of the value.
    2. Find the bucket using the hash.
    3. Search the bucket for the value, and if found, remove it.

- Display():
    1. Iterate through each bucket and display the elements.
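
The steps above can be sketched as a small separate-chaining hash set. This is an illustrative toy rather than how std::unordered_set is actually implemented: the class name SimpleHashSet, the fixed bucket count of 16, and the restriction to int values are all assumptions, and the table never resizes.

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <list>
#include <vector>

// Toy hash set with a fixed number of buckets (hypothetical class).
// Each bucket is a linked list that holds all values hashing to it.
class SimpleHashSet {
public:
    // bucketCount must be greater than zero.
    explicit SimpleHashSet(std::size_t bucketCount = 16) : buckets_(bucketCount) {}

    // Appends the value to its bucket unless it is already there.
    // Returns true if the value was inserted.
    bool insert(int value) {
        std::list<int>& bucket = buckets_[indexFor(value)];
        for (int existing : bucket) {
            if (existing == value) return false;
        }
        bucket.push_back(value);
        return true;
    }

    // Returns true if the value is stored in its bucket.
    bool contains(int value) const {
        for (int existing : buckets_[indexFor(value)]) {
            if (existing == value) return true;
        }
        return false;
    }

    // Removes the value from its bucket. Returns true if it was present.
    bool remove(int value) {
        std::list<int>& bucket = buckets_[indexFor(value)];
        for (auto it = bucket.begin(); it != bucket.end(); ++it) {
            if (*it == value) {
                bucket.erase(it);
                return true;
            }
        }
        return false;
    }

    // Writes every element, bucket by bucket, to the given stream.
    void display(std::ostream& out) const {
        for (const std::list<int>& bucket : buckets_) {
            for (int value : bucket) out << value << ' ';
        }
        out << '\n';
    }

private:
    // Hash the value, then map the hash onto a bucket index.
    std::size_t indexFor(int value) const {
        return std::hash<int>{}(value) % buckets_.size();
    }

    std::vector<std::list<int>> buckets_;
};
```

Constructing the set with a single bucket, for example SimpleHashSet s(1), forces every value into the same list and makes the \(O(n)\) worst case easy to observe.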


We use the std::unordered_set container from the C++ Standard Library, which is a hash table-based implementation of a set.
We insert, check for existence, and remove elements using the insert, find, and erase methods.
Finally, we display the elements in the set.

#include <iostream>
#include <unordered_set>

int main() {
    std::unordered_set<int> mySet;

    // Insert elements
    mySet.insert(10);
    mySet.insert(5);
    mySet.insert(20);

    // Check if an element exists
    if (mySet.find(5) != mySet.end()) {
        std::cout << "Element 5 found in the unordered set.\n";
    }

    // Remove an element
    mySet.erase(10);

    // Display the elements
    std::cout << "Unordered set elements: ";
    for (const int& element : mySet) {
        std::cout << element << " ";
    }
    std::cout << "\n";

    return 0;
}
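
Output (the iteration order of an unordered set is unspecified, so the two elements may appear in either order):
Element 5 found in the unordered set.
Unordered set elements: 5 20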

Compare#

| Feature | std::set | std::unordered_set |
| --- | --- | --- |
| Data Structure | Balanced binary search tree | Hash table |
| Order of Elements | Sorted, elements in ascending order | No specific order |
| find/contains | \(O(\log n)\) | \(O(1)\) average, \(O(n)\) worst-case |
| insert | \(O(\log n)\) | \(O(1)\) average, \(O(n)\) worst-case |
| remove | \(O(\log n)\) | \(O(1)\) average, \(O(n)\) worst-case |
| Element Order | Sorted order is preserved as elements are inserted and removed | No specific order; it can change when the table is rehashed |
| Memory Usage | Relatively lower | Relatively higher due to hash table |
| Custom Key Types | Requires operator< (or a custom comparator) for keys | Requires a hash function and an equality comparison for keys |
| Range Iteration | Efficient; elements are visited in sorted order | Less efficient; no ordered ranges |
| Use Cases | When elements need to be sorted | When fast access times are critical and order doesn't matter |
| Extra Notes | Well-suited for maintaining sorted collections. | Suitable for fast access with no order requirement. Collision handling may degrade performance in rare cases. |

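Note: the average-case figures for std::unordered_set assume a well-distributed hash function and few collisions. In practice, the worst case should also be considered: with many collisions, operations degrade to \(O(n)\), and an insertion that triggers a rehash costs \(O(n)\) by itself even though insertion remains amortized \(O(1)\) on average.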
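
To illustrate the key-type requirements of the two containers, the sketch below makes a small user-defined type usable as a key in both. The Point struct, the PointHash functor, and the way the two member hashes are combined are assumptions chosen for illustration; any hash that spreads keys reasonably well would do.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_set>

// Hypothetical key type used only for this example.
struct Point {
    int x;
    int y;
};

// std::set with its default comparator needs a strict weak ordering via
// operator<: compare x first, then y.
bool operator<(const Point& a, const Point& b) {
    return a.x != b.x ? a.x < b.x : a.y < b.y;
}

// std::unordered_set needs equality (to resolve collisions)...
bool operator==(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y;
}

// ...and a hash function. This one mixes the hashes of both members.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        std::size_t hx = std::hash<int>{}(p.x);
        std::size_t hy = std::hash<int>{}(p.y);
        return hx ^ (hy + 0x9e3779b9 + (hx << 6) + (hx >> 2));
    }
};

using OrderedPoints = std::set<Point>;                      // sorted by (x, y)
using HashedPoints = std::unordered_set<Point, PointHash>;  // no specific order
```

With these definitions, inserting Point{1, 5} twice into either container leaves a single copy, and iterating an OrderedPoints visits the points sorted by x and then y.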