Mergesort#

TL;DR

Merge sort is a popular sorting algorithm that recursively divides an array into two halves until each piece contains a single element, then merges the sorted pieces back together. In short: divide the array in half, sort each half recursively, and merge the two sorted halves.

The key idea behind merge sort is that it’s easy to merge two smaller sorted arrays into one large sorted array. As a result, merge sort achieves a better time complexity than simpler sorting algorithms such as bubble sort or insertion sort.

The time complexity of merge sort is \(O(n \log n)\) in the worst case, where \(n\) is the number of elements to be sorted. This makes merge sort one of the most efficient comparison-based sorting algorithms, and it’s widely used in practice.

Divide & Conquer#

https://codeyz.com/wp-content/uploads/2020/07/image-1-792x512.png

Divide the problem into smaller subproblems

Conquer recursively

  • solve each subproblem

Combine Solutions

Example
  • sorting with insertion sort is \(n^2\)

  • we can divide the array into two halves and sort them separately

https://algs4.cs.princeton.edu/22mergesort/images/mergesort-overview.png
  • each subproblem could be sorted in \(≈\frac{n^2}{4}\)

  • sorting both halves will require \(≈2\cdot\frac{n^2}{4} = \frac{n^2}{2}\) 🤔

  • we need an additional operation to combine both solutions

Time “reduced” from \(≈n^2\) to \(≈\frac{n^2}{2}+n\)
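To make the saving concrete, for \(n = 1000\):

\[n^2 = 1{,}000{,}000 \qquad \text{vs.} \qquad \frac{n^2}{2} + n = 500{,}000 + 1{,}000 = 501{,}000\]

One division roughly halves the work; applying the same division recursively is what drives the cost all the way down to \(≈n \log n\).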

Merge Sort#

Divide the array into two halves

  • just need to calculate the midpoint

Conquer each half recursively

  • call Merge Sort on each half (i.e. solve 2 smaller problems)

Merge Solutions

  • after both calls are finished, proceed to merge the solutions

MergeSort(A, lo, hi)

  if (hi <= lo) return;

  // Find the middle point to divide the array into two halves:
  int mid = lo + (hi - lo) / 2;

  // Call MergeSort for the first half:
  MergeSort(A, lo, mid);

  // Call MergeSort for the second half:
  MergeSort(A, mid + 1, hi);

  // Merge the two sorted halves:
  merge(A, lo, mid, hi);

Merge Sort: implementation

void mergesort(int *A, int n) {
  int *aux = new int[n];
  r_mergesort(A, aux, 0, n - 1);
  delete[] aux;
}
void r_mergesort(int *A, int *aux, int lo, int hi) {

  // base case (single element or empty list)
  if (hi <= lo) return;

  // divide
  int mid = lo + (hi - lo) / 2;

  // recursively sort halves
  r_mergesort(A, aux, lo, mid);
  r_mergesort(A, aux, mid + 1, hi);

  // merge results
  merge(A, aux, lo, mid, hi);
}
void merge(int *A, int *aux, int lo, int mid, int hi) {

    // copy the range to the auxiliary array
    std::memcpy(aux + lo, A + lo, (hi - lo + 1) * sizeof(int));

    // merge back, always taking the smaller front element
    int i = lo, j = mid + 1;

    for (int k = lo; k <= hi; k++) {

        if (i > mid) A[k] = aux[j++];           // left half exhausted

        else if (j > hi) A[k] = aux[i++];       // right half exhausted

        else if (aux[j] < aux[i]) A[k] = aux[j++];

        else A[k] = aux[i++];
    }
}
https://algs4.cs.princeton.edu/22mergesort/images/merge.png
https://algs4.cs.princeton.edu/22mergesort/images/mergesortTD-bars.png

Divide & Conquer#

Consider how this breaks out…

https://www.kirupa.com/sorts/images/mergesort_numbers_300.png

Fig. 96 Collection Unsorted#

https://www.kirupa.com/sorts/images/mergesort_1st_div_300.png

Fig. 97 First Division#

https://www.kirupa.com/sorts/images/mergesort_2nd_div_300.png

Fig. 98 Second Division#

https://www.kirupa.com/sorts/images/mergesort_3rd_div_300.png

Fig. 99 Third Division#

https://www.kirupa.com/sorts/images/merge_setup_300.png
https://www.kirupa.com/sorts/images/merge_step_one_300.png

Fig. 100 Step 1#

https://www.kirupa.com/sorts/images/merge_step_two_300.png

Fig. 101 Step 2#

https://www.kirupa.com/sorts/images/merge_step_first_row_300.png

Fig. 102 Completed First Row#

https://www.kirupa.com/sorts/images/merge_step_2nd_row_3_300.png

Fig. 103 Step 3#

https://www.kirupa.com/sorts/images/merge_step_3rd_row_1_300.png

Fig. 104 Step 1#

https://www.kirupa.com/sorts/images/merge_step_3rd_row_2.png

Fig. 105 Step 2#

https://www.kirupa.com/sorts/images/everything_sorted_merge_300.png

Fig. 106 Sort Complete#

  • Add the collection values to Your Values

    • Collection values : 5, 12, 4, 1, 2, 8, 2, 6, 10

  • Change the List Size to 9

  • Press run and click through the steps…

Merging two sorted arrays#

https://www.baeldung.com/wp-content/uploads/2019/12/Merge-Sorted-Arrays.png

A secondary array is necessary to guarantee a linear-time operation

Analysis (recurrence)#

|                 | Worst Case           | Average Case         | Best Case            |
|-----------------|----------------------|----------------------|----------------------|
| Time Complexity | \(\Theta(n \log n)\) | \(\Theta(n \log n)\) | \(\Theta(n \log n)\) |

Breakdown (generalization)

A merge sort consists of several passes over the input.

\(1^{st}\) Pass: merges segments of size 1
\(2^{nd}\) Pass: merges segments of size 2
\(i^{th}\) Pass: merges segments of size \(2^{i-1}\).
Total number of passes: \(\log n\)
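This pass structure is exactly what a bottom-up (iterative) merge sort executes; a minimal sketch, using `std::merge` for the linear-time merge of each pair of segments:

```cpp
#include <algorithm>
#include <vector>

// Bottom-up merge sort: pass i merges already-sorted segments of width 2^(i-1).
void bottomUpMergesort(std::vector<int>& A) {
    int n = A.size();
    std::vector<int> aux(n);
    for (int width = 1; width < n; width *= 2) {           // one pass per width
        for (int lo = 0; lo < n - width; lo += 2 * width) {
            int mid = lo + width;                          // segment boundary
            int hi  = std::min(lo + 2 * width, n);
            // merge A[lo..mid) and A[mid..hi) into aux, then copy back
            std::merge(A.begin() + lo, A.begin() + mid,
                       A.begin() + mid, A.begin() + hi, aux.begin() + lo);
            std::copy(aux.begin() + lo, aux.begin() + hi, A.begin() + lo);
        }
    }
}
```

The outer loop runs \(\lceil \log n \rceil\) times and the inner loop touches each element once, matching the \(\Theta(n \log n)\) count below.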

As merge showed, we can merge two sorted segments in linear time, which means that each pass takes \(O(n)\) time. Since there are \(\log n\) passes, the total computing time is \(\Theta(n \log n)\), expressed as the recurrence:

\[T(n) = 2T(\frac{n}{2}) + \Theta(n)\]
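Unrolling the recurrence makes the bound visible: after \(k\) levels of expansion there are \(2^k\) subproblems of size \(n/2^k\), and each level contributes \(cn\) merging work:

\[T(n) = 2^k\,T\!\left(\frac{n}{2^k}\right) + k\,cn\]

Setting \(k = \log n\) (so the subproblems reach size 1) gives \(T(n) = n\,T(1) + cn \log n = \Theta(n \log n)\).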

Recursion Tree (trace)#

https://www.kirupa.com/sorts/images/running_complexity_300.png

Total time for \(mergeSort\) is: the sum of the merging times for all the levels

  • If there are \(l\) levels in the tree, then the total merging time is \(l \cdot cn\)

  • Where \(l\) is the number of subproblem levels until the subproblems reach size 1, i.e. \(l = \log n + 1\)
    For total time, we end up with:

\[cn(\log n + 1)\]
  • Discard the low-order term and the constant coefficient

\[\Theta(n \log n)\]

Sorting Visualizer#

Comments on Merge Sort#

Advantages

  • relatively efficient for sorting large datasets

  • stable sort : the order of elements with equal values is preserved during the sort

  • easy to implement

  • useful for external sorting

  • requires relatively few additional resources beyond the auxiliary array

Disadvantages

  • slower than simpler sorting algorithms (e.g. insertion sort) on small inputs

  • requires an additional memory space of \(O(n)\) for the temporary array

  • goes through the whole divide-and-merge process even if the input is already sorted

  • requires more code to implement

Example#

Think about reversing an array or string
solution 1

Use an additional array of equal size

  • what is the required extra memory?

solution 2

Exchange first and last and work recursively on the inner part

  • can do it iteratively as well

  • what is the required extra memory?

In-place sorting#

An in-place algorithm does not use extra space for manipulating the input, but may require a small, non-constant amount of extra space for its operation. Usually, this space is \(O(\log n)\), though sometimes anything in \(o(n)\) (smaller than linear) is allowed.

GeeksforGeeks

selection sort

void selectionSort(int arr[], int n)
{
    int i, j, min_idx;

    // One by one move boundary of unsorted subarray
    for (i = 0; i < n - 1; i++) {

        // Find the minimum element in the unsorted array
        min_idx = i;
        for (j = i + 1; j < n; j++)
            if (arr[j] < arr[min_idx]) min_idx = j;

        // Swap the found minimum element with the first element
        if (min_idx != i)
            std::swap(arr[min_idx], arr[i]);
    }
}
In-place?

Yes, it does not require extra space.

insertion sort

void insertionSort(int arr[], int n)
{
    int i, key, j;

    for (i = 1; i < n; i++)
    {
        key = arr[i];
        j = i - 1;

        // Move elements of arr[0..i-1] that are greater
        // than key one position ahead of their current position
        while (j >= 0 && arr[j] > key)
        {
            arr[j + 1] = arr[j];
            j = j - 1;
        }
        arr[j + 1] = key;
    }
}
In-place?

Yes, it does not require extra space.

Stable Sorting#

Stability

A sorting algorithm is stable if it preserves the order of equal elements

Consider sorting (in ascending order) a list \(A\) into a sorted list \(B\). Let \(f(i)\) be the index of element \(A[i]\) in \(B\). The sorting algorithm is stable if:

for any pair \((i,j)\) with \(A[i] = A[j]\) and \(i < j\), we have \(f(i) < f(j)\)

https://cdn.emre.me/2019-08-31-stable-unstable.png

Stable?

../../_images/11_s20_1.png

unsorted

../../_images/11_s20_2.png

sorted

Stable?

NOT STABLE

How about now?

../../_images/11_s20_1.png

unsorted

../../_images/11_s20_3.png

sorted

Stable?

STABLE

Stability#

Is selection sort stable?

Selection sort algorithm picks the minimum and swaps it with the element at current position.

Suppose the array is:

\[[5_a,\ 3,\ 4,\ 5_b,\ 2,\ 6,\ 8]\]
  • After iteration 1:

    • the minimum, 2, will be swapped with \(5_a\) in the 1st position

  • So our array becomes:

\[[2,\ 3,\ 4,\ 5_b,\ 5_a,\ 6,\ 8]\]
  • The array is now in sorted order, and \(5_a\) comes before \(5_b\) in the initial array but after it in the sorted array.

  • Therefore, selection sort is not stable.

NOT STABLE

  • long distance swaps

  • try: 5 3 4 5 2 6 8

Is insertion sort stable?

STABLE

  • equal items never pass each other (provided the implementation shifts only strictly greater elements)

Sorting Algorithms#

|                | Best-Case            | Average-Case         | Worst-Case           | Stable | In-place |
|----------------|----------------------|----------------------|----------------------|--------|----------|
| Selection Sort | \(\Theta(n^2)\)      | \(\Theta(n^2)\)      | \(\Theta(n^2)\)      | No     | Yes      |
| Insertion Sort | \(\Theta(n)\)        | \(\Theta(n^2)\)      | \(\Theta(n^2)\)      | Yes    | Yes      |
| Merge Sort     | \(\Theta(n \log n)\) | \(\Theta(n \log n)\) | \(\Theta(n \log n)\) | Yes    | No       |

Test Yourself…#