A Guide to Choosing the Best Sorting Algorithm

Sorting is an integral part of Computer Science. From maintaining a simple telephone directory to data compression, a lot of problems become easier when you simply sort the input data (finding the median, for example, or applying Binary Search). Naturally, then, it makes sense to choose the most efficient sorting algorithm. That is what this post is dedicated to.

Before I dive into it, let me make it clear that there is no ideal sorting algorithm for every single case. The best algorithm to use varies from case to case. It is important, however, to know which algorithm to use when. We will be using the following list of factors (not exhaustive) to filter out our algorithms.

  • Time Complexity
  • Space requirements
  • Worst-case behaviour
  • Caching
  • Recursive calls (Stack space)
  • Assumption(s) about input data
  • Stability (see the short example after this list)
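
Of these, stability is probably the least self-explanatory: a stable sort keeps elements that compare equal in their original relative order. Here is a minimal sketch using Python's built-in sorted (which is guaranteed to be stable); the (name, grade) records are just made-up data for illustration:

# Records of (name, grade); we sort by grade only.
students = [("Ada", 2), ("Bob", 1), ("Cid", 2), ("Dee", 1)]

# sorted() is stable: ties keep their input order, so Bob stays
# ahead of Dee, and Ada stays ahead of Cid.
print(sorted(students, key=lambda s: s[1]))
# [('Bob', 1), ('Dee', 1), ('Ada', 2), ('Cid', 2)]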

Let’s make it easier and look at the three most common O(n lg n) algorithms.

Merge Sort

-- Recursively merge sort the left and right sub-arrays,
-- then merge the two sorted halves together in linear time.
msort :: (Ord a) => [a] -> [a]
msort []  = []
msort [x] = [x]
msort xs  = merge (msort left) (msort right)
  where (left, right) = halve xs

-- Split a list into two halves.
halve :: [a] -> ([a], [a])
halve xs = splitAt (length xs `div` 2) xs

merge :: (Ord a) => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)   -- taking from the left on ties keeps the sort stable
  | otherwise = y : merge (x:xs) ys

High-level Thoughts

  • Has O(n lg n) worst-case time
  • O(n) space - worse when the sorted result then needs to be copied back into the original array (see the sketch after this list)
  • The recursive calls cost O(lg n) stack space
  • Poorer cache performance (poor use of spatial locality) compared to Quicksort
  • Stable
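
To make the space bullet concrete, here is a rough array-based sketch (mine, not a translation of the Haskell above): the merge step writes into an auxiliary buffer and then copies it back, which is exactly where the O(n) extra space and the extra copying cost come from.

def merge_sort(a, lo=0, hi=None):
    """Sort a[lo:hi] in place, using an O(n) auxiliary buffer per merge."""
    if hi is None:
        hi = len(a)
    if hi - lo <= 1:
        return
    mid = (lo + hi) // 2
    merge_sort(a, lo, mid)
    merge_sort(a, mid, hi)
    buf = []                      # the O(n) auxiliary space
    i, j = lo, mid
    while i < mid and j < hi:
        if a[i] <= a[j]:          # taking from the left on ties keeps it stable
            buf.append(a[i])
            i += 1
        else:
            buf.append(a[j])
            j += 1
    buf.extend(a[i:mid])
    buf.extend(a[j:hi])
    a[lo:hi] = buf                # the copy-back step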

Quicksort

-- The empty list is already sorted.
-- A non-empty list is sorted by sorting the tail values less than or
-- equal to the head, sorting the tail values greater than the head,
-- and then appending the resulting lists on either side of the head.
-- Haskell has probably the simplest quicksort implementation of any
-- programming language!
quickSort :: (Ord a) => [a] -> [a]
quickSort [] = []
quickSort (x:xs) =
  quickSort [a | a <- xs, a <= x] ++ [x] ++ quickSort [b | b <- xs, b > x]

High-level Thoughts

  • Average O(n lg n) time - Quicksort is fast because it exploits spatial locality: it walks neighboring elements in order, comparing them to the pivot value (which can be kept in a register), so it makes very effective use of caching
  • Stack frames from recursive calls cost O(lg n) space on average (O(n) in the unbalanced worst case)
  • Worst-case O(n²) time - the further the chosen pivot is from the true median, the worse the performance; an already-sorted input with a first- or last-element pivot is the classic worst case
  • Unstable - equal elements may not keep their relative order
  • Personally, I feel the in-place array version of Quicksort is complicated - you have to pass left and right boundary variables around (see the sketch after this list) 😕
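
For contrast with the Haskell one-liner, here is a rough sketch of that in-place array version, using Lomuto partitioning (one of several common schemes); the lo and hi parameters are exactly the boundary bookkeeping the list-comprehension version hides. The random pivot choice is one standard way of making the O(n²) case unlikely on any particular input.

import random

def quick_sort(a, lo=0, hi=None):
    """In-place quicksort of a[lo..hi] with Lomuto partitioning."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Pick a random pivot and park it at the end of the range.
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]     # the pivot lands at its final index i
    quick_sort(a, lo, i - 1)
    quick_sort(a, i + 1, hi)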

Heapsort

from typing import List

"""
buildMaxHeap takes Θ(n) time.
Then maxHeapify is called n times at Θ(lg n) each, so the sort loop
is Θ(n lg n) and the total is Θ(n) + Θ(n lg n) = Θ(n lg n)
in both the best and the worst case.
Space complexity: Θ(1) [in-place sort]
"""

def heapSort(nums: List[int]) -> None:
    n = len(nums)
    buildMaxHeap(nums, n)

    for i in range(n - 1, 0, -1):
        # Move the current maximum to the end, then repair the heap
        # that remains in nums[0:i].
        nums[i], nums[0] = nums[0], nums[i]
        maxHeapify(nums, 0, i)


def buildMaxHeap(nums: List[int], n: int) -> None:
    # Heapify every non-leaf node, bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        maxHeapify(nums, i, n)


def maxHeapify(nums: List[int], i: int, n: int) -> None:
    # With 0-based indexing, node i's children sit at 2i+1 and 2i+2.
    largest = i
    left = 2 * i + 1
    right = left + 1
    if left < n and nums[left] > nums[largest]:
        largest = left
    if right < n and nums[right] > nums[largest]:
        largest = right
    if largest != i:
        nums[i], nums[largest] = nums[largest], nums[i]
        maxHeapify(nums, largest, n)
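
A quick sanity check of the code above:

nums = [5, 1, 4, 2, 8, 0]
heapSort(nums)
print(nums)  # [0, 1, 2, 4, 5, 8]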

High-level Thoughts

  • Creating the heap is O(n). Each pop of the maximum is O(1), and repairing the heap after a pop is O(lg n); there are n pops, so the sort loop is O(n lg n), which dominates overall
  • Sorts in-place, so constant space - which is great when space is short, e.g. embedded systems, mobile devices, etc.
  • Rarely used on modern systems because it has poor cache performance - array entries are seldom compared to nearby entries
  • Still much slower than Quicksort on average. You still need to perform a swap for every element in the array, even if it’s already in the right place
  • Unstable - equal elements may not keep their relative order

Insertion Sort

Okay, let’s consider Insertion Sort as well. Believe it or not, it actually performs really well on small lists. Also, I will feel bad if I don’t include it - it was the first sorting algorithm I ever learned about.

-- The sorted list is built up one item at a time:
-- each element is inserted at its correct position in the
-- already-sorted remainder of the list.
insert :: (Ord a) => a -> [a] -> [a]
insert x [] = [x]
insert x (y:ys) = if x <= y
                  then x : y : ys
                  else y : insert x ys

isort :: (Ord a) => [a] -> [a]
isort []     = []
isort (x:xs) = insert x (isort xs)

High-level Thoughts

  • Okay, let's address the elephant in the room - O(n²) worst-case time
  • However, it sorts nearly sorted arrays in linear time
  • Constant space - sorts in-place
  • It is efficient for smaller data sets, but very inefficient for larger lists
  • It performs better than Selection Sort and Bubble Sort algorithms
  • Stable sort - does not change the relative order of elements which are equal
  • As with Bubble Sort, I have always found the array version slightly counter-intuitive, because you step “backwards” through the array (see the sketch after this list)
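
Here is a rough sketch of that array version, where the backwards stepping is explicit; it also shows why nearly sorted input is linear, since the inner loop then exits almost immediately.

def insertion_sort(a):
    """In-place, stable insertion sort."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Step backwards, shifting larger elements one slot right.
        # On nearly sorted input this loop exits almost at once,
        # which is where the linear-time behaviour comes from.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key            # drop the key into the gap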

Summary

As you can see, each of the sorting algorithms has its own advantages and disadvantages. Choose the sort that is best suited to your data. If you need something stable, Merge Sort is a good candidate. If you’re constrained in space, go for Heapsort. For nearly sorted data, remember that Insertion Sort runs in linear time!

Modern standard libraries use hybrid sorts that combine the best qualities of different sorting algorithms. Perhaps the most widely used is introsort (the basis of std::sort in common C++ implementations), which begins with Quicksort, switches to Heapsort when the recursion depth exceeds a level based on the number of elements being sorted, and typically finishes small sub-arrays with Insertion Sort.
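
To make that idea concrete, here is a rough sketch of the introsort pattern (not the actual std::sort implementation), reusing the heapSort and insertion_sort sketches from earlier as the fallbacks; the depth limit of 2·lg n and the small-range cutoff of 16 are conventional choices, not prescribed values.

import math

def intro_sort(a):
    if len(a) > 1:
        _intro(a, 0, len(a) - 1, 2 * int(math.log2(len(a))))

def _intro(a, lo, hi, depth):
    if hi - lo < 16:
        # Small range: finish it off with insertion sort on the slice.
        chunk = a[lo:hi + 1]
        insertion_sort(chunk)
        a[lo:hi + 1] = chunk
        return
    if depth == 0:
        # Recursion has gone too deep: fall back to heapsort here.
        chunk = a[lo:hi + 1]
        heapSort(chunk)
        a[lo:hi + 1] = chunk
        return
    # Otherwise: one Lomuto partition step, as in the Quicksort sketch.
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    _intro(a, lo, i - 1, depth - 1)
    _intro(a, i + 1, hi, depth - 1)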