Introduction
Heap Sort is another example of an efficient sorting algorithm. Its main advantage is that it has a great worstcase runtime of O(n*logn) regardless of the input data.
As the name suggests, Heap Sort relies heavily on the heap data structure – a common implementation of a Priority Queue .
Without a doubt, Heap Sort is one of the simplest sorting algorithms to implement and coupled with the fact that it’s a fairly efficient algorithm compared to other simple implementations, it’s a common one to encounter.
Heap Sort
Heap Sort works by “removing” elements from the heap part of the array onebyone and adding them to the sorted part of the array. Before we get further into the explanation and revisit the heap data structure, we should mention a few attributes of Heap Sort itself.
It is an inplace algorithm , meaning that it requires a constant amount of additional memory, i.e. the memory needed doesn’t depend on the size of the initial array itself, other than the memory needed to store that array.
For example, no copies of the original array are necessary, and there is no recursion and recursive call stacks. The simplest implementation of Heap Sort usually uses a second array to store the sorted values. We will be using this approach since it’s a lot more intuitive and easy to follow in code, but it can be implemented completely inplace .
Heap Sort is unstable , meaning that it does not maintain the relative order of elements with equal values. This isn’t an issue with primitive types (like integers and characters…) but it can be a problem when we sort complex types, like objects.
For example, imagine we have a custom class Person
with the age
and name
fields, and several objects of that class in an array, including a person called “Mike” aged 19 and “David”, also aged 19 – appearing in that order.
If we decided to sort that array of people by age, there would be no guarantee that “Mike” would appear before “David” in the sorted array, even though they appeared in that order in the initial array. It can happen, but it’s not guaranteed.
Fun fact: Heap Sort is the sorting algorithm of choice in the Linux Kernel
The Heap Data Structure
Heaps are one of the most popular and heavily used data structures in computer science – not to mention very popular during Software Engineering interviews.
We’ll talk of heaps keeping track of the smallest element (minheap), but they can just as easily be implemented to keep track of the largest element (maxheap).
Simply put, a minheap is a treebased data structure in which every node is smaller that all of its children. Most often a binary tree is used. Heaps have three supported operations – delete_minimum()
, get_minimum()
, and add()
.
You can only delete the first element in the heap, after which it’s “resorted”. Heaps “resort” themselves after an element is added or removed, so that the the smallest element is always in the first position.
Note:This in no way means that heaps are sorted arrays. The fact that every node is smaller than its children isn’t enough to guarantee that the whole heap is in ascending order.
Let’s look at an example of a heap:
As we can see, the above example does fit the description of a heap but is not sorted. We won’t go into details of the heap implementation since that is not the focus of this article. The crucial advantage of the heap data structure we leverage when using it in Heap Sort is that the next smallest element is always the first element in the heap .
Note: Thanks to the way heaps sort elements after an element is removed, the complexity of the next smallest element moving to the first position, while keeping the array a heap still, takes O(logn) time, which is a highly efficient operation.
Implementation
Sorting Arrays
Python provides methods for creating and using heaps so we don’t have to implement them ourselves:

heappush(list, item)
: Adds an element to the heap, and resorts it afterward so that it remains a heap. Can be used on an empty list. 
heappop(list)
: Pops (removes) the first (smallest) element and returns that element. The heap remains a heap after this operation, so we don’t have to callheapify()
. 
heapify(list)
: Turns the given list into a heap. It is worth noting that this method exists even though we won’t be using this since we don’t want to change our original array.
Now that we know this, the implementation for Heap Sort is fairly straightforward:
from heapq import heappop, heappush def heap_sort(array): heap = [] for element in array: heappush(heap, element) ordered = [] # While we have elements left in the heap while heap: ordered.append(heappop(heap)) return ordered array = [13, 21, 15, 5, 26, 4, 17, 18, 24, 2] print(heap_sort(array))
Output:
[2, 4, 5, 13, 15, 17, 18, 21, 24, 26]
As we can see, the heavy lifting is done with the heap data structure, all we have to do is add all the elements we need and remove them one by one. It’s almost like a coin counting machine that sorts the inputted coins by their value and we can take them out afterwards.
Sorting Custom Objects
Things get a little more complicated when using custom classes. Usually, we advise against overriding comparison operators in classes for the purpose of using our sorting algorithms for them, and instead suggest rewriting the algorithm so that it takes alambda function comparator instead.
However, since our implementation relies on the builtin heap methods, we can’t do that here.
Python does provide the following methods:

heapq.nlargest(*n*, *iterable*, *key=None*)
: Returns a list with the n largest elements from the dataset defined byiterable
. 
heapq.nsmallest(*n*, *iterable*, *key=None*)
: Returns a list with the n smallest elements from the dataset defined byiterable
.
Which we could use to simply get n = len(array)
largest/smallest elements but the methods themselves do not use Heap Sort and are essentially equivalent to just calling the sorted()
method.
The only solution we have left for custom classes is to actually override the comparison operators. This sadly limits us to only one type of comparison per class. In our example it limits us to sorting Movie
objects by year.
However, it does let us demonstrate using Heap Sort on custom classes. Let’s go ahead and define the Movie
class:
from heapq import heappop, heappush class Movie: def __init__(self, title, year): self.title = title self.year = year def __str__(self): return str.format("Title: {}, Year: {}", self.title, self.year) def __lt__(self, other): return self.year < other.year def __gt__(self, other): return other.__lt__(self) def __eq__(self, other): return self.year == other.year def __ne__(self, other): return not self.__eq__(other)
And now, let’s slightly modify our heap_sort()
function:
def heap_sort(array): heap = [] for element in array: heappush(heap, element) ordered = [] while heap: ordered.append(heappop(heap)) return ordered
And finally, let’s instantiate a few movies, put them in an array, and then sort them:
movie1 = Movie("Citizen Kane", 1941) movie2 = Movie("Back to the Future", 1985) movie3 = Movie("Forrest Gump", 1994) movie4 = Movie("The Silence of the Lambs", 1991); movie5 = Movie("Gia", 1998) array = [movie1, movie2, movie3, movie4, movie5] for movie in heap_sort(array): print(movie)
Output:
Title: Citizen Kane, Year: 1941 Title: Back to the Future, Year: 1985 Title: The Silence of the Lambs, Year: 1991 Title: Forrest Gump, Year: 1994 Title: Gia, Year: 1998
Comparison to Other Sorting Algorithms
One of the main reasons Heap Sort is still used fairly often, even though it’s often outperformed by a wellimplementedQuick Sort, is its reliability.
Heap Sort’s main advantage here are the O(n*logn) upper bound as far as time complexity is concerned, and security concerns. Linux kernel developers give the following reasoning to using Heap Sort over Quick Sort:
Sorting time of Heap Sort is O(n*logn) both on average and worstcase. While qsort is about 20% faster on average, it suffers from an exploitable O(n*n) worstcase behavior and extra memory requirements that make it less suitable for kernel use.
Furthermore, Quick Sort behaves poorly in predictable situations, and given enough knowledge of the internal implementation, it could create a security risk (mainly DDoS attacks) since the bad O(n ^{2} ) behavior could easily be triggered.
Another algorithm that Heap Sort is often compared to isMerge Sort, which has the same time complexity.
Merge Sort has the advantage of being stable and intuitively parallelizable , while Heap Sort is neither.
Another note is that Heap Sort is slower than Merge Sort in most cases, even though they have the same complexity, since Heap Sort has larger constant factors.
Heap Sort can, however, be implemented much more easily inplace than Merge Sort can, so it’s preferred when memory is a more important factor than speed.
Conclusion
As we saw, Heap Sort isn’t as popular as other efficient, generalpurpose algorithms but its predictable behavior (other than being unstable) make it a great algorithm to use where memory and security are more important than slightly faster runtime.
It’s really intuitive to implement and leveraging the builtin functionality provided with Python, all we essentially have to do is put the items in a heap and take them out – similar to a coin counter.
CoLaBug.com遵循[CC BYSA 4.0]分享并保持客观立场，本站不承担此类作品侵权行为的直接责任及连带责任。您有版权、意见、投诉等问题，请通过[eMail]联系我们处理，如需商业授权请联系原作者/原网站。