25. Priority queues
- Each entry consists of a key and an associated value (like a dictionary entry)
- Entries are prioritized by their keys
- A total order is defined on the keys (e.g. alphabetical order)
- You may identify or remove the entry whose key is lowest (but no other entry)
For concreteness, let's use Integer objects as our keys. The main operations:
insert()      adds an entry to the priority queue;
min()         returns the entry with the minimum key;
removeMin()   both removes and returns the entry with the minimum key.
                     key
                      |
    ---------         |          ---------                ---------
    |4: womp|         v          |4: womp|                |       |
    |7: gong|--insert(5, hoot)-->|7: gong|--removeMin()-->|7: gong|--min()-->
    |       |             ^      |5: hoot|       |        |5: hoot|    |
    ---------             |      ---------       v        ---------    v
                        value                (4, womp)             (5, hoot)
public interface PriorityQueue {
  public int size();
  public boolean isEmpty();
  Entry insert(Object key, Object value);
  Entry min();
  Entry removeMin();
}
See page 340 of Goodrich & Tamassia for how they implement an "Entry".
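Purely as an illustration (these field and method names are assumptions, not G&T's actual code), an Entry can be as simple as a key/value pair that the queue hands back to its caller:

// Hypothetical minimal Entry: a (key, value) pair returned by the queue.
public class Entry {
  protected Object key;    // the key used to order entries in the priority queue
  protected Object value;  // the associated payload

  public Entry(Object key, Object value) {
    this.key = key;
    this.value = value;
  }

  public Object key()   { return key; }
  public Object value() { return value; }
}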
Binary Heaps: An Implementation of Priority Queues
Outline
- complete binary tree
- binary heap
- array-based tree data structure
- implementation of min, insert, and removeMin
A complete binary tree is a binary tree in which every row is full, except possibly the bottom row, which is filled from left to right as in the illustration below.
       2                index:   0   1   2   3   4   5   6    7   8    9   10
      / \
     /   \           ------------------------------------------------
    5     3          |   | 2 | 5 | 3 | 9 | 6 | 11 | 4 | 17 | 10 | 8 |
   / \   / \         ------------------------------------------------
  9   6 11  4          ^
 / \  /                |
17 10 8                \--- array index 0 intentionally left empty.
A binary heap is a complete binary tree whose entries satisfy the heap-order property: no child has a key less than its parent's key (equivalently, every child's key is greater than or equal to its parent's key). Observe that every subtree of a binary heap is also a binary heap, because every subtree is complete and satisfies the heap-order property.
Array-based tree data structure
A binary heap can be stored compactly as an array of entries. We map tree nodes to array indices with level numbering, which places the root at index 1 and orders the remaining nodes by a level-order traversal of the tree.
Observe that if a node's index is i, its children's indices are 2i and 2i+1, and its parent's index is floor(i/2). (This arithmetic is why we leave array index 0 empty and start the numbering at 1.)
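As a quick sketch of this arithmetic in Java (the class and method names here are just illustrative):

// Index arithmetic for a binary heap stored with its root at array index 1.
class HeapIndex {
  static int parent(int i)     { return i / 2; }       // integer division = floor(i/2)
  static int leftChild(int i)  { return 2 * i; }
  static int rightChild(int i) { return 2 * i + 1; }
}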
An array-based tree data structure is faster than a node-and-reference-based one because there is no need to read and write the references that connect nodes to each other, cache performance is better, and it is easy to find the last node in level order.
IMPLEMENTATION
[1] Entry min();
The heap-order property ensures that the entry with the minimum key is always at the top of the heap. Hence, we simply return the entry at the root node. If the heap is empty, return null or throw an exception.
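A minimal sketch of min() for an array-based heap follows; to keep it short it stores bare int keys rather than Entry objects, and the field names heap and size are assumptions:

// Sketch of min(): the heap-order property puts the minimum key at the root.
public class IntHeapMin {
  private int[] heap;   // keys stored in heap[1..size]; index 0 unused
  private int size;     // number of keys currently in the heap

  public Integer min() {
    if (size == 0) {
      return null;      // empty heap: return null (or throw an exception)
    }
    return heap[1];     // the root holds the minimum key
  }
}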
[2] Entry insert(Object k, Object v);
- Place the new entry x in the first free spot in the bottom level of the tree, counting from the left (start a new level if the bottom level is full).
- If x violates the heap-order property, correct the violation by bubbling x up:
      while x.key < x.parent.key:  exchange x with its parent
  Prove to yourself that when the loop finishes, the heap-order property is satisfied throughout the tree.
- Optimize: bubble a hole up the tree, then fill in x, instead of putting x at the bottom of the tree and bubbling it up. (It saves the time that would be spent setting a sequence of references to x that are going to change anyway.)
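Here is a sketch of insert() using the hole-bubbling optimization just described, on the same simplified int-key representation (class and field names are illustrative):

// Sketch of insert(): put the new key in the first free spot, then bubble a
// hole up the tree and drop the key into its final position.
public class IntHeapInsert {
  private int[] heap = new int[16];   // keys in heap[1..size]; index 0 unused
  private int size = 0;

  public void insert(int key) {
    if (size + 1 == heap.length) {                       // out of room: resize
      heap = java.util.Arrays.copyOf(heap, heap.length * 2);
    }
    int i = ++size;                                      // first free spot (the hole)
    while (i > 1 && heap[i / 2] > key) {                 // parent's key is too big
      heap[i] = heap[i / 2];                             // pull the parent down into the hole
      i = i / 2;                                         // the hole moves up
    }
    heap[i] = key;                                       // finally fill in the new key
  }
}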
[3] Entry removeMin();
If the heap is empty, return null or throw an exception. Otherwise, begin by removing the entry at the root node and saving it for the return value.
Fill the hole with the last entry x in the tree (the last entry in level order) and bubble x down the heap, repeatedly exchanging x with whichever child has the smaller key, until the heap-order property is satisfied.
Optimize: bubble a hole down the tree, then fill in x, instead of putting x at the root and bubbling it down.
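A corresponding sketch of removeMin(), bubbling a hole down and promoting the smaller-keyed child at each level (same simplified int-key representation; names are illustrative):

// Sketch of removeMin(): save the root, then bubble a hole down the tree and
// drop the last entry x into the spot where it belongs.
public class IntHeapRemoveMin {
  private int[] heap;   // keys in heap[1..size]; index 0 unused
  private int size;

  public Integer removeMin() {
    if (size == 0) {
      return null;                       // empty heap: return null (or throw)
    }
    int min = heap[1];                   // the minimum key lives at the root
    int x = heap[size--];                // last entry in level order fills the hole
    int i = 1;                           // the hole starts at the root
    while (2 * i <= size) {              // while the hole has at least one child
      int child = 2 * i;                 // left child
      if (child + 1 <= size && heap[child + 1] < heap[child]) {
        child++;                         // the right child has the smaller key
      }
      if (heap[child] >= x) {
        break;                           // heap-order property is satisfied
      }
      heap[i] = heap[child];             // pull the smaller child up into the hole
      i = child;                         // the hole moves down
    }
    heap[i] = x;                         // fill the hole with x
    return min;
  }
}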
Running Times
              Binary Heap       Sorted List/Array   Unsorted List/Array
min()         Theta(1)          Theta(1)            Theta(n)
insert()
  worst-case  Theta(log n) *    Theta(n)            Theta(1) *
  best-case   Theta(1) *        Theta(1) *          Theta(1) *
removeMin()
  worst-case  Theta(log n)      Theta(1) **         Theta(n)
  best-case   Theta(1)          Theta(1) **         Theta(n)
* If you're using an array-based data structure, these running times assume
that you don't run out of room. If you do, it will take Theta(n) time to
allocate a larger array and copy the entries into it. However, if you
double the array size each time, the _average_ running time will still be
as indicated.
** Removing the minimum from a sorted array in constant time is most easily
done by keeping the array always sorted from largest to smallest.
In a binary heap, min's running time is clearly in Theta(1).
insert() puts an entry x at the bottom of the tree and bubbles it up. At each level of the tree, it takes O(1) time to compare x with its parent and swap if indicated. An n-node complete binary tree has height floor(log2 n). In the worst case, x will bubble all the way to the top, taking Theta(log n) time.
Similarly, removeMin() may cause an entry to bubble all the way down the heap, taking Theta(log n) worst-case time.
Bottom-Up Heap Construction
Suppose we are given a bunch of randomly ordered entries, and want to make a heap out of them. We could insert them one by one in O(n log n) time, but there's a faster way. We define one more heap operation.
[4] void bottomUpHeap();
First, we make a complete tree out of the entries, in any order. (If we're using an array representation, we just throw all the entries into an array.) Then we work backward from the last internal (non-leaf) node to the root node, visiting the nodes in reverse order of their array indices, i.e., in reverse level order. When we visit a node this way, we bubble its entry down the heap as in removeMin().
Before we bubble an entry down, we know (inductively) that its two child subtrees are heaps. Hence, by bubbling the entry down, we create a larger heap rooted at the node where that entry started.
                                                       +-+
     9                9                9               |2|
    / \              / \              / \              /-\
   /   \            /   +-+        +-+   \            /   \
  4     7    =>    4    |2|   =>   |2|    2    =>    4     2
 / \   / \        / \   /-\        /-\   / \        / \   / \
2   8 2   6      2   8 7   6      4   8 7   6      9   8 7   6
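In code, the construction illustrated above might look like the following sketch, assuming the keys have already been thrown into heap[1..size] in arbitrary order (same simplified int-key representation; names are illustrative):

// Sketch of bottomUpHeap(): visit internal nodes from the last one back to the
// root, bubbling each entry down into the two child heaps below it.
public class IntHeapBottomUp {
  private int[] heap;   // keys in heap[1..size]; index 0 unused
  private int size;

  public void bottomUpHeap() {
    for (int i = size / 2; i >= 1; i--) {   // size / 2 is the last internal node
      bubbleDown(i);
    }
  }

  private void bubbleDown(int i) {
    int x = heap[i];                        // the entry being bubbled down
    while (2 * i <= size) {
      int child = 2 * i;                    // left child
      if (child + 1 <= size && heap[child + 1] < heap[child]) {
        child++;                            // the right child has the smaller key
      }
      if (heap[child] >= x) {
        break;                              // both subtrees already satisfy heap order
      }
      heap[i] = heap[child];                // pull the smaller child up
      i = child;
    }
    heap[i] = x;
  }
}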
The running time of bottomUpHeap is tricky to derive. If each internal node bubbles all the way down, then the running time is proportional to the sum of the heights of all the nodes in the tree. Page 371 of Goodrich and Tamassia has a simple and elegant argument showing that this sum is less than n, where n is the number of entries being coalesced into a heap. Hence, the running time is in Theta(n), which beats inserting n entries into a heap individually.
Postscript: Other Types of Heaps
Binary heaps are not the only heaps in town. Several important variants are called "mergeable heaps", because it is relatively fast to combine two mergeable heaps together into a single mergeable heap. We will not describe these complicated heaps in CS 61B, but it's worthwhile for you to know they exist in case you ever need one.
The best-known mergeable heaps are called "binomial heaps," "Fibonacci heaps," "skew heaps," and "pairing heaps." Fibonacci heaps have another remarkable property: if you have a reference to an arbitrary node in a Fibonacci heap, you can decrease its key in constant time. (Pairing heaps are suspected of having the same property, but nobody knows for sure.) This operation is used frequently by Dijkstra's algorithm, an important algorithm for finding the shortest path in a graph. The following running times are all worst-case.
                  Binary      Binomial    Skew        Pairing       Fibonacci
insert()          O(log n)    O(log n)    O(1)        O(log n) *    O(1)
removeMin()       O(log n)    O(log n)    O(log n)    O(log n)      O(log n)
merge()           O(n)        O(log n)    O(1)        O(log n) *    O(1)
decreaseKey()     O(log n)    O(log n)    O(log n)    O(log n) *    O(1)
* Conjectured to be O(1), but nobody has proven or disproven it.
The time bounds given here for skew heaps, pairing heaps, and Fibonacci heaps are "amortized" bounds, not worst case bounds. This means that, if you start from an empty heap, any sequence of operations will take no more than the given time bound on average, although individual operations may occasionally take longer. We'll discuss amortized analysis near the end of the semester.