A heap is the data structure for “give me the next-best thing repeatedly, while new candidates keep arriving.” Priority queue is the abstract interface; binary heap is the usual concrete implementation.
When to reach for it
- Top-k. Maintain a min-heap of size
k; on each new element, push and pop if size exceedsk.O(n log k)overall, beats sorting whenk ≪ n. - Streaming median. Two heaps — a max-heap of the lower half, a min-heap of the upper half. Rebalance after each insert. Median is the root(s).
O(log n)insert,O(1)query. - Merging k sorted streams. Min-heap of
(value, stream_id); pop and pull from the corresponding stream.O(n log k). - Dijkstra / Prim / A*. Frontier as a min-heap keyed by distance / cost.
- Scheduled job execution. Heap keyed by
next_fire_time; pop, run, reschedule. - Event simulation. Same as scheduling — events keyed by simulated time.
Operations and costs
| Op | Cost |
|---|---|
peek (root) | O(1) |
push | O(log n) |
pop | O(log n) |
heapify (build from list) | O(n) — bottom-up sift-down beats n pushes |
kth largest | O(n + k log n) via heap, or O(n log k) via size-k heap |
heapify being O(n), not O(n log n), is the non-obvious one — and the reason you should heapify once, not push n times to build.
Python specifics
heapq is min-heap only. For max-heap, negate keys. For tuples like (priority, item), ensure items are comparable or wrap with a counter to prevent comparison on item:
heapq.heappush(h, (priority, counter, item))heapq.nlargest(k, iterable, key=...) is O(n log k) and idiomatic — reach for it before rolling your own.
What heaps don’t do
- No efficient arbitrary deletion or update. “Decrease-key” is
O(n)to find, thenO(log n)to fix. The standard workaround in Dijkstra is lazy deletion: push the new entry, ignore stale entries on pop (if visited[node]: continue). Cleaner than a real Fibonacci heap for almost any practical scale. - No order beyond the root. The internal layout is partial order, not sorted. Don’t iterate a heap expecting sorted output without popping.
- Not a search structure. “Is X in the heap” is
O(n). If you need that, you also need a side index.