Suppose you have an list of distinct elements which has been sorted
and rotated. How would you look up an element within that list?
For example, the list:
[7, 11, 13, 19, 2, 3, 5]
is sorted (the first 7 primes, in order) and rotated (to put 7 first).
With this list as input, then:
- look up
13is at index
- look up
- look up
4returns the sentinel value
The obvious technique is to just search the list:
def lookup(values, v): try: return values.index(v) except IndexError: return -1
This is a linear algorithm which processes the entire list. Is there a way
to take advantage of its sorted+rotated-ness?
If the list was sorted then we could apply a binary search for a
logarithmic lookup. And in fact, by applying a custom ordering, the
list is sorted.
How can we apply a custom ordering in Python?
The way to do this has changed as Python has developed. The table below shows
the evolution of standard Python functions which sort and compare.
|2.0||2000||max, min, list.sort([cmp]), bisect.bisect||Note that [cmp] slows the sorting process down considerably|
|2.3||2003||heapq||Also known as the priority queue algorithm|
|2.4||2004||sorted(iterable[, cmp[, key[, reverse]]]), list.sort([cmp[, key[, reverse]]]), itertools.groupby(iterable[, key)||In general, the key and reverse conversion processes are much faster than specifying an equivalent cmp function|
|2.4||2004||heapq.nlargest, heapq.nsmallest||Heapq extended|
|2.5||2006||max([key]), min([key]), heapq.nlargest(…[, key]), heapq.smallest(…[, key])|
|2.6||2008||heapq.merge||Heapq extended again|
|3.0||2008||sorted(iterable, key[, reverse]), list.sort(key[, reverse])||No more
|3.10||2021||bisect.bisect(…, key) etc||That leaves the low-level
The earliest versions of Python allowed you to sort lists, and
the sort was customised using a
cmp function — though the documentation
warned there would be a performance penalty[*]. The builtin
functions could not be cutomised, and nor could the comparison used in
bisect module — which is Python’s binary search implementation.
At 2.3 the
heapq module appeared, but, like
bisect, there was no way
to customise the ordering of heap elements.
2.4 introduced the
key argument to customise ordering,
noting this should be preferred to
cmp which compares two
key takes a single element and returns a sort rank for that element.
key argument was added alongside
list.sort and the new
builtin function; and
key was the only way to customise the grouping in
max, and also to
heapq functions now accept a
key, the lower level heap functions
to heapify, push, pop and replace do not.
heapq.merge, a handy function to merge sorted inputs
using a heap, but with no option to specify a sort key.
3.0 got rid of
key the only way to customise
sorting. To migrate from Python 2 to 3 any
cmp functions need
key functions. As with Python 2.5, at 3.0 you could
apply a key to
heapq.nsmallest — but not to
bisect or other
heapq.merge, aligning it with
heapq.nsmallest, though it remained impossible to use a sort key
with the lower level heap functions.
The next change came in 3.10, when the
key parameter was
added to the
bisect module. As far as I can tell that means the only part of
standard Python which doesn’t let you fully customise ordering is the
Bisect with a search key
So, to return to the original puzzle, consider our example
>>> values = [7, 11, 13, 19, 2, 3, 5]
The first four elements are greater than or equal to the first
element, 7. The last three elements are less than 7.
>>> [v < values for v in values] [False, False, False, False, True, True, True]
True the result of the list comprehension above is sorted.
Extending this idea, the following comprehension is also sorted.
>>> [(v < values, v) for v in values] [(False, 7), (False, 11), (False, 13), (False, 19), (True, 2), (True, 3), (True, 5)]
Now we have a logarithmic lookup using binary search:
from bisect import bisect_left def lookup(values, v): key = lambda x: (x < values, x) i = bisect_left(values, key(v), key=key) return i if (i < len(values) and values[i] == v) else -1
Don’t search for v, search for its key
I was surprised to realise the value
v being looked up has to have the
function applied before calling
bisect_left. That is, to find where
should go in
values to maintain the sort order, we pass
This doesn’t match the interface to binary search in other languages. It also means
in our example we have to handle empty lists as a special case.
def lookup(values, v): if not values: return -1 key = lambda x: (x < values, x) i = bisect_left(values, key(v), key=key) return i if (i < len(values) and values[i] == v) else -1
[*] Before the introduction of the sort
key, the standard pattern was