Suppose you have an list of distinct elements which has been sorted
and rotated. How would you look up an element within that list?

For example, the list:

``````[7, 11, 13, 19, 2, 3, 5]
``````

is sorted (the first 7 primes, in order) and rotated (to put 7 first).

With this list as input, then:

• look up `13` returns `2` since `13` is at index `2`
• look up `2` returns `4`
• look up `4` returns the sentinel value `-1`

The obvious technique is to just search the list:

``````def lookup(values, v): try: return values.index(v) except IndexError: return -1
``````

This is a linear algorithm which processes the entire list. Is there a way
to take advantage of its sorted+rotated-ness?

If the list was sorted then we could apply a binary search for a
logarithmic lookup. And in fact, by applying a custom ordering, the
list is sorted.

How can we apply a custom ordering in Python?

The way to do this has changed as Python has developed. The table below shows
the evolution of standard Python functions which sort and compare.

Version Year Functions Notes
2.0 2000 max, min, list.sort([cmp]), bisect.bisect Note that [cmp] slows the sorting process down considerably
2.3 2003 heapq Also known as the priority queue algorithm
2.4 2004 sorted(iterable[, cmp[, key[, reverse]]]), list.sort([cmp[, key[, reverse]]]), itertools.groupby(iterable[, key) In general, the key and reverse conversion processes are much faster than specifying an equivalent cmp function
2.4 2004 heapq.nlargest, heapq.nsmallest Heapq extended
2.5 2006 max([key]), min([key]), heapq.nlargest(…[, key]), heapq.smallest(…[, key])
2.6 2008 heapq.merge Heapq extended again
3.0 2008 sorted(iterable, key[, reverse]), list.sort(key[, reverse]) No more `cmp`
3.5 2015 heapq.merge(…, key)
3.10 2021 bisect.bisect(…, key) etc That leaves the low-level `heapq` functions

The earliest versions of Python allowed you to sort lists, and
the sort was customised using a `cmp` function — though the documentation
warned there would be a performance penalty[*]. The builtin `min` and `max`
functions could not be cutomised, and nor could the comparison used in
the `bisect` module — which is Python’s binary search implementation.

At 2.3 the `heapq` module appeared, but, like `bisect`, there was no way
to customise the ordering of heap elements.

2.4 introduced the `key` argument to customise ordering,
noting this should be preferred to `cmp`. Unlike `cmp` which compares two
elements, `key` takes a single element and returns a sort rank for that element.
The `key` argument was added alongside `cmp` in `list.sort` and the new `sorted`
builtin function; and `key` was the only way to customise the grouping in
`itertools.groupby`.

2.5 added `key` to `min` and `max`, and also to `heapq.nlargest`, `heapq.nsmallest`.
Although these `heapq` functions now accept a `key`, the lower level heap functions
to heapify, push, pop and replace do not.

2.6 introduced `heapq.merge`, a handy function to merge sorted inputs
using a heap, but with no option to specify a sort key.

3.0 got rid of `cmp`, making `key` the only way to customise
sorting. To migrate from Python 2 to 3 any `cmp` functions need
converting to `key` functions. As with Python 2.5, at 3.0 you could
apply a key to `list.sort`, `sorted`, `min`, `max`,
`itertools.groupby`, `heapq.nlargest`, `heapq.nsmallest` — but not to
`bisect` or other `heapq` functions.

3.5 added `key` to `heapq.merge`, aligning it with `heapq.nlargest`
and `heapq.nsmallest`, though it remained impossible to use a sort key
with the lower level heap functions.

The next change came in 3.10, when the `key` parameter was
added to the `bisect` module. As far as I can tell that means the only part of
standard Python which doesn’t let you fully customise ordering is the `heapq` module.

## Bisect with a search key

sorted+rotated list:

``````>>> values = [7, 11, 13, 19, 2, 3, 5]
``````

The first four elements are greater than or equal to the first
element, 7. The last three elements are less than 7.

``````>>> [v < values[0] for v in values]
[False, False, False, False, True, True, True]
``````

Since `False` < `True` the result of the list comprehension above is sorted.
Extending this idea, the following comprehension is also sorted.

``````>>> [(v < values[0], v) for v in values]
[(False, 7), (False, 11), (False, 13), (False, 19), (True, 2), (True, 3), (True, 5)]
``````

Now we have a logarithmic lookup using binary search:

``````from bisect import bisect_left def lookup(values, v): key = lambda x: (x < values[0], x) i = bisect_left(values, key(v), key=key) return i if (i < len(values) and values[i] == v) else -1
``````

## Don’t search for v, search for its key

I was surprised to realise the value `v` being looked up has to have the `key`
function applied before calling `bisect_left`. That is, to find where `v`
should go in `values` to maintain the sort order, we pass `key(v)` to `bisect_left`.
This doesn’t match the interface to binary search in other languages. It also means
in our example we have to handle empty lists as a special case.

``````def lookup(values, v): if not values: return -1 key = lambda x: (x < values[0], x) i = bisect_left(values, key(v), key=key) return i if (i < len(values) and values[i] == v) else -1
``````

[*] Before the introduction of the sort `key`, the standard pattern was
decorate-sort-undecorate.

Comentarios desactivados en Binary search gets a sort key