software2

Binary search gets a sort key

2022-02-15, Comments

Suppose you have an list of distinct elements which has been sorted
and rotated. How would you look up an element within that list?

For example, the list:

[7, 11, 13, 19, 2, 3, 5]

is sorted (the first 7 primes, in order) and rotated (to put 7 first).

With this list as input, then:

  • look up 13 returns 2 since 13 is at index 2
  • look up 2 returns 4
  • look up 4 returns the sentinel value -1

The obvious technique is to just search the list:

def lookup(values, v): try: return values.index(v) except IndexError: return -1

This is a linear algorithm which processes the entire list. Is there a way
to take advantage of its sorted+rotated-ness?

If the list was sorted then we could apply a binary search for a
logarithmic lookup. And in fact, by applying a custom ordering, the
list is sorted.

How can we apply a custom ordering in Python?

The way to do this has changed as Python has developed. The table below shows
the evolution of standard Python functions which sort and compare.

Version Year Functions Notes
2.0 2000 max, min, list.sort([cmp]), bisect.bisect Note that [cmp] slows the sorting process down considerably
2.3 2003 heapq Also known as the priority queue algorithm
2.4 2004 sorted(iterable[, cmp[, key[, reverse]]]), list.sort([cmp[, key[, reverse]]]), itertools.groupby(iterable[, key) In general, the key and reverse conversion processes are much faster than specifying an equivalent cmp function
2.4 2004 heapq.nlargest, heapq.nsmallest Heapq extended
2.5 2006 max([key]), min([key]), heapq.nlargest(…[, key]), heapq.smallest(…[, key])
2.6 2008 heapq.merge Heapq extended again
3.0 2008 sorted(iterable, key[, reverse]), list.sort(key[, reverse]) No more cmp
3.5 2015 heapq.merge(…, key)
3.10 2021 bisect.bisect(…, key) etc That leaves the low-level heapq functions

The earliest versions of Python allowed you to sort lists, and
the sort was customised using a cmp function — though the documentation
warned there would be a performance penalty[*]. The builtin min and max
functions could not be cutomised, and nor could the comparison used in
the bisect module — which is Python’s binary search implementation.

At 2.3 the heapq module appeared, but, like bisect, there was no way
to customise the ordering of heap elements.

2.4 introduced the key argument to customise ordering,
noting this should be preferred to cmp. Unlike cmp which compares two
elements, key takes a single element and returns a sort rank for that element.
The key argument was added alongside cmp in list.sort and the new sorted
builtin function; and key was the only way to customise the grouping in
itertools.groupby.

2.5 added key to min and max, and also to heapq.nlargest, heapq.nsmallest.
Although these heapq functions now accept a key, the lower level heap functions
to heapify, push, pop and replace do not.

2.6 introduced heapq.merge, a handy function to merge sorted inputs
using a heap, but with no option to specify a sort key.

3.0 got rid of cmp, making key the only way to customise
sorting. To migrate from Python 2 to 3 any cmp functions need
converting to key functions. As with Python 2.5, at 3.0 you could
apply a key to list.sort, sorted, min, max,
itertools.groupby, heapq.nlargest, heapq.nsmallest — but not to
bisect or other heapq functions.

3.5 added key to heapq.merge, aligning it with heapq.nlargest
and heapq.nsmallest, though it remained impossible to use a sort key
with the lower level heap functions.

The next change came in 3.10, when the key parameter was
added to the bisect module. As far as I can tell that means the only part of
standard Python which doesn’t let you fully customise ordering is the heapq module.

Bisect with a search key

So, to return to the original puzzle, consider our example
sorted+rotated list:

>>> values = [7, 11, 13, 19, 2, 3, 5]

The first four elements are greater than or equal to the first
element, 7. The last three elements are less than 7.

>>> [v < values[0] for v in values]
[False, False, False, False, True, True, True]

Since False < True the result of the list comprehension above is sorted.
Extending this idea, the following comprehension is also sorted.

>>> [(v < values[0], v) for v in values]
[(False, 7), (False, 11), (False, 13), (False, 19), (True, 2), (True, 3), (True, 5)]

Now we have a logarithmic lookup using binary search:

from bisect import bisect_left def lookup(values, v): key = lambda x: (x < values[0], x) i = bisect_left(values, key(v), key=key) return i if (i < len(values) and values[i] == v) else -1

Don’t search for v, search for its key

I was surprised to realise the value v being looked up has to have the key
function applied before calling bisect_left. That is, to find where v
should go in values to maintain the sort order, we pass key(v) to bisect_left.
This doesn’t match the interface to binary search in other languages. It also means
in our example we have to handle empty lists as a special case.

def lookup(values, v): if not values: return -1 key = lambda x: (x < values[0], x) i = bisect_left(values, key(v), key=key) return i if (i < len(values) and values[i] == v) else -1

[*] Before the introduction of the sort key, the standard pattern was
decorate-sort-undecorate.

Software almacen de Cea Ordenadores

Comentarios desactivados en Binary search gets a sort key