Tripped up by Rstrip

2021-05-14

Here’s a simple function which is supposed to strip any ".jpg" extension
from a file name.

    def photo_name(photo_file):
        "Return the name of the photo"
        return photo_file.rstrip('.jpg')

    assert photo_name('cat.jpg') == 'cat'
    assert photo_name('selfie.jpg') == 'selfie'
    assert photo_name('emoji.gif') == 'emoji.gif' # A GIF is not a photo

As shown, it passes a few simple tests.

Unfortunately it turns out to be broken.

    >>> photo_name('dog.jpg')
    'do'
    >>> photo_name('tile.png')
    'tile.pn'

There’s no mystery here. A check of the documentation shows
that the optional chars parameter to str.rstrip specifies a set of
trailing characters to be removed from the source string.

So, in the example above, '.jpg' means: strip trailing
characters in the set {'.', 'j', 'p', 'g'}. In the case of
'dog.jpg' that includes the final 'g' of 'dog'. Similarly
the final 'g' of 'tile.png' gets stripped.
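
To make the set behaviour concrete, here is a small sketch. The sample names and the `results` dictionary are my own illustration and not part of the original function. It shows that the order and repetition of the characters in chars make no difference, and that stripping stops at the first trailing character outside the set.

```python
# chars is treated as a set, so order and repetition are irrelevant:
# each of these calls strips trailing characters from {'.', 'j', 'p', 'g'}.
assert 'dog.jpg'.rstrip('.jpg') == 'do'
assert 'dog.jpg'.rstrip('gpj.') == 'do'
assert 'dog.jpg'.rstrip('jjpg..') == 'do'

# Illustrative sample names (assumed for this sketch only).
names = ['cat.jpg', 'dog.jpg', 'tile.png', 'jpg.jpg']
results = {name: name.rstrip('.jpg') for name in names}

assert results['cat.jpg'] == 'cat'       # 't' is outside the set, stripping stops
assert results['dog.jpg'] == 'do'        # the 'g' of 'dog' is in the set
assert results['tile.png'] == 'tile.pn'  # only the final 'g' goes, 'n' stops it
assert results['jpg.jpg'] == ''          # every character is in the set
```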

That said, it’s a common misunderstanding, and one which people have been
making for the eighteen years since the optional chars argument got added
to rstrip. As of Python 3.9 the documentation takes care to
explain:

The chars argument is a string specifying the set of
characters to be removed. If omitted or None, the chars argument
defaults to removing whitespace. The chars argument is not a suffix;
rather, all combinations of its values are stripped.

and points out that the new removesuffix method might be what you’re really after.
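
For comparison, here is one way photo_name might be rewritten to remove a suffix and not a set of characters. This is a sketch, not the only possible fix: it uses str.removesuffix where it exists (Python 3.9 and later), and the endswith-and-slice fallback for older versions is my own addition.

```python
def photo_name(photo_file):
    "Return the name of the photo"
    suffix = '.jpg'
    if hasattr(str, 'removesuffix'):
        # Python 3.9+: removes the suffix only if the string ends with it
        return photo_file.removesuffix(suffix)
    # Fallback for older Pythons (an assumption of this sketch)
    if photo_file.endswith(suffix):
        return photo_file[:-len(suffix)]
    return photo_file

assert photo_name('cat.jpg') == 'cat'
assert photo_name('dog.jpg') == 'dog'            # no longer 'do'
assert photo_name('tile.png') == 'tile.png'      # left untouched
assert photo_name('emoji.gif') == 'emoji.gif'    # A GIF is not a photo
```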

Out of interest I tracked the documentation back to Python 2.3, when the
function description was less clear.

If given and not None, chars must be a string; the
characters in the string will be stripped from the end of the string
this method is called on. Changed in version 2.2.2: Support for the
chars argument.

Luckily misuse of rstrip to remove extensions will usually get
spotted soon enough, even if, as shown, it evades cursory inspection and testing.

It’s worth reviewing why the confusion persists.

  1. using str.rstrip() to remove trailing whitespace is a common thing to want to do
  2. removing a suffix is also a common thing to want to do
  3. removing a set of trailing chars from a string is less common (except the special case of stripping whitespace)
  4. the chars parameter to str.rstrip() is not passed as a set; it is passed as a string, an ordered sequence, so it reads like a suffix
  5. s.rstrip('.jpg') (for example) will remove any '.jpg' suffix from s, so it sort-of works
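
The last point can be checked directly. In the sketch below, strip_suffix is a hypothetical helper standing in for "remove the suffix only if it is there", and the sample names are the ones used earlier in this post. The loop shows that rstrip never leaves a '.jpg' suffix behind; its only failure mode is removing too much.

```python
def strip_suffix(s, suffix):
    "Remove suffix from s only if s ends with it (hypothetical helper)"
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

samples = ['cat.jpg', 'selfie.jpg', 'emoji.gif', 'dog.jpg', 'tile.png']
over_stripped = []

for s in samples:
    via_rstrip = s.rstrip('.jpg')
    expected = strip_suffix(s, '.jpg')
    # rstrip's result is always a prefix of the correct result
    assert expected.startswith(via_rstrip)
    if via_rstrip != expected:
        over_stripped.append(s)

# Only names whose stem (or other extension) ends in '.', 'j', 'p' or 'g' go wrong
assert over_stripped == ['dog.jpg', 'tile.png']
```

This is why a handful of happy-path tests can pass: the bug only appears when the characters just before the suffix, or at the end of a different extension, happen to fall in the set.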
