Tripped up by Rstrip


Here’s a simple function which is supposed to strip any ".jpg" extension
from a file name.

    def photo_name(photo_file):
        "Return the name of the photo"
        return photo_file.rstrip('.jpg')

    assert photo_name('cat.jpg') == 'cat'
    assert photo_name('selfie.jpg') == 'selfie'
    assert photo_name('emoji.gif') == 'emoji.gif' # A GIF is not a photo

As shown, it passes a few simple tests.

Unfortunately it turns out to be broken.

    >>> photo_name('dog.jpg')
    'do'
    >>> photo_name('tile.png')
    'tile.pn'

There’s no mystery here. A check of the documentation shows
that the optional chars parameter to str.rstrip specifies a set of
trailing characters to be removed from the source string.

So, in the example above, '.jpg' means: strip trailing
characters in the set {'.', 'j', 'p', 'g'}. In the case of
'dog.jpg' that includes the final 'g' of 'dog'. Similarly
the final 'g' of 'tile.png' gets stripped.
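To make the set semantics concrete, here is a rough pure-Python model of rstrip with a chars argument. The name rstrip_chars is hypothetical, made up for this sketch. It ignores the default whitespace behaviour and is only meant to agree with the built-in method on examples like these, not to replace it.

```python
def rstrip_chars(s, chars):
    """Hypothetical model of s.rstrip(chars), for illustration only.

    chars is treated as a set: trailing characters found in it are removed
    one at a time until a character outside the set is reached.
    """
    strip_set = set(chars)
    end = len(s)
    while end > 0 and s[end - 1] in strip_set:
        end -= 1
    return s[:end]


# The model agrees with the built-in method on the examples above.
for name in ['cat.jpg', 'selfie.jpg', 'emoji.gif', 'dog.jpg', 'tile.png']:
    assert rstrip_chars(name, '.jpg') == name.rstrip('.jpg')

print(rstrip_chars('dog.jpg', '.jpg'))   # do
print(rstrip_chars('tile.png', '.jpg'))  # tile.pn
```

Walking through 'dog.jpg': the loop drops 'g', 'p', 'j', '.', and then the 'g' of 'dog', stopping only at 'o', which is not in the set.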

That said, it’s a common misunderstanding, and one which has been made
for the eighteen years since the optional chars argument was added
to rstrip. As of Python 3.9 the documentation takes care to state:

The chars argument is a string specifying the set of
characters to be removed. If omitted or None, the chars argument
defaults to removing whitespace. The chars argument is not a suffix;
rather, all combinations of its values are stripped.

and points out that the new removesuffix method might be what you’re really after.
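Here is a sketch of photo_name rewritten to use removesuffix, assuming Python 3.9 or later. The second function, photo_name_pre39, is a hypothetical fallback added here for illustration, using endswith and slicing for older versions; it is not something the documentation prescribes.

```python
SUFFIX = '.jpg'  # the extension to remove (assumed to be non-empty)


def photo_name(photo_file):
    "Return the name of the photo: photo_file with one '.jpg' suffix removed"
    # removesuffix treats its argument as a whole string, not a set of chars.
    return photo_file.removesuffix(SUFFIX)  # requires Python 3.9+


def photo_name_pre39(photo_file):
    "Hypothetical equivalent for Python versions without str.removesuffix"
    if photo_file.endswith(SUFFIX):
        # Slice off exactly len(SUFFIX) characters; safe because SUFFIX is non-empty.
        return photo_file[:-len(SUFFIX)]
    return photo_file


for name in ['cat.jpg', 'selfie.jpg', 'emoji.gif', 'dog.jpg', 'tile.png']:
    assert photo_name(name) == photo_name_pre39(name)
```

Unlike the rstrip version, 'dog.jpg' now becomes 'dog' and 'tile.png' is left untouched.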

Out of interest I tracked the documentation back to Python 2.3, when the
function description was less clear.

If given and not None, chars must be a string; the
characters in the string will be stripped from the end of the string
this method is called on. Changed in version 2.2.2: Support for the
chars argument.

Luckily, misuse of rstrip to remove extensions will usually be
spotted soon enough, even if, as shown, it can evade cursory inspection and testing.

It’s worth reviewing why the confusion persists.

  1. using str.rstrip() to remove trailing whitespace is a common thing to want to do
  2. removing a suffix is also a common thing to want to do
  3. removing a set of trailing chars from a string is less common (except the special case of stripping whitespace)
  4. the chars parameter to str.rstrip() is passed as a string, an ordered sequence, rather than a set, so '.jpg' looks like a suffix
  5. s.rstrip('.jpg') (for example) will remove any '.jpg' suffix from s, so it sort-of works
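Points 4 and 5 can be checked directly. This short sketch uses nothing beyond the built-in str.rstrip method: it shows that the order (and repetition) of the characters passed to rstrip makes no difference, and that a '.jpg' suffix is always removed, sometimes along with more than intended.

```python
# chars is treated as a set, so reordering or repeating characters changes nothing.
for name in ['cat.jpg', 'dog.jpg', 'tile.png', 'emoji.gif']:
    assert name.rstrip('.jpg') == name.rstrip('gpj.') == name.rstrip('jjj.pg')

# Every character of '.jpg' is in the set, so the suffix always goes:
assert 'cat.jpg'.rstrip('.jpg') == 'cat'
# ...but any further trailing characters from the set go too.
assert 'dog.jpg'.rstrip('.jpg') == 'do'
assert 'photo.jpg.jpg'.rstrip('.jpg') == 'photo'
```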
