I get the idea of the slicing operator in Python, but I am kind of confused about "stop".
For instance:
lst = [1,2,3,4,5]
print(lst[0:4])
I think the answer should be [1,2,3,4,5], since it will stop on index 4, which is element 5.
However, the correct answer will be [1,2,3,4].
What is the explanation?
Python ranges are said to be "exclusive", because the element with index stop is excluded from the results.
This behavior was chosen because exclusive (rather than inclusive) upper bounds work nicely with 0-indexed sequences, which is what Python uses in all of its core data sequence types (list, tuple, str, bytes, bytearray, and array.array).
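The exclusive stop can be checked directly with the list from the question. This is only a small sketch; the names first_four and everything are made up for illustration.

```python
lst = [1, 2, 3, 4, 5]

# Indices 0, 1, 2 and 3 are selected; index 4 (the stop) is excluded.
first_four = lst[0:4]
print(first_four)  # [1, 2, 3, 4]

# To include the element at index 4, the stop has to be 5.
everything = lst[0:5]
print(everything)  # [1, 2, 3, 4, 5]

# range() follows the same convention: 4 itself is not produced.
print(list(range(0, 4)))  # [0, 1, 2, 3]
```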
It's one thing to memorize this, but it's another to understand why it makes sense!
In general, 0-indexing of arrays invites you to treat any position in the array as a cursor located between elements of the array. This design for arrays seems a little unusual at first, but it's actually a very ergonomic way to do things.
Consider the array:
values: a b c d e f
indexes: 0 1 2 3 4 5
We are taught that we count from 0, so the first position is index 0, the second position is index 1, and so on up to the final position, which is index len(data) - 1.
We can visualize this pattern as describing the position of a cursor located before the element of interest:
index 0: | a b c d e f
index 1: a | b c d e f
index 2: a b | c d e f
index 3: a b c | d e f
index 4: a b c d | e f
index 5: a b c d e | f
Numbering the cursor positions as 1 less than the element position is a deliberate reflection of this mental model.
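The cursor picture can be reproduced in code. The sketch below assumes the array is a Python list named data, and show_cursor is a hypothetical helper written only for this illustration: it splits the list at index i and draws the cursor at the split point.

```python
data = ["a", "b", "c", "d", "e", "f"]

def show_cursor(seq, i):
    """Render seq with a '|' cursor placed just before the element at index i."""
    left = " ".join(seq[:i])    # everything before the cursor
    right = " ".join(seq[i:])   # everything from the cursor onward
    return f"{left} | {right}".strip()

for i in range(len(data)):
    print(f"index {i}: {show_cursor(data, i)}")
```

Note that seq[:i] is exactly "everything to the left of cursor i", which is the same exclusive-stop behavior the question asks about.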
There are historical reasons to have designed arrays this way, relating to memory addresses and pointers. But unless you are programming in C or C++, you can mostly ignore those reasons, because 0-indexing also works nicely as an abstract model of sequences. You can see here for a famous aesthetic argument in favor of 0-indexing.
Once we accept 0-indexing as comfortable and natural, this in turn affects how we think about ranges, because now the index of the final element in the array is len(data) - 1. So if we want to select the entire array using an inclusive upper bound, we would have to write our range as 0 : len(data) - 1. That's ugly and clunky, so using exclusive upper bounds allows us to write 0 : len(data) instead. Now the upper bound of the range nicely coincides with the length of the data, while allowing us to use 0-indexing. This behavior appears in the range function, in the : slicing syntax, and in the slice class that the : syntax represents.
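Here is a short sketch of all three forms agreeing with each other, assuming data is the six-element list from the example above (the variable names are illustrative):

```python
data = ["a", "b", "c", "d", "e", "f"]

# With an exclusive upper bound, the whole sequence is 0 : len(data).
whole = data[0:len(data)]

# range() uses the same convention, so it yields every valid index.
indices = list(range(0, len(data)))

# The ':' syntax inside [] builds a slice object behind the scenes.
same_whole = data[slice(0, len(data))]

print(whole)       # ['a', 'b', 'c', 'd', 'e', 'f']
print(indices)     # [0, 1, 2, 3, 4, 5]
print(same_whole)  # ['a', 'b', 'c', 'd', 'e', 'f']
```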
Furthermore, the exclusive upper bound reinforces the idea that a 0-based index is a cursor located before the current element.
Consider a range that selects c, d, and e. If you draw a box around the selected values, you'll notice that the rightmost/uppermost "edge" of the "box" is actually past the final value:
values: a b | c d e | f
indexes: 0 1 | 2 3 4 | 5
|-----------|
If we think in terms of cursors, the upper bound is located between e and f, which we know is index 5. We do not go past the upper bound; we stop before it. Therefore it's completely natural to write this range as 2:5. We start at the cursor before index 2, pass over the elements at indices 2, 3, and 4, and then stop before we reach index 5. If we wrote the range as 2:4, that would lead to an inconsistent interpretation of indices as cursors, which would be confusing.
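The same box can be checked in code. This is a minimal sketch that assumes data is the list a through f from the diagram:

```python
data = ["a", "b", "c", "d", "e", "f"]

# Start at cursor 2 (just before 'c'), stop at cursor 5 (just before 'f').
selected = data[2:5]
print(selected)  # ['c', 'd', 'e']

# A handy side effect of exclusive bounds: length is simply stop - start.
print(len(selected) == 5 - 2)  # True

# Writing 2:4 stops at the cursor before 'e', so 'e' is left out.
too_short = data[2:4]
print(too_short)  # ['c', 'd']
```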
Finally, user chepner pointed out in a comment that exclusive ranges lead to an elegant property of slices of sequences. For a sequence x and indices a < b < c, x[a:b] + x[b:c] == x[a:c]. This property is not preserved when ranges are inclusive. As an exercise, use the "cursor" model to convince yourself that this property is preserved when ranges are exclusive and broken when ranges are inclusive.
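If you want to check your answer to the exercise, here is one possible sketch. It assumes non-negative indices, and inclusive_slice is a hypothetical helper (Python has no built-in inclusive slicing) that simulates an inclusive upper bound.

```python
x = list("abcdef")
a, b, c = 1, 3, 5

# Exclusive bounds: the two pieces meet at cursor b with no overlap and no gap.
assert x[a:b] + x[b:c] == x[a:c]

def inclusive_slice(seq, start, stop):
    """Hypothetical inclusive slice: the element at index stop is included."""
    return seq[start:stop + 1]

left = inclusive_slice(x, a, b)      # ['b', 'c', 'd']
right = inclusive_slice(x, b, c)     # ['d', 'e', 'f']
combined = inclusive_slice(x, a, c)  # ['b', 'c', 'd', 'e', 'f']

# With inclusive bounds, the element at index b is counted twice.
print(left + right)              # ['b', 'c', 'd', 'd', 'e', 'f']
print(left + right == combined)  # False
```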