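Why does Python allow out-of-range slice indexes for sequences?

So I just came across what seems to me like a strange Python feature and wanted some clarification about it.

The following list operation somewhat makes sense: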

p = [1, 2, 3]
p[3:] = [4]
# p is now [1, 2, 3, 4]

I imagine it is really just appending this value to the end, correct?
But why can I do this?

p[20:22] = [5, 6]
# p is now [1, 2, 3, 4, 5, 6]

Even more so:

p[20:100] = [7, 8]
# p is now [1, 2, 3, 4, 5, 6, 7, 8]

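This just seems like wrong logic. It seems like this should throw an error!

Is there an explanation?
Is it just a weird thing Python is doing?
Is there a purpose to it?
Or am I thinking about this the wrong way?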


The documentation has your answer:

s[i:j]: slice of s from i to j (note (4))

(4) The slice of s from i to j is defined as the sequence of items with index k such that i <= k < j. If i or j is greater than len(s), use len(s). If i is omitted or None, use 0. If j is omitted or None, use len(s). If i is greater than or equal to j, the slice is empty.

The documentation of IndexError confirms this behavior:

exception IndexError

Raised when a sequence subscript is out of range. (Slice indices are silently truncated to fall in the allowed range; if an index is not an integer, TypeError is raised.)
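To make the distinction in that note concrete, here is a minimal sketch contrasting plain indexing with slicing. The list `p` and the variable names are illustrative only.

```python
p = [1, 2, 3, 4]

# Indexing a single out-of-range position raises IndexError.
try:
    p[20]
    index_raised = False
except IndexError:
    index_raised = True

# Slicing with out-of-range bounds is silently clipped instead.
empty = p[20:22]   # both bounds clipped to len(p), so the slice is empty
tail = p[2:100]    # stop clipped to len(p), so this is p[2:4]

print(index_raised, empty, tail)
```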

Essentially, stuff like p[20:100] is being reduced to p[len(p):len(p)]. p[len(p):len(p)] is an empty slice at the end of the list, and assigning a list to it will modify the end of the list to contain said list. Thus, it works like appending/extending the original list.
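One way to observe the clipping directly is the built-in `slice.indices()` method, which reports the start, stop, and step a slice resolves to for a sequence of a given length. The sketch below assumes a four-element list; for plain lists the values it reports should match what the slice assignment uses.

```python
p = [1, 2, 3, 4]

# slice.indices(length) returns the (start, stop, step) that the slice
# resolves to for a sequence of that length.
start, stop, step = slice(20, 100).indices(len(p))
print(start, stop, step)

# Assigning to the clipped, empty slice at the end extends the list.
p[20:100] = [7, 8]
print(p)
```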

This behavior is the same as what happens when you assign a list to an empty slice anywhere in the original list. For example:

In [1]: p = [1, 2, 3, 4]


In [2]: p[2:2] = [42, 42, 42]


In [3]: p
Out[3]: [1, 2, 42, 42, 42, 3, 4]

Part of question regarding out-of-range indices

Slice logic automatically clips the indices to the length of the sequence.

Allowing slice indices to extend past end points was done for convenience. It would be a pain to have to range check every expression and then adjust the limits manually, so Python does it for you.

Consider the use case of wanting to display no more than the first 50 characters of a text message.

The easy way (what Python does now):

preview = msg[:50]

Or the hard way (do the limit checks yourself):

n = len(msg)
preview = msg[:50] if n > 50 else msg

Manually implementing that logic for adjustment of end points would be easy to forget, would be easy to get wrong (updating the 50 in two places), would be wordy, and would be slow. Python moves that logic to its internals where it is succinct, automatic, fast, and correct. This is one of the reasons I love Python :-)
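As a quick check of the preview use case, the same slice expression handles both short and long messages. The two sample strings here are made up for illustration.

```python
short_msg = "hello"
long_msg = "x" * 120

preview_short = short_msg[:50]   # shorter than 50 characters: whole string
preview_long = long_msg[:50]     # longer than 50 characters: first 50 only

print(len(preview_short), len(preview_long))
```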

Part of question regarding assignments length mismatch from input length

The OP also wanted to know the rationale for allowing assignments such as p[20:100] = [7,8] where the assignment target has a different length (80) than the replacement data length (2).

It's easiest to see the motivation by an analogy with strings. Consider, "five little monkeys".replace("little", "humongous"). Note that the target "little" has only six letters and "humongous" has nine. We can do the same with lists:

>>> s = list("five little monkeys")
>>> i = s.index('l')
>>> n = len('little')
>>> s[i : i+n ] = list("humongous")
>>> ''.join(s)
'five humongous monkeys'
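The same mechanism works in both directions. A small sketch, using an illustrative list `s`, of a slice assignment that grows the list, one that shrinks it, and one that deletes items:

```python
s = [0, 1, 2, 3, 4, 5]

s[1:3] = ['a', 'b', 'c', 'd']   # 2 items replaced by 4: the list grows
grown = list(s)

s[1:5] = ['x']                  # 4 items replaced by 1: the list shrinks
shrunk = list(s)

s[1:2] = []                     # 1 item replaced by nothing: it is removed
removed = list(s)

print(grown, shrunk, removed)
```

Note that this freedom applies to ordinary slices. Assigning to an extended slice with a step other than 1 requires the replacement to have the same length as the target and raises ValueError otherwise.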

This all comes down to convenience.

Prior to the introduction of the copy() and clear() methods, these used to be popular idioms:

s[:] = []           # clear a list
t = u[:]            # copy a list

Even now, we use this to update lists when filtering:

import math
s[:] = [x for x in s if not math.isnan(x)]   # filter out NaN values
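The reason to write `s[:] = ...` rather than `s = ...` is that the slice assignment changes the existing list object, so every other reference to it sees the update. A minimal sketch with made-up data, where `alias` stands in for any other reference to the same list:

```python
import math

s = [1.0, float("nan"), 2.5]
alias = s                      # a second name for the same list object

# Slice assignment replaces the contents in place.
s[:] = [x for x in s if not math.isnan(x)]
same_object = alias is s
alias_after_inplace = list(alias)

# Plain assignment rebinds the name s to a new list; alias is unaffected.
s = [x for x in s if x > 2.0]
rebound_is_alias = alias is s

print(same_object, alias_after_inplace, s, alias, rebound_is_alias)
```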

Hope these practical examples give a good perspective on why slicing works as it does.