为什么元组比列表占用更少的内存空间？

小开

最佳答案

我假设您使用的是64位的 CPython (我在 CPython 2.764位上得到了相同的结果)。在其他 Python 实现中可能存在差异，或者如果您使用的是32位 Python。

不管实现如何，list是可变大小的，而 tuple是固定大小的。

So tuples can store the elements directly inside the struct, lists on the other hand need a layer of indirection (it stores a pointer to the elements). This layer of indirection is a pointer, on 64bit systems that's 64bit, hence 8bytes.

但是 list做的另一件事是: 他们过度分配。否则，list.append将是一个 O(n)操作 一直都是-使其摊销 O(1)(更快! ! !)分配过多。但是现在它必须跟踪分配和 填满了的大小(tuple只需要存储一个大小，因为分配和填充的大小总是相同的)。这意味着每个列表必须存储另一个“大小”，在64位系统上是一个64位整数，也是8字节。

因此，list比 tuple至少需要多16字节的内存。我为什么说“至少”？因为分配过多。过度分配意味着它分配的空间超过需要。但是，过度分配的数量取决于“如何”创建列表和追加/删除历史记录:

>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4)  # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96


>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1)  # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2)  # no re-alloc
>>> l.append(3)  # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4)  # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72

图像

我决定创建一些图像来配合上面的解释。也许这些是有帮助的

下面是示例中它(示意图)在内存中的存储方式:

这实际上只是一个近似值，因为 int对象也是 Python 对象，而 CPython 甚至重用小整数，所以内存中对象的更准确表示(尽管不那么可读)可能是:

有用连结:

请注意，__sizeof__并没有真正返回“正确”的大小！它只返回存储值的大小。然而，当你使用 sys.getsizeof时，结果是不同的:

>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72

There are 24 "extra" bytes. These are 真的, that's the garbage collector overhead that isn't accounted for in the __sizeof__ method. That's because you're generally not supposed to use magic methods directly - use the functions that know how to handle them, in this case: sys.getsizeof (which actually 增加了 GC 开销 to the value returned from __sizeof__).

小开

MSeifert 的回答涵盖面很广; 为了简单起见，你可以想到:

tuple 是不可变的。一旦定型，就无法改变了。因此，您可以事先知道需要为该对象分配多少内存。

list 是可变的。可以向其中添加或从中移除项。它必须知道它现在的大小。它可以根据需要调整大小。

There are no free meals - these capabilities comes with a cost. Hence the overhead in memory for lists.

小开

我将更深入地研究 CPython 代码库，这样我们就可以看到实际上是如何计算大小的。在你的具体例子中，no over-allocations have been performed, so I won't touch on that.

我将在这里使用64位值，就像您一样。

list的大小是根据以下函数 list_sizeof计算出来的:

static PyObject *
list_sizeof(PyListObject *self)
{
Py_ssize_t res;


res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*);
return PyInt_FromSsize_t(res);
}

这里的 Py_TYPE(self)是一个抓取 self的 ob_type(返回 PyList_Type)的宏，而 _PyObject_SIZE是另一个从该类型抓取 ob_type0的宏。tp_basicsize计算为 sizeof(PyListObject)，其中 PyListObject是实例结构。

PyListObject结构有三个字段:

PyObject_VAR_HEAD     # 24 bytes
PyObject **ob_item;   #  8 bytes
Py_ssize_t allocated; #  8 bytes

这些都有解释它们是什么的评论(我删减了) ，请点击上面的链接阅读它们。PyObject_VAR_HEAD扩展成三个8字节字段(ob_refcount、 ob_type和 ob_size) ，因此 24字节的贡献。

因此，目前 res是:

sizeof(PyListObject) + self->allocated * sizeof(void*)

or:

40 + self->allocated * sizeof(void*)

如果列表实例具有已分配的元素，则为。第二部分计算他们的贡献。顾名思义，self->allocated保存分配的元素数。

在没有任何元素的情况下，列表的大小计算为:

>>> [].__sizeof__()
40

即实例结构的大小。

tuple对象不定义 tuple_sizeof函数，而是使用 object_sizeof来计算它们的大小:

static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
Py_ssize_t res, isize;


res = 0;
isize = self->ob_type->tp_itemsize;
if (isize > 0)
res = Py_SIZE(self) * isize;
res += self->ob_type->tp_basicsize;


return PyInt_FromSsize_t(res);
}

对于 list来说，它抓取 tp_basicsize，如果对象有一个非零的 tp_itemsize(意味着它有一个可变长度的实例) ，它将元组中的项数乘以 tp_itemsize(它通过 Py_SIZE获得)。

tp_basicsize再次使用 sizeof(PyTupleObject)，其中结构包含:

PyObject_VAR_HEAD       # 24 bytes
PyObject *ob_item[1];   # 8  bytes

因此，如果没有任何元素(即 Py_SIZE返回 0) ，空元组的大小就等于 sizeof(PyTupleObject):

>>> ().__sizeof__()
24

嗯？好吧，这里有一个奇怪的现象，我还没有找到一个解释，tuple的 tp_basicsize实际上计算如下:

sizeof(PyTupleObject) - sizeof(PyObject *)

why an additional 8 bytes is removed from tp_basicsize is something I haven't been able to find out. (See MSeifert's comment for a possible explanation)

但是，这基本上就是 在你的具体例子中的不同之处。list还保留了许多已分配的元素，这有助于确定何时再次过度分配。

现在，当添加额外的元素时，列表确实会执行这种过度分配，以实现 O (1)附加。这导致了更大的尺寸，因为 MSeifert 的回答很好地涵盖了这一点。

小开

元组的大小是有前缀的，也就是说，在元组初始化时，解释器为所包含的数据分配足够的空间，因此它是不可变的(不能被修改)。尽管列表是一个可变对象，因此意味着动态分配内存，所以为了避免在每次追加或修改列表时分配空间(分配足够的空间来包含已更改的数据并将数据复制到其中) ，它为将来的运行时更改分配额外的空间，比如追加和修改。

差不多就是这样。