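Difference between an array type and an array allocated with malloc

Today I was helping a friend of mine with some C code, and I found some strange behavior that I couldn't explain to him. We had a TSV file with a list of integers, one int per line. The first line was the number of lines in the list.

We also had a C file with a very simple "readfile". The first line was read into n, the number of lines, and then the list was initialized as: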

int list[n]

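and finally a for loop over n with fscanf.

For small n (up to ~100,000), everything was fine. However, we found that when n was large (10^6), a segfault occurred.

Finally, we changed the list initialization to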

int *list = malloc(n*sizeof(int))

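and everything worked fine, even with very large n.

Can someone explain why this happened? What was causing the segfault with int list[n], and why did it stop when we started using list = malloc(n*sizeof(int))?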


int list[n] stores the data on the stack, while malloc stores it on the heap.

The stack is limited in size, while the heap is much bigger.

int list[n] is a variable-length array (VLA), which is allocated on the stack instead of on the heap. You don't have to free it (it is freed automatically at the end of the function call) and it is allocated quickly, but the storage space is very limited, as you have discovered. Larger arrays must be allocated on the heap.

This declaration allocates memory on the stack

    int list[n]

malloc allocates on the heap.

The stack is usually much smaller than the heap, so if you allocate too much memory on the stack you get a stack overflow.

See also this answer for further information

When you allocate using malloc, memory is allocated from the heap and not from the stack, and the stack is much more limited in size.

int list[n]

Allocates space for n integers on the stack, which is usually pretty small. Using memory on the stack is much faster than the alternative, but it is quite small and it is easy to overflow the stack (i.e. allocate too much memory) if you do things like allocate huge arrays or do recursion too deeply. You do not have to manually deallocate memory allocated this way, it is done by the compiler when the array goes out of scope.

malloc on the other hand allocates space in the heap, which is usually very large compared to the stack. You will have to allocate a much larger amount of memory on the heap to exhaust it, but it is a lot slower to allocate memory on the heap than it is on the stack, and you must deallocate it manually via free when you are done using it.
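As a rough illustration of the heap-based approach, here is a minimal sketch of a reader for the file layout described in the question (a count on the first line, then one int per line). The function name read_ints and its signature are hypothetical, not taken from the original code, and treating an empty list as a failure is an assumption made to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a count n, then n ints, from fp.
 * Returns a malloc'ed array that the caller must free, or NULL on failure.
 * On success, *out_n holds the number of ints read. */
int *read_ints(FILE *fp, size_t *out_n)
{
    assert(fp != NULL && out_n != NULL);

    size_t n;
    if (fscanf(fp, "%zu", &n) != 1)
        return NULL;                      /* missing or malformed count */
    if (n == 0 || n > SIZE_MAX / sizeof(int))
        return NULL;                      /* empty list, or n * sizeof(int) would overflow */

    int *list = malloc(n * sizeof *list); /* heap allocation, not limited by stack size */
    if (list == NULL)
        return NULL;                      /* malloc reports failure instead of crashing */

    for (size_t i = 0; i < n; i++) {
        if (fscanf(fp, "%d", &list[i]) != 1) {
            free(list);                   /* do not leak on a short or malformed file */
            return NULL;
        }
    }

    *out_n = n;
    return list;
}
```

A caller would open the file with fopen, pass the resulting FILE * to read_ints, and call free on the returned pointer when done. Unlike int list[n], a failed allocation here shows up as a NULL return that can be checked.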

There are several different pieces at play here.

The first is the difference between declaring an array as

int array[n];

and

int* array = malloc(n * sizeof(int));

In the first version, you are declaring an object with automatic storage duration. This means that the array lives only as long as the block that declares it (here, the enclosing function) is executing. In the second version, you are getting memory with dynamic storage duration, which means that it will exist until it is explicitly deallocated with free.
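One practical consequence of the two storage durations is sketched below: memory obtained from malloc can be handed back to the caller, whereas a pointer to a local array would dangle once the function returns. The function make_squares is a hypothetical example, not part of the original question, and it assumes n is small enough that i * i fits in an int.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns a heap array holding 0, 1, 4, 9, ...
 * The caller owns the result and must free it. Returns NULL on failure. */
int *make_squares(size_t n)
{
    assert(n > 0);

    int *squares = malloc(n * sizeof *squares);
    if (squares == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
        squares[i] = (int)(i * i);

    /* Returning this pointer is fine: dynamic storage outlives the call.
     * Returning a pointer to a local `int squares[n];` would be a bug,
     * because that array ceases to exist when the function returns. */
    return squares;
}
```

For example, int *s = make_squares(4); yields an array containing 0, 1, 4, 9 that stays valid until free(s) is called.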

The reason that the second version works here is an implementation detail of how C is usually compiled. Typically, C memory is split into several regions, including the stack (for function calls and local variables) and the heap (for malloced objects). The stack typically has a much smaller size than the heap; a common default on Linux is 8MB, and some platforms use less. As a result, if you try to allocate a huge array with

int array[n];

then you might exceed the stack's storage space, causing the segfault. On the other hand, the heap usually has a huge size (say, as much space as is free on the system), so mallocing a large object is far less likely to fail, and if it does fail, malloc returns NULL rather than crashing.

In general, be careful with variable-length arrays in C. They can easily exceed the stack size. Prefer malloc unless you know the size is small and you only need the array for the duration of the function.

Hope this helps!

Assuming you have a typical implementation, it's most likely that:

int list[n]

allocated list on your stack, whereas:

int *list = malloc(n*sizeof(int))

allocated memory on your heap.

In the case of a stack there is typically a limit to how large it can grow (if it can grow at all). In the case of a heap there is still a limit, but it tends to be much larger and is (broadly) constrained by your RAM + swap + address space, which is typically at least an order of magnitude larger, if not more.

If you are on Linux, you can set ulimit -s to a larger value, and stack allocation might then work as well. When you allocate memory on the stack, that memory remains until the end of your function's execution. If you allocate memory on the heap (using malloc), you can free the memory any time you want (even before the end of your function's execution).

Generally, heap should be used for large memory allocations.

   int array[n];

This is an array with automatic storage duration. If n is a compile-time constant, the size is known at compile time; if n is a runtime value, as in the question, it is a variable-length array whose size is fixed when the declaration is reached. Either way, the array is typically allocated on the stack.

   int *array = malloc(sizeof(int) * n);

This is a dynamically allocated array whose size is determined at run time. The array is allocated on the heap.