Large Object Heap fragmentation

The C#/.NET application I am developing is suffering from a slow memory leak. I have used CDB with SOS to try to determine what is happening, but the data does not seem to make sense, so I was hoping one of you may have experienced this before.

The application is running on the 64-bit framework. It is continuously calculating and serialising data to a remote host and is hitting the Large Object Heap (LOH) a fair bit. However, I expect most of the LOH objects to be transient: once the calculation is complete and has been sent to the remote host, the memory should be freed. What I am seeing instead is a large number of (live) object arrays interleaved with free blocks of memory; for example, taking a random segment from the LOH:

0:000> !DumpHeap 000000005b5b1000  000000006351da10
Address               MT     Size
...
000000005d4f92e0 0000064280c7c970 16147872
000000005e45f880 00000000001661d0  1901752 Free
000000005e62fd38 00000642788d8ba8     1056       <--
000000005e630158 00000000001661d0  5988848 Free
000000005ebe6348 00000642788d8ba8     1056
000000005ebe6768 00000000001661d0  6481336 Free
000000005f214d20 00000642788d8ba8     1056
000000005f215140 00000000001661d0  7346016 Free
000000005f9168a0 00000642788d8ba8     1056
000000005f916cc0 00000000001661d0  7611648 Free
00000000600591c0 00000642788d8ba8     1056
00000000600595e0 00000000001661d0   264808 Free
...

Obviously, I would expect this if my application were creating long-lived large objects during each calculation. (It does do this, and I accept there will be a degree of LOH fragmentation, but that is not the problem here.) The problem is the very small (1056 byte) object arrays you can see in the dump above, which I cannot see being created anywhere in our code and which are somehow remaining rooted.

Also note that CDB does not report the type when the heap segment is dumped: I am not sure whether this is related or not. If I dump the marked (<--) object, CDB/SOS reports it fine:

0:015> !DumpObj 000000005e62fd38
Name: System.Object[]
MethodTable: 00000642788d8ba8
EEClass: 00000642789d7660
Size: 1056(0x420) bytes
Array: Rank 1, Number of elements 128, Type CLASS
Element Type: System.Object
Fields:
None

The elements of the object array are all strings, and the strings are recognisable as coming from our application code.

Also, I am unable to find their GC roots: the !GCRoot command hangs and never comes back (I have even tried leaving it overnight).

So, I would very much appreciate it if anybody could shed any light on why these small (< 85 KB) object arrays are ending up on the LOH: in what situations will .NET put a small object array there? Also, does anybody happen to know of an alternative way of establishing the roots of these objects?


Update 1

Another theory I came up with late yesterday is that these object arrays started out large but have been shrunk, leaving the blocks of free memory that are evident in the memory dumps. What makes me suspicious is that the object arrays always appear to be 1056 bytes long (128 elements): 128 * 8 bytes for the references plus 32 bytes of overhead.

The idea is that perhaps some unsafe code in a library or in the CLR is corrupting the number-of-elements field in the array header. A long shot, I know.


Update 2

Thanks to Brian Rasmussen (see the accepted answer), the problem has been identified as fragmentation of the LOH caused by the string intern table! I wrote a quick test application to confirm this:

static void Main()
{
    const int ITERATIONS = 100000;

    // First loop: create unique strings but do not intern them.
    for (int index = 0; index < ITERATIONS; ++index)
    {
        string str = "NonInterned" + index;
        Console.Out.WriteLine(str);
    }

    Console.Out.WriteLine("Continue.");
    Console.In.ReadLine();

    // Second loop: create unique strings and intern them, rooting them in the intern table.
    for (int index = 0; index < ITERATIONS; ++index)
    {
        string str = string.Intern("Interned" + index);
        Console.Out.WriteLine(str);
    }

    Console.Out.WriteLine("Continue?");
    Console.In.ReadLine();
}

The application first creates and dereferences unique strings in a loop. This is just to prove that memory does not leak in this scenario. Obviously it should not, and it does not.

In the second loop, unique strings are created and interned. This action roots them in the intern table. What I had not realised is how the intern table is represented. It appears to consist of a set of pages (object arrays of 128 string elements) that are created in the LOH. This is more evident in CDB/SOS:

0:000> .loadby sos mscorwks
0:000> !EEHeap -gc
Number of GC Heaps: 1
generation 0 starts at 0x00f7a9b0
generation 1 starts at 0x00e79c3c
generation 2 starts at 0x00b21000
ephemeral segment allocation context: none
segment    begin allocated     size
00b20000 00b21000  010029bc 0x004e19bc(5118396)
Large object heap starts at 0x01b21000
segment    begin allocated     size
01b20000 01b21000  01b8ade0 0x00069de0(433632)
Total Size  0x54b79c(5552028)
------------------------------
GC Heap Size  0x54b79c(5552028)

Dumping the LOH segment reveals the pattern I saw in the leaking application:

0:000> !DumpHeap 01b21000 01b8ade0
...
01b8a120 793040bc      528
01b8a330 00175e88       16 Free
01b8a340 793040bc      528
01b8a550 00175e88       16 Free
01b8a560 793040bc      528
01b8a770 00175e88       16 Free
01b8a780 793040bc      528
01b8a990 00175e88       16 Free
01b8a9a0 793040bc      528
01b8abb0 00175e88       16 Free
01b8abc0 793040bc      528
01b8add0 00175e88       16 Free    total 1568 objects
Statistics:
MT    Count    TotalSize Class Name
00175e88      784        12544      Free
793040bc      784       421088 System.Object[]
Total 1568 objects

Note that the object array size is 528 (rather than 1056) because my workstation is 32 bit and the application server is 64 bit: 128 * 4 bytes per reference plus 16 bytes of overhead, versus 128 * 8 plus 32. The object arrays are still 128 elements long.

So the moral of the story is to be very careful with interning. If the strings you are interning are not known to be members of a finite set, then your application will leak due to fragmentation of the LOH, at least in version 2 of the CLR.

In our application's case, there is general code in the deserialisation code path that interns entity identifiers during unmarshalling: I now strongly suspect this is the culprit. The developer's intentions were obviously good, though: they wanted to make sure that if the same entity is deserialised multiple times, only one instance of the identifier string is kept in memory.
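For illustration only, the suspect pattern boils down to something like the following (the method name and the reader are made up here; our real deserialiser is more involved):

// Hypothetical illustration of the suspected pattern, not our actual code.
// The identifiers coming off the wire form an effectively unbounded set, so every
// new value adds a permanently rooted entry to the intern table, whose backing
// object[] pages are allocated on the LOH.
static string ReadEntityId(System.IO.BinaryReader reader)
{
    string id = reader.ReadString();
    return string.Intern(id);   // the problematic call: an interned string is never collected
}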


If the string format is recognizable as coming from your application, why haven't you identified the code that is generating it? If there are several possibilities, try adding unique data to figure out which code path is the culprit.

The fact that the arrays are interleaved with large freed items leads me to guess that they were originally paired or at least related. Try to identify the freed objects to figure out what was generating them and the associated strings.

Once you identify what is generating these strings, try to figure out what would be keeping them from being GCed. Perhaps they're being stuffed in a forgotten or unused list for logging purposes or something similar.


EDIT: Ignore the memory region and the specific array size for the moment: just figure out what is being done with these strings to cause a leak. Try !GCRoot when your program has created or manipulated these strings just once or twice, when there are fewer objects to trace.

The CLR uses the LOH to preallocate a few objects (such as the array used for interned strings). Some of these are less than 85000 bytes and thus would not normally be allocated on the LOH.

It is an implementation detail, but I assume the reason for this is to avoid unnecessary garbage collection of instances that are supposed to survive as long as the process itself.

Also, due to a somewhat esoteric optimization, any double[] of 1000 or more elements is allocated on the LOH.
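A small sketch to see this rule in action (assuming the 32-bit .NET Framework CLR, where the double[] optimization applies): objects on the LOH are reported as generation 2 straight after allocation, so the generation number distinguishes two arrays that are both well under 85,000 bytes.

// Sketch only; assumes the 32-bit .NET Framework CLR.
double[] small = new double[999];   // just under the threshold, allocated on the normal heap
double[] large = new double[1000];  // 1000 elements or more, allocated on the LOH

Console.WriteLine(GC.GetGeneration(small)); // typically 0 for a fresh small-heap allocation
Console.WriteLine(GC.GetGeneration(large)); // 2, because LOH objects are treated as part of generation 2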

When reading descriptions of how the GC works, the part about how long-lived objects end up in generation 2, and how LOH objects are collected only during a full collection (just as generation 2 is), the idea that springs to mind is: why not just keep generation 2 and large objects in the same heap, since they are going to get collected together?

If that's what actually happens, then it would explain how small objects end up in the same place as the LOH: if they are long-lived enough to end up in generation 2.

And so your problem would appear to be a pretty good rebuttal to this idea: it would result in fragmentation of the LOH.

Summary: your problem could be explained by the LOH and generation 2 sharing the same heap region, although that is by no means proof that this is the explanation.

Update: the output of !dumpheap -stat pretty much blows this theory out of the water! Generation 2 and the LOH have their own regions.

Great question; I learned a lot just by reading it.

I think other bits of the deserialisation code path are also using the large object heap, hence the fragmentation. If all the strings were interned at the SAME time, I think you would be OK.

Given how good the .NET garbage collector is, just letting the deserialisation code path create normal string objects is likely to be good enough. Don't do anything more complex until the need is proven.

I would at most look at keeping a hash table of the last few strings you have seen and reusing those. By limiting the hash table size, and passing the size in when you create the table, you can stop most of the fragmentation. You then need a way to remove strings you have not seen recently from the hash table to limit its size. But if the strings the deserialisation code path creates are short-lived anyway, you will not gain much, if anything.
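As a rough sketch of what I mean (the class, its capacity, and the clear-everything eviction policy are purely illustrative):

using System.Collections.Generic;

// Illustrative only: a small, size-limited cache that collapses duplicate strings
// without rooting them for the lifetime of the process the way string.Intern does.
class RecentStringCache
{
    private readonly int _capacity;
    private readonly Dictionary<string, string> _cache;

    public RecentStringCache(int capacity)
    {
        _capacity = capacity;
        _cache = new Dictionary<string, string>(capacity);
    }

    public string GetOrAdd(string value)
    {
        string existing;
        if (_cache.TryGetValue(value, out existing))
            return existing;            // reuse the instance we already hold

        if (_cache.Count >= _capacity)
            _cache.Clear();             // crude eviction keeps the table bounded

        _cache.Add(value, value);
        return value;
    }
}

The deserialisation code path would then call something like cache.GetOrAdd(identifier) instead of string.Intern(identifier): duplicates are still collapsed, but old strings become collectable once they fall out of the cache.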

Here are a couple of ways to identify the exact call stack of a LOH allocation.

To avoid LOH fragmentation, pre-allocate a large array of objects and pin them. Reuse these objects when needed. Here is a post on LOH fragmentation. Something like this could help in avoiding LOH fragmentation.
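A minimal sketch of the pre-allocate-and-reuse part of that idea (pinning is omitted; the buffer size, pool size, and names are placeholders, and a real pool would also need synchronisation and a policy for exhaustion):

using System.Collections.Generic;

// Illustrative only: allocate the large buffers once, up front, so they sit next to each
// other on the LOH, then hand them out for reuse instead of allocating fresh large arrays
// during each calculation.
class LargeBufferPool
{
    private readonly Stack<byte[]> _free = new Stack<byte[]>();

    public LargeBufferPool(int bufferCount, int bufferSize)
    {
        for (int i = 0; i < bufferCount; ++i)
            _free.Push(new byte[bufferSize]);   // choose bufferSize > 85,000 so these live on the LOH
    }

    public byte[] Rent()
    {
        // Pop throws if the pool is exhausted; a real implementation would decide
        // whether to block, grow, or fall back to a normal allocation here.
        return _free.Pop();
    }

    public void Return(byte[] buffer)
    {
        _free.Push(buffer);
    }
}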

The .NET Framework 4.5.1 has the ability to explicitly compact the large object heap (LOH) during garbage collection.

// GCSettings lives in the System.Runtime namespace (.NET Framework 4.5.1 and later).
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(); // the next blocking full collection compacts the LOH; the setting then resets to Default

See GCSettings.LargeObjectHeapCompactionMode for more information.