Haskell threads heap overflow despite only 22Mb total memory usage?

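I am trying to parallelize a ray tracer. This means I have a very long list of small computations. The vanilla program runs on a specific scene in 67.98 seconds, with 13 MB total memory in use and 99.2% productivity.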

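In my first attempt I used the parallel strategy parBuffer with a buffer size of 50. I chose parBuffer because it walks through the list only as fast as sparks are consumed, and it does not force the spine of the list the way parList does, which would use a lot of memory since the list is very long. With -N2, it ran in 100.46 seconds, with 14 MB total memory in use and 97.8% productivity. The spark information is: SPARKS: 480000 (476469 converted, 0 overflowed, 0 dud, 161 GC'd, 3370 fizzled)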
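For readers who want to reproduce this first attempt, here is a minimal sketch of a parBuffer pipeline. The function shade is a hypothetical stand-in for the real per-pixel work (colorPixel in the question), and renderBuffered is an illustrative name, not part of the original code.

```haskell
import Control.Parallel.Strategies (parBuffer, rdeepseq, withStrategy)

-- Hypothetical stand-in for the per-pixel work (colorPixel in the question).
shade :: Int -> Double
shade n = sum [ sqrt (fromIntegral (n + k)) | k <- [1 .. 200 :: Int] ]

-- One spark per element. parBuffer sparks the first 50 elements up front
-- and then one more each time the consumer takes an element, so the list
-- spine is only walked a bounded distance ahead of consumption.
renderBuffered :: [Int] -> [Double]
renderBuffered = withStrategy (parBuffer 50 rdeepseq) . map shade
```

To see the SPARKS line quoted above, the program would need to be built with -threaded and -rtsopts and run with +RTS -N2 -s.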

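The fizzled sparks indicate that the granularity of the sparks was too small, so next I tried the strategy parListChunk, which splits the list into chunks and creates one spark per chunk. I got the best results with a chunk size of 0.25 * imageWidth. The program ran in 93.43 seconds, with 236 MB total memory in use and 97.3% productivity. The spark information is: SPARKS: 2400 (2400 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled). I believe the much greater memory usage is because parListChunk forces the spine of the list.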
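A minimal sketch of the chunked variant, again with a hypothetical shade function standing in for colorPixel and an illustrative name renderChunked. The chunk size is passed in as a parameter because the value 0.25 * imageWidth depends on the scene.

```haskell
import Control.Parallel.Strategies (parListChunk, rdeepseq, withStrategy)

-- Hypothetical stand-in for the per-pixel work (colorPixel in the question).
shade :: Int -> Double
shade n = sum [ sqrt (fromIntegral (n + k)) | k <- [1 .. 200 :: Int] ]

-- One spark per chunk of chunkSize elements. Unlike parBuffer, all chunks
-- are sparked up front, which means the whole list spine is traversed
-- before the first result is consumed.
renderChunked :: Int -> [Int] -> [Double]
renderChunked chunkSize =
  withStrategy (parListChunk chunkSize rdeepseq) . map shade
```

With 2400 sparks for the whole image, each spark carries far more work than in the per-element version, which is consistent with none of them fizzling.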

Then I tried to write my own strategy that lazily divided the list into chunks and then passed the chunks to parBuffer and concatenated the results.

 concat $ withStrategy (parBuffer 40 rdeepseq) (chunksOf 100 (map colorPixel pixels))
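A self-contained version of the line above might look like the following sketch. It assumes a hypothetical shade function in place of colorPixel, and it defines chunksOf locally so that the example does not depend on the split package; the local definition is meant to behave like Data.List.Split.chunksOf for positive chunk sizes.

```haskell
import Control.Parallel.Strategies (parBuffer, rdeepseq, withStrategy)

-- Hypothetical stand-in for the per-pixel work (colorPixel in the question).
shade :: Int -> Double
shade n = sum [ sqrt (fromIntegral (n + k)) | k <- [1 .. 200 :: Int] ]

-- Lazily split a list into chunks of n elements (the last may be shorter).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0    = error "chunksOf: chunk size must be positive"
  | null xs   = []
  | otherwise = let (h, t) = splitAt n xs in h : chunksOf n t

-- One spark per chunk, with a rolling buffer of bufSize chunks sparked
-- ahead of the consumer. Both the chunking and the buffering are lazy,
-- so only a bounded prefix of the input is forced at any time.
renderLazyChunks :: Int -> Int -> [Int] -> [Double]
renderLazyChunks bufSize chunkSize =
  concat . withStrategy (parBuffer bufSize rdeepseq) . chunksOf chunkSize . map shade
```

Here renderLazyChunks 40 100 corresponds to the buffer and chunk sizes used in the question.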

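The lazily chunked strategy ran in 95.99 seconds, with 22 MB total memory in use and 98.8% productivity. This was successful in the sense that all the sparks are being converted and the memory usage is much lower, but the speed did not improve. Below is an image of part of the eventlog profile.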

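As you can see, the threads are being stopped due to heap overflows. I tried adding +RTS -M1G, which increases the default heap size all the way up to 1Gb. The results did not change. I read that the Haskell main thread will use memory from the heap if its stack overflows, so I also tried increasing the default stack size with +RTS -M1G -K1G, but this also had no impact.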

Is there anything else I can try? I can post more detailed profiling info for memory usage or eventlog if needed, I did not include it all because it is a lot of information and I did not think all of it was necessary to include.

EDIT: I was reading about the Haskell RTS multicore support, and it talks about there being a HEC (Haskell Execution Context) for each core. Each HEC contains, among other things, an Allocation Area (which is a part of a single shared heap). Whenever any HEC's Allocation Area is exhausted, a garbage collection must be performed. There appears to be an RTS option to control it, -A. I tried -A32M but saw no difference.

EDIT2: Here is a link to a github repo dedicated to this question. I have included the profiling results in the profiling folder.

EDIT3: Here is the relevant bit of code:

 render :: [([(Float,Float)],[(Float,Float)])] -> World -> [Color]
 render grids world = cs where
     ps = [ (i,j) | j <- reverse [0..wImgHt world - 1] , i <- [0..wImgWd world - 1] ]
     cs = map (colorPixel world) (zip ps grids)
     --cs = withStrategy (parListChunk (round (wImgWd world)) rdeepseq) (map (colorPixel world) (zip ps grids))
     --cs = withStrategy (parBuffer 16 rdeepseq) (map (colorPixel world) (zip ps grids))
     --cs = concat $ withStrategy (parBuffer 40 rdeepseq) (chunksOf 100 (map (colorPixel world) (zip ps grids)))

The grids are random floats that are precomputed and used by colorPixel. The type of colorPixel is:

 colorPixel :: World -> ((Float,Float),([(Float,Float)],[(Float,Float)])) -> Color


Not a solution to your problem, but a hint at the cause:

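Haskell seems to be very conservative about memory reuse, and when the runtime sees a chance to reclaim a block of memory, it takes it. Your problem description fits the minor GC behavior described here (at the bottom): https://wiki.haskell.org/ghc/memory_management.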

New data is allocated in a 512kb "nursery". When the nursery is exhausted, a minor GC occurs: it scans the nursery and frees unused values.

So if you chop the data into smaller chunks, you enable the engine to do the cleanup earlier - GC kicks in.