容器运行超出内存限制

在 Hadoop v1中,我已经分配了每个1GB 大小的7个 mapper 和 reduce 槽,我的 mapper 和 reduce 运行良好。我的机器有8G 内存,8个处理器。 现在使用 YARN,当在同一台机器上运行相同的应用程序时,我得到了容器错误。 默认情况下,我有以下设置:

  <property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>

它给了我一个错误:

Container [pid=28920,containerID=container_1389136889967_0001_01_000121] is running beyond virtual memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

然后,我尝试在 mapred-site. xml 中设置内存限制:

  <property>
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>

但仍然存在错误:

Container [pid=26783,containerID=container_1389136889967_0009_01_000002] is running beyond physical memory limits. Current usage: 4.2 GB of 4 GB physical memory used; 5.2 GB of 8.4 GB virtual memory used. Killing container.

我不明白为什么映射任务需要这么多内存。在我的理解中,1GB 的内存对于我的 map/reduce 任务来说已经足够了。为什么当我将更多的内存分配给容器时,任务会使用更多的内存?是因为每个任务都有更多的分工吗?我觉得更有效的方法是稍微减小容器的大小并创建更多的容器,这样就可以并行运行更多的任务。问题是,我如何确保每个容器不会被分配到超出其处理能力的分割?

150906 次浏览

You should also properly configure the maximum memory allocations for MapReduce. From this HortonWorks tutorial:

[...]

Each machine in our cluster has 48 GB of RAM. Some of this RAM should be >reserved for Operating System usage. On each node, we’ll assign 40 GB RAM for >YARN to use and keep 8 GB for the Operating System

For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We’ll thus assign 4 GB for Map task Containers, and 8 GB for Reduce tasks Containers.

In mapred-site.xml:

mapreduce.map.memory.mb: 4096

mapreduce.reduce.memory.mb: 8192

Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set to lower than the Map and Reduce memory defined above, so that they are within the bounds of the Container memory allocated by YARN.

In mapred-site.xml:

mapreduce.map.java.opts: -Xmx3072m

mapreduce.reduce.java.opts: -Xmx6144m

The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use.

To sum it up:

  1. In YARN, you should use the mapreduce configs, not the mapred ones. EDIT: This comment is not applicable anymore now that you've edited your question.
  2. What you are configuring is actually how much you want to request, not what is the max to allocate.
  3. The max limits are configured with the java.opts settings listed above.

Finally, you may want to check this other SO question that describes a similar problem (and solution).

I can't comment on the accepted answer, due to low reputation. However, I would like to add, this behavior is by design. The NodeManager is killing your container. It sounds like you are trying to use hadoop streaming which is running as a child process of the map-reduce task. The NodeManager monitors the entire process tree of the task and if it eats up more memory than the maximum set in mapreduce.map.memory.mb or mapreduce.reduce.memory.mb respectively, we would expect the Nodemanager to kill the task, otherwise your task is stealing memory belonging to other containers, which you don't want.

There is a check placed at Yarn level for Virtual and Physical memory usage ratio. Issue is not only that VM doesn't have sufficient physical memory. But it is because Virtual memory usage is more than expected for given physical memory.

Note : This is happening on Centos/RHEL 6 due to its aggressive allocation of virtual memory.

It can be resolved either by :

  1. Disable virtual memory usage check by setting yarn.nodemanager.vmem-check-enabled to false;

  2. Increase VM:PM ratio by setting yarn.nodemanager.vmem-pmem-ratio to some higher value.

References :

https://issues.apache.org/jira/browse/HADOOP-11364

http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/

Add following property in yarn-site.xml

 <property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>

I had a really similar issue using HIVE in EMR. None of the extant solutions worked for me -- ie, none of the mapreduce configurations worked for me; and neither did setting yarn.nodemanager.vmem-check-enabled to false.

However, what ended up working was setting tez.am.resource.memory.mb, for example:

hive -hiveconf tez.am.resource.memory.mb=4096

Another setting to consider tweaking is yarn.app.mapreduce.am.resource.mb

While working with spark in EMR I was having the same problem and setting maximizeResourceAllocation=true did the trick; hope it helps someone. You have to set it when you create the cluster. From the EMR docs:

aws emr create-cluster --release-label emr-5.4.0 --applications Name=Spark \
--instance-type m3.xlarge --instance-count 2 --service-role EMR_DefaultRole --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole --configurations https://s3.amazonaws.com/mybucket/myfolder/myConfig.json

Where myConfig.json should say:

[
{
"Classification": "spark",
"Properties": {
"maximizeResourceAllocation": "true"
}
}
]

We also faced this issue recently. If the issue is related to mapper memory, couple of things I would like to suggest that needs to be checked are.

  • Check if combiner is enabled or not? If yes, then it means that reduce logic has to be run on all the records (output of mapper). This happens in memory. Based on your application you need to check if enabling combiner helps or not. Trade off is between the network transfer bytes and time taken/memory/CPU for the reduce logic on 'X' number of records.
    • If you feel that combiner is not much of value, just disable it.
    • If you need combiner and 'X' is a huge number (say millions of records) then considering changing your split logic (For default input formats use less block size, normally 1 block size = 1 split) to map less number of records to a single mapper.
  • Number of records getting processed in a single mapper. Remember that all these records need to be sorted in memory (output of mapper is sorted). Consider setting mapreduce.task.io.sort.mb (default is 200MB) to a higher value if needed. mapred-configs.xml
  • If any of the above didn't help, try to run the mapper logic as a standalone application and profile the application using a Profiler (like JProfiler) and see where the memory getting used. This can give you very good insights.

Running yarn on Windows Linux subsystem with Ubunto OS, error "running beyond virtual memory limits, Killing container" I resolved it by disabling virtual memory check in the file yarn-site.xml

<property> <name>yarn.nodemanager.vmem-check-enabled</name> <value>false</value> </property>

I haven't personally checked, but hadoop-yarn-container-virtual-memory-understanding-and-solving-container-is-running-beyond-virtual-memory-limits-errors sounds very reasonable

I solved the issue by changing yarn.nodemanager.vmem-pmem-ratio to a higher value , and I would agree that:

Another less recommended solution is to disable the virtual memory check by setting yarn.nodemanager.vmem-check-enabled to false.

I am practicing Hadoop programs (version hadoop3). Via virtual box I have installed Linux OS. We allocate very limited memory at time of installation of Linux. By setting the following memory limit properties in mapred-site.xml and restarting your HDFS and YARN then my program worked.

 <property>
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>