Diagnosing high CPU usage by Docker on Mac

How do I diagnose why Docker on macOS, specifically com.docker.hyperkit, is using 100% CPU?

[Screenshot: docker CPU usage]

docker stats

docker stats shows that all running containers have low CPU, memory, net IO and block IO.

[Screenshot: docker stats output]
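To capture the same numbers as text rather than a screenshot, a small helper can take one snapshot. This is a sketch that assumes the docker CLI is on PATH. It uses docker stats --no-stream with a Go-template --format that prints the name, CPU % and memory % for each container. The helper names are made up for illustration.

```python
import subprocess

# Go template for `docker stats --format`: tab-separated name, CPU %, memory %.
STATS_FORMAT = "{{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}"


def _pct(value):
    """Convert "0.53%" to 0.53; return None for placeholders such as "--"."""
    try:
        return float(value.strip().rstrip("%"))
    except ValueError:
        return None


def parse_stats(text):
    """Parse STATS_FORMAT lines into {container_name: (cpu_pct, mem_pct)}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, cpu, mem = line.split("\t")
        result[name] = (_pct(cpu), _pct(mem))
    return result


def collect_stats():
    """Take a single snapshot of all running containers (needs the docker CLI)."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", STATS_FORMAT],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_stats(out)


def total_cpu(stats):
    """Sum the CPU percentages that could be parsed."""
    return sum(cpu for cpu, _ in stats.values() if cpu is not None)
```

On the Mac, total_cpu(collect_stats()) gives the summed container CPU. You can compare that figure with what Activity Monitor reports for com.docker.hyperkit.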

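iosnoop

iosnoop shows that com.docker.hyperkit performs about 50 writes per second, totaling 500 KB per second, to the file Docker.qcow2. According to What is Docker.qcow2?, Docker.qcow2 is a sparse file that serves as the persistent storage for all Docker containers.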
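A small summarizer can turn a saved iosnoop capture into per-second figures like these. This sketch assumes iosnoop's default column order, UID PID D BLOCK SIZE COMM PATHNAME, where D is R or W and SIZE is in bytes. It also assumes you know how long the capture ran. Check the header of your own output before relying on it.

```python
from collections import defaultdict


def summarize_writes(text, seconds):
    """Return {(comm, path): (writes_per_sec, bytes_per_sec)} for W events.

    Assumes iosnoop's default layout: UID PID D BLOCK SIZE COMM PATHNAME.
    Header lines, reads and anything unparseable are skipped.
    """
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for line in text.splitlines():
        # maxsplit=6 keeps a PATHNAME containing spaces in a single field.
        fields = line.split(None, 6)
        if len(fields) < 7 or fields[2] != "W" or not fields[4].isdigit():
            continue
        key = (fields[5], fields[6])
        counts[key] += 1
        sizes[key] += int(fields[4])
    return {k: (counts[k] / seconds, sizes[k] / seconds) for k in counts}
```

At the rates quoted above (about 50 writes and 500 KB per second), the average write is roughly 10 KB.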

In my case, the file is not that sparse: its physical size matches its logical size.

[Screenshot: docker.qcow actual size]
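The two sizes can also be compared from a script. stat() reports the logical size as st_size and the allocated space as st_blocks, counted in 512-byte units. This is a minimal sketch. The location of Docker.qcow2 differs between Docker Desktop versions, so the path is left to the caller.

```python
import os


def allocation_ratio(logical_bytes, allocated_blocks):
    """Physical/logical size ratio; st_blocks counts 512-byte units."""
    if logical_bytes == 0:
        return 0.0
    return allocated_blocks * 512 / logical_bytes


def sparseness_report(path):
    """Return (logical_bytes, physical_bytes, ratio) for a file.

    A ratio well below 1.0 means the file is sparse; a ratio near 1.0
    means almost all of its logical size is really allocated on disk.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512, allocation_ratio(st.st_size, st.st_blocks)
```

A ratio near 1.0 for Docker.qcow2 matches what I describe above: the physical size equals the logical size.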

DTrace (dtruss)

dtruss (sudo dtruss -p $DOCKER_PID) shows a large number of psynch_cvsignal and psynch_cvwait calls.

psynch_cvsignal(0x7F9946002408, 0x4EA701004EA70200, 0x4EA70100)          = 257 0
psynch_mutexdrop(0x7F9946002318, 0x5554700, 0x5554700)           = 0 0
psynch_mutexwait(0x7F9946002318, 0x5554702, 0x5554600)           = 89474819 0
psynch_cvsignal(0x10BF7B470, 0x4C8095004C809600, 0x4C809300)             = 257 0
psynch_cvwait(0x10BF7B470, 0x4C8095014C809600, 0x4C809300)               = 0 0
psynch_cvwait(0x10BF7B470, 0x4C8096014C809700, 0x4C809600)               = -1 Err#316
psynch_cvsignal(0x7F9946002408, 0x4EA702004EA70300, 0x4EA70200)          = 257 0
psynch_cvwait(0x7F9946002408, 0x4EA702014EA70300, 0x4EA70200)            = 0 0
psynch_cvsignal(0x10BF7B470, 0x4C8097004C809800, 0x4C809600)             = 257 0
psynch_cvwait(0x10BF7B470, 0x4C8097014C809800, 0x4C809600)               = 0 0
psynch_cvwait(0x10BF7B470, 0x4C8098014C809900, 0x4C809800)               = -1 Err#316
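To put a number on "a large number", you can tally a saved trace per syscall. One way to save it is sudo dtruss -p $DOCKER_PID > /tmp/dtruss.txt 2>&1, stopped with Ctrl-C after a while. This sketch assumes the "name(args) = return errno" line format shown above and ignores any other lines.

```python
import re
from collections import Counter

# Matches lines like "psynch_cvwait(0x..., 0x..., 0x...)   = -1 Err#316".
SYSCALL_RE = re.compile(r"^\s*(\w+)\(.*\)\s*=\s*(-?\w+)\s+(\S+)")


def count_syscalls(text):
    """Return (calls_by_name, failures_by_name) from dtruss output."""
    calls, failures = Counter(), Counter()
    for line in text.splitlines():
        m = SYSCALL_RE.match(line)
        if not m:
            continue  # not a syscall line
        name, ret, _errno = m.groups()
        calls[name] += 1
        if ret == "-1":
            failures[name] += 1
    return calls, failures
```

On the excerpt above it would count 4 psynch_cvsignal and 5 psynch_cvwait calls, 2 of which returned -1 (Err#316).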

Update: top on the Docker host

From https://stackoverflow.com/a/58293240/30900:

docker run -it --rm --pid host busybox top

CPU usage inside the Docker embedded host is about 3%. CPU usage on my MacBook is 100%. So the Docker embedded host is not what is causing the CPU spike.

[Screenshot: docker host top]

Update: DTrace script for the most common stack traces

Stack traces from the DTrace script at https://stackoverflow.com/a/58293035/30900 are shown below.

These kernel stack traces look harmless.

              AppleIntelLpssGspi`AppleIntelLpssGspi::regRead(unsigned int)+0x1f
AppleIntelLpssGspi`AppleIntelLpssGspi::transferMmioDuplexMulti(void*, void*, unsigned long long, unsigned int)+0x91
AppleIntelLpssSpiController`AppleIntelLpssSpiController::transferDataMmioDuplexMulti(void*, void*, unsigned int, unsigned int)+0xb2
AppleIntelLpssSpiController`AppleIntelLpssSpiController::_transferDataSubr(AppleInfoLpssSpiControllerTransferDataRequest*)+0x5bc
AppleIntelLpssSpiController`AppleIntelLpssSpiController::_transferData(AppleInfoLpssSpiControllerTransferDataRequest*)+0x24f
kernel`IOCommandGate::runAction(int (*)(OSObject*, void*, void*, void*, void*), void*, void*, void*, void*)+0x138
AppleIntelLpssSpiController`AppleIntelLpssSpiDevice::transferData(IOMemoryDescriptor*, void*, unsigned long long, unsigned long long, IOMemoryDescriptor*, void*, unsigned long long, unsigned long long, unsigned int, AppleIntelSPICompletion*)+0x151
AppleHSSPISupport`AppleHSSPIController::transferData(IOMemoryDescriptor*, void*, unsigned long long, unsigned long long, IOMemoryDescriptor*, void*, unsigned long long, unsigned long long, unsigned int, AppleIntelSPICompletion*)+0xcc
AppleHSSPISupport`AppleHSSPIController::doSPITransfer(bool, AppleHSSPITransferRetryReason*)+0x97
AppleHSSPISupport`AppleHSSPIController::InterruptOccurred(IOInterruptEventSource*, int)+0xf8
kernel`IOInterruptEventSource::checkForWork()+0x13c
kernel`IOWorkLoop::runEventSources()+0x1e2
kernel`IOWorkLoop::threadMain()+0x2c
kernel`call_continuation+0x2e
53


kernel`waitq_wakeup64_thread+0xa7
pthread`__psynch_cvsignal+0x495
pthread`_psynch_cvsignal+0x28
kernel`psynch_cvsignal+0x38
kernel`unix_syscall64+0x27d
kernel`hndl_unix_scall64+0x16
60


kernel`hndl_mdep_scall64+0x4
113


kernel`ml_set_interrupts_enabled+0x19
524


kernel`ml_set_interrupts_enabled+0x19
kernel`hndl_mdep_scall64+0x10
5890


kernel`machine_idle+0x2f8
kernel`call_continuation+0x2e
43395

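The most common user-space stack traces over 17 seconds clearly involve com.docker.hyperkit. There were 1365 stack traces in 17 seconds in which com.docker.hyperkit created threads, an average of about 80 threads per second.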

              com.docker.hyperkit`0x000000010cbd20db+0x19f9
com.docker.hyperkit`0x000000010cbdb98c+0x157
com.docker.hyperkit`0x000000010cbf6c2d+0x4bd
libsystem_pthread.dylib`_pthread_body+0x7e
libsystem_pthread.dylib`_pthread_start+0x42
libsystem_pthread.dylib`thread_start+0xd
19


Hypervisor`hv_vmx_vcpu_read_vmcs+0x1
com.docker.hyperkit`0x000000010cbd4c4f+0x2a
com.docker.hyperkit`0x000000010cbd20db+0x174a
com.docker.hyperkit`0x000000010cbdb98c+0x157
com.docker.hyperkit`0x000000010cbf6c2d+0x4bd
libsystem_pthread.dylib`_pthread_body+0x7e
libsystem_pthread.dylib`_pthread_start+0x42
libsystem_pthread.dylib`thread_start+0xd
22


Hypervisor`hv_vmx_vcpu_read_vmcs
com.docker.hyperkit`0x000000010cbdb98c+0x157
com.docker.hyperkit`0x000000010cbf6c2d+0x4bd
libsystem_pthread.dylib`_pthread_body+0x7e
libsystem_pthread.dylib`_pthread_start+0x42
libsystem_pthread.dylib`thread_start+0xd
34


com.docker.hyperkit`0x000000010cbd878d+0x36
com.docker.hyperkit`0x000000010cbd20db+0x42f
com.docker.hyperkit`0x000000010cbdb98c+0x157
com.docker.hyperkit`0x000000010cbf6c2d+0x4bd
libsystem_pthread.dylib`_pthread_body+0x7e
libsystem_pthread.dylib`_pthread_start+0x42
libsystem_pthread.dylib`thread_start+0xd
47


Hypervisor`hv_vcpu_run+0xd
com.docker.hyperkit`0x000000010cbd20db+0x6b6
com.docker.hyperkit`0x000000010cbdb98c+0x157
com.docker.hyperkit`0x000000010cbf6c2d+0x4bd
libsystem_pthread.dylib`_pthread_body+0x7e
libsystem_pthread.dylib`_pthread_start+0x42
libsystem_pthread.dylib`thread_start+0xd
135
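The rate quoted above is the sample count divided by the sampling window. Here is a short worked check that uses only numbers from this post: 1365 stacks in 17 seconds, and the five user-space stacks listed with counts 19, 22, 34, 47 and 135. The labels are shortened versions of each stack's leaf frame.

```python
def rate_and_shares(total, seconds, counts):
    """Return (samples per second, {label: fraction of all samples})."""
    return total / seconds, {label: n / total for label, n in counts.items()}


# Counts copied from the user-space DTrace output above.
counts = {
    "hyperkit 0x10cbd20db+0x19f9": 19,
    "hv_vmx_vcpu_read_vmcs+0x1": 22,
    "hv_vmx_vcpu_read_vmcs": 34,
    "hyperkit 0x10cbd878d+0x36": 47,
    "hv_vcpu_run+0xd": 135,
}
rate, shares = rate_and_shares(1365, 17, counts)
print(f"{rate:.1f} stacks per second")  # 80.3 stacks per second
for label, frac in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{frac:6.1%}  {label}")
```

The five stacks listed account for 257 of the 1365 samples (about 19%). The largest single entry is hv_vcpu_run, the Hypervisor.framework call that executes a guest vCPU, at about 10%.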

Related issues

GitHub docker/for-mac: com.docker.hyperkit 100% CPU usage is back again #3499. One comment suggests adding volume caching as described here: https://www.docker.com/blog/user-guided-caching-in-docker-for-mac/. I tried this and got a small (~10%) reduction in CPU usage.

This is a small dTrace script I use to find where the kernel is spending its time (it's from Solaris, and dates back to the early days of Solaris 10):

#!/usr/sbin/dtrace -s


profile:::profile-1001hz
/arg0/
{
@[ stack() ] = count();
}

It simply samples kernel stack traces and counts each one it encounters in the @ aggregation.

Run it as root:

... # ./kernelhotspots.d > /tmp/kernel_hot_spots.txt

Let it run for a decent amount of time while you're having CPU issues, then hit CTRL-C to break the script. It will emit all the kernel stack traces it encountered, with the most common last. If you need more (or fewer) stack frames than the default, pass a depth to stack():

    @[ stack( 15 ) ] = count();

That will show stacks up to 15 frames deep.

The last few stack traces will be where your kernel is spending most of its time. That may or may not be informative.
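The aggregation can print thousands of stacks, so it can help to post-process the saved file. This is a sketch that assumes the layout the script above prints, the same layout as the output quoted in the question. Each entry is a block of frame lines followed by a line holding only its count. ustack() output has the same shape.

```python
def parse_aggregation(text):
    """Parse stack()/ustack() aggregation output into [(count, frames)].

    Assumes each entry is one or more frame lines followed by a line that
    contains only an integer count; blank lines are ignored.
    """
    entries, frames = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.isdigit():
            # A bare integer closes the current stack.
            if frames:
                entries.append((int(line), frames))
            frames = []
        else:
            frames.append(line)
    return entries


def top_stacks(text, n=5):
    """Return the n most frequent stacks as (count, share, frames)."""
    entries = parse_aggregation(text)
    total = sum(count for count, _ in entries) or 1
    entries.sort(key=lambda e: e[0], reverse=True)
    return [(count, count / total, frames) for count, frames in entries[:n]]
```

For example, top_stacks(open("/tmp/kernel_hot_spots.txt").read()) returns the heaviest stacks first, each with its fraction of all samples.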

This script will do the same for user-space stack traces:

#!/usr/sbin/dtrace -s


profile:::profile-1001hz
/arg1/
{
@[ ustack() ] = count();
}

Run it similarly:

... # ./userspacehotspots.d > /tmp/userspace_hot_spots.txt

ustack() is a bit slower - to emit the actual function names, dTrace has to do a lot more work to get them from the address spaces of the appropriate processes.

Disabling System Integrity Protection might help you get better stack traces.

See DTrace Action Basics for some more details.

My suspicion is that the issue is IO-related. On macOS, volumes go through osxfs, which offers some performance tuning. Mainly, if you can accept fewer consistency guarantees, you can set the volume mode to delegated for faster performance. See the docs for more details: https://docs.docker.com/docker-for-mac/osxfs-caching/. However, if your image contains a large number of small files, performance will suffer, especially if you also have lots of image layers.

You can also try the following command to debug any process issues within the embedded VM that docker uses:

docker run -it --rm --pid host busybox top

(To exit, use <ctrl>-c)


To track down if it's IO, you can also try the following:

$ docker run -it --rm --pid host alpine /bin/sh
$ apk add sysstat
$ pidstat -d 5 12

That runs an alpine container in the VM's PID namespace. It shows IO from every process, whether or not that process is inside a container. The stats are printed every 5 seconds for one minute (12 times), followed by a table of averages per process. You can then press <ctrl>-d to exit the shell; because of --rm, the alpine container is removed.


From the comments and edits, these stats may check out. A 4-core MBP has 8 threads, so full CPU utilization would be 800% if macOS reports CPU usage the same way as other Unix-based systems. Inside the VM, top shows a 1-minute load average above 1, i.e. more than one CPU's worth of work (the 5- and 15-minute averages are lower). That is roughly what you see for the hyperkit process on the host. The instantaneous usage from top is over 12%, not 3%, since you need to add the system and user percentages. And the IO numbers shown in pidstat align roughly with what you see written to the qcow2 image.
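The arithmetic in this edit can be sketched as two helpers. One adds the usr and sys percentages from busybox top. The other converts a macOS top percentage, where 100% means one logical CPU, into a share of the whole machine. The CPU: line in the example is a made-up sample in busybox's usual "N% usr N% sys ..." layout, not the output from the question.

```python
import re


def busybox_busy_pct(cpu_line):
    """Sum the usr and sys percentages from busybox top's "CPU:" line."""
    values = {name: float(num)
              for num, name in re.findall(r"(\d+(?:\.\d+)?)%\s+(\w+)", cpu_line)}
    return values.get("usr", 0.0) + values.get("sys", 0.0)


def share_of_host(top_pct, logical_cpus):
    """Convert a per-process % from macOS top (100% = one logical CPU)
    into a fraction of the whole machine."""
    return top_pct / (100.0 * logical_cpus)


# Hypothetical busybox line: 3% user plus 9% system gives 12% busy.
print(busybox_busy_pct("CPU:   3% usr   9% sys   0% nic  87% idle   0% io   0% irq   0% sirq"))
```

With 8 logical CPUs, hyperkit at 100% gives share_of_host(100, 8) == 0.125, one eighth of the machine. os.cpu_count() returns the logical CPU count.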


If the docker engine itself is thrashing (e.g. restarting containers, or running lots of healthchecks), then you can debug that by watching the output of:

docker events
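To see whether the engine is churning, you can tally the event stream. This sketch assumes you saved a stretch of docker events --format '{{json .}}' output to a file, one JSON object per line with Type and Action fields. The function name is hypothetical. Exec and health events carry extra detail after a colon in Action, so the helper groups on the part before the colon.

```python
import json
from collections import Counter


def summarize_events(lines):
    """Count JSON docker events by (Type, action prefix).

    "exec_start: /bin/sh -c ..." and "health_status: healthy" are cut at
    the first colon so repeated healthchecks fall into one bucket.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        action = event.get("Action", "").split(":", 1)[0]
        counts[(event.get("Type", "?"), action)] += 1
    return counts
```

summarize_events(open("/tmp/events.jsonl")).most_common(5) lists the noisiest event kinds. A steady stream of exec_start or restart events would point at healthchecks or a crash loop.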

I have the same problem. My CPU % went back down to normal after I removed all my volumes.

docker system prune --volumes

I also manually removed some named volumes:

docker volume rm NameOfVolumeHere

That doesn't solve the overall issue of not being able to use volumes with Docker for Mac. Right now I'm just being careful about the number of volumes I use and closing Docker Desktop when I'm not using it.

Changing the volumes to use a delegated configuration worked for me and resulted in a drastic drop in CPU usage. See the documentation: https://docs.docker.com/docker-for-mac/osxfs-caching/#delegated

How I set it in my docker-compose.yml:

version: "3"
services:
  my_service:
    image: python:3.6
    ports:
      - "80:10000"
    volumes:
      - ./code:/www/code:delegated

For me this worked, macOS 10.15.5, Docker Desktop 2.3.0
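In the short volume syntax, the consistency flag comes after the second colon. A small hypothetical helper can spot bind mounts that still use the default consistent mode. It only handles the string form, not the long mapping form of compose volumes. It also splits that field on commas, in case other options are present.

```python
MODES = ("consistent", "cached", "delegated")


def consistency_mode(volume_spec):
    """Return the macOS consistency flag of a short-syntax volume string.

    "./code:/www/code:delegated" -> "delegated"; without a flag the
    mount uses the default, "consistent".
    """
    parts = volume_spec.split(":")
    if len(parts) < 3:
        return "consistent"
    # Split on commas in case several options are given together.
    for option in parts[2].split(","):
        if option in MODES:
            return option
    return "consistent"


def fully_consistent_mounts(specs):
    """List the volume strings that end up in consistent mode."""
    return [s for s in specs if consistency_mode(s) == "consistent"]
```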

EDIT: after a few weeks, my CPU issues have come back, so the solutions below probably aren't worth it.

My CPU usage was always extremely high, and it wasn't I/O, as determined using docker stats.

I tried a bunch of things, but after doing the following, CPU usage suddenly dropped to reasonable levels and has stayed there for over a week now:

  • Ensure you have the right number of CPUs set in Preferences | Resources: not the number you have, but HALF that amount. Mine was set to more than half, and I feel this was the real problem.
  • Decrease the number of file shares if possible (Preferences | Resources), e.g. /private, /tmp, /var/folders.
  • Disable "Use gRPC FUSE for file sharing" (Preferences | Resources).

Disabling "Use gRPC FUSE for file sharing" might not be a good idea. I found the following feedback on another issue in the Docker community; see below:


So we'll look into that. However, osxfs will not be supported long term. We can't maintain two solutions.

Here is the Docker issue thread.

The solution I found was to increase the resources given to Docker. I increased the Memory from 2GB to 8GB, the Swap from 1GB to 2GB, and the disk image size to 160GB. Completely solved the problem for me, and it's an easy one for readers to try.

I had the same issue with Docker today on Big Sur. I tried pruning images and switching to Apple virtualization; nothing helped. However, disabling the Docker Desktop startup option in Preferences and never opening the desktop GUI seems to fix it for me. Docker now runs at only 10% CPU usage, even after starting a few containers. Once I open the desktop GUI, though, usage slowly rises again to 90%+ CPU. It keeps hogging the CPU even after I close the Docker Desktop process. Docker version 20.10.13, build a224086.

There is an open issue here: https://github.com/docker/for-mac/issues/6166

It seems there are a few bugs going on:

  1. For some people (me included), unchecking "Open Docker Dashboard at startup" and manually restarting Docker does the job. [Screenshot: Docker Desktop]

  2. For other people, increasing resources like CPU and memory works.