Understanding celery task prefetching

I just found out about the configuration option CELERYD_PREFETCH_MULTIPLIER (docs). The default is 4, but (I believe) I want prefetching off, or as low as possible. I have set it to 1 for now, which is close enough to what I'm looking for, but there are still some things I don't understand:

  1. Why is this prefetching a good idea at all? I don't really see a reason for it, unless there is a lot of latency between the message queue and the workers (in my case, they are currently running on the same host and at worst might eventually run on different hosts in the same data center). The documentation only mentions the disadvantages, but fails to explain what the advantages are.

  2. Many people seem to set this to 0, expecting to be able to turn off prefetching that way (a reasonable assumption in my opinion). However, 0 means unlimited prefetching. Why would anyone ever want unlimited prefetching? Doesn't that eliminate the concurrency/asynchronicity you introduced a task queue for in the first place?

  3. Why can prefetching not be turned off? It might not be a good idea for performance to turn it off in most cases, but is there a technical reason why it is not possible? Or is it just not implemented?

  4. Sometimes, this option is connected to CELERY_ACKS_LATE. For example, Roger Hu writes "[...] often what [users] really want is to have a worker only reserve as many tasks as there are child processes. But this is not possible without enabling late acknowledgements [...]". I don't understand how these two options are connected and why one is not possible without the other. Another mention of the connection can be found here. Can someone explain why the two options are connected?

  1. Prefetching can improve performance. Workers don't need to wait for the next message from the broker before processing. Communicating with the broker once and fetching a lot of messages in one go gives a performance gain, because getting a message from a broker (even a local one) is expensive compared to local memory access. Workers are also allowed to acknowledge messages in batches.

  2. Prefetching set to zero means "no specific limit" rather than unlimited

  3. Setting prefetching to 1 is documented to be equivalent to turning it off, but this may not always be the case (see https://stackoverflow.com/a/33357180/71522)

  4. Prefetching allows acknowledging messages in batches. CELERY_ACKS_LATE=True prevents acknowledging messages when they reach a worker (a config sketch combining the two settings follows this list).
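
For reference, here is a minimal sketch of how the two settings above are typically combined in an old-style (Celery 3.x) configuration module; the module name celeryconfig.py and the broker URL are placeholders, not anything prescribed by Celery:

    # celeryconfig.py -- hypothetical config module, loaded e.g. via
    # app.config_from_object('celeryconfig')
    BROKER_URL = 'redis://localhost:6379/0'  # placeholder broker URL

    # Reserve at most one extra task per pool process (multiplier * concurrency in total).
    CELERYD_PREFETCH_MULTIPLIER = 1

    # Acknowledge a task only after it finishes, not when the worker receives it.
    CELERY_ACKS_LATE = True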

Just a warning: as of my testing with the redis broker + Celery 3.1.15, all of the advice I've read pertaining to CELERYD_PREFETCH_MULTIPLIER = 1 disabling prefetching is demonstrably false.

To demonstrate this:

  1. Set CELERYD_PREFETCH_MULTIPLIER = 1
  2. Queue up 5 tasks that will each take a few seconds (e.g., time.sleep(5))
  3. Start watching the length of the task queue in Redis: watch redis-cli -c llen default

  4. Start celery worker -c 1

  5. Notice that the queue length in Redis will immediately drop from 5 to 3

CELERYD_PREFETCH_MULTIPLIER = 1 does not prevent prefetching, it simply limits the prefetching to 1 task per queue.
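
For concreteness, the kind of task module used in the steps above might look like the following sketch; the module name tasks.py, the task name, and the broker URL are just illustrative:

    # tasks.py -- illustrative module for reproducing the test above
    import time

    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379/0')  # placeholder broker URL
    app.conf.CELERYD_PREFETCH_MULTIPLIER = 1

    @app.task
    def sleepy():
        # Sleep long enough that the queue length can be watched draining in Redis.
        time.sleep(5)

Queue five of them with something like sleepy.delay() in a loop, then start the worker with celery -A tasks worker -c 1 and watch the queue length.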

-Ofair, despite what the documentation says, also does not prevent prefetching.

Short of modifying the source code, I haven't found any method for entirely disabling prefetching.

I cannot comment on David Wolever's answers, since my stackcred isn't high enough. So, I've framed my comment as an answer, since I'd like to share my experience with Celery 3.1.18 and a MongoDB broker. I managed to stop prefetching with the following (a config sketch follows the list):

  1. add CELERYD_PREFETCH_MULTIPLIER = 1 to the celery config
  2. add CELERY_ACKS_LATE = True to the celery config
  3. Start celery worker with options: --concurrency=1 -Ofair
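
Put together, that recipe looks roughly like the following sketch; the app name and the MongoDB broker URL are placeholders:

    # Sketch of the same settings applied directly on the app object
    from celery import Celery

    app = Celery('yourapp', broker='mongodb://localhost:27017/celery')  # placeholder URL
    app.conf.update(
        CELERYD_PREFETCH_MULTIPLIER=1,  # step 1
        CELERY_ACKS_LATE=True,          # step 2
    )
    # Step 3: start the worker with --concurrency=1 -Ofair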

Leaving CELERY_ACKS_LATE at its default, the worker still prefetches. Just like the OP, I don't fully grasp the link between prefetching and late acks. I understand what David says, "CELERY_ACKS_LATE=True prevents acknowledging messages when they reach a worker", but I fail to understand why late acks would be incompatible with prefetching. In theory, prefetching would still allow acking late, right, even if it is not coded that way in Celery?

Old question, but still adding my answer in case it helps someone. My understanding from some initial testing was the same as that in David Wolever's answer. I just tested this more in Celery 3.1.19, and -Ofair does work. It is just not meant to disable prefetching at the worker node level; that will continue to happen. Using -Ofair has a different effect, which is at the pool worker level. In summary, to disable prefetching completely, do this (sketched after the list):

  1. Set CELERYD_PREFETCH_MULTIPLIER = 1
  2. Set CELERY_ACKS_LATE = True at a global level or task level
  3. Use -Ofair while starting the workers
  4. If you set concurrency to 1, then step 3 is not needed. If you want a higher concurrency, then step 3 is essential to avoid tasks getting backed up in a node that could be running long-running tasks.
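
In old-style settings, steps 1 and 2 might look like the sketch below; step 3 is just a worker command-line option (the app name is a placeholder):

    # celeryconfig.py -- sketch of steps 1 and 2
    CELERYD_PREFETCH_MULTIPLIER = 1
    CELERY_ACKS_LATE = True  # global; can also be enabled per task (see further below)

    # Step 3: start the worker with the fair scheduling strategy, e.g.
    #   celery -A yourapp worker --concurrency=4 -Ofair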

Adding some more details:

I found that the worker node will always prefetch by default. You can only control how many tasks it prefetches by using CELERYD_PREFETCH_MULTIPLIER. If set to 1, it will only prefetch as many tasks as the number of pool workers (concurrency) in the node. So if you had concurrency = n, the max tasks prefetched by the node will be n.
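
As a concrete illustration of that arithmetic (plain Python, not a Celery API):

    # Effective prefetch window of a worker node, as described above
    CELERYD_PREFETCH_MULTIPLIER = 1
    concurrency = 4  # e.g. celery worker -c 4
    prefetch_window = CELERYD_PREFETCH_MULTIPLIER * concurrency
    print(prefetch_window)  # -> 4 tasks reserved by the node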

Without the -Ofair option, what happened for me was that if one of the pool worker processes was executing a long-running task, the other workers in the node would also stop processing the tasks already prefetched by the node. With -Ofair, that changed: even though one of the workers in the node was executing a long-running task, the others would not stop and would continue to process the tasks prefetched by the node. So I see two levels of prefetching: one at the worker node level, the other at the individual worker level. Using -Ofair, for me, seemed to disable it at the worker level.

How is ACKS_LATE related? ACKS_LATE = True means that the task will be acknowledged only when the task succeeds. If it is not set, I suppose acknowledgement happens when the task is received by a worker. In the case of prefetching, the task is first received by the worker (confirmed from the logs) but is executed later. I just realized that prefetched messages show up under "unacknowledged messages" in RabbitMQ, so I'm not sure if setting it to True is absolutely needed. We had our tasks set that way (late ack) anyway, for other reasons.
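
For what it's worth, late acknowledgement can also be enabled per task instead of globally; a sketch, with an illustrative task name and a placeholder broker URL:

    # Illustrative task with late acknowledgement enabled at the task level
    from celery import Celery

    app = Celery('tasks', broker='amqp://localhost//')  # placeholder broker URL

    @app.task(acks_late=True)
    def process_report(report_id):
        # The message is acked only after this function returns,
        # not when the worker first receives (prefetches) it.
        return report_id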

I experienced something a little bit different with SQS as broker.

The setup was:

CELERYD_PREFETCH_MULTIPLIER = 1
ACKS_ON_FAILURE_OR_TIMEOUT=False
CELERY_ACKS_LATE = True
CONCURRENCY=1

After a task failed (an exception was raised), the worker became unavailable, since the message was not acked in either the local or the remote queue.

The solution that made the workers continue consuming work was setting

CELERYD_PREFETCH_MULTIPLIER = 0

I can only speculate that acks_late was not taken into consideration when writing the SQS transport.
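
For anyone hitting the same behaviour, the combination that kept my SQS workers consuming looks roughly like this in old-style settings (a sketch; the sqs:// URL is a placeholder and credentials are assumed to come from the environment):

    # Sketch of the settings that kept the SQS workers consuming after a failure
    BROKER_URL = 'sqs://'            # placeholder; AWS credentials taken from the environment
    CELERYD_PREFETCH_MULTIPLIER = 0  # "no specific limit" rather than 1
    CELERY_ACKS_LATE = True
    # acks-on-failure-or-timeout stays disabled and concurrency stays 1, as in the setup above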