Unicorn exit timeout on Heroku after trapping TERM and sending QUIT

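I'm getting R12 (Exit timeout) errors on a Heroku app running Unicorn and Sidekiq. The errors occur 1-2 times a day and whenever I deploy. I understand that I need to translate the shutdown signal from Heroku so that Unicorn responds to it correctly, but I thought I had already done that in the Unicorn config below: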

worker_processes 3
timeout 30
preload_app true

before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts "Unicorn master intercepting TERM and sending myself QUIT instead. My PID is #{Process.pid}"
    Process.kill 'QUIT', Process.pid
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
    Rails.logger.info('Disconnected from ActiveRecord')
  end
end

after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts "Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is #{Process.pid}"
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
    Rails.logger.info('Connected to ActiveRecord')
  end

  Sidekiq.configure_client do |config|
    config.redis = { :size => 1 }
  end
end

The logs surrounding the error look like this:

Stopping all processes with SIGTERM
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 7
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 11
Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is 15
Unicorn master intercepting TERM and sending myself QUIT instead. My PID is 2
Started GET "/manage"
reaped #<Process::Status: pid 11 exit 0> worker=1
reaped #<Process::Status: pid 7 exit 0> worker=0
reaped #<Process::Status: pid 15 exit 0> worker=2
master complete
Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
Stopping remaining processes with SIGKILL
Process exited with status 137

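It looks like all of the child processes were successfully reaped before the timeout. Is it possible that the master is still alive? Also, should the router still be sending web requests to the dyno during shutdown, as the logs show?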

FWIW, I'm using Heroku's zero-downtime deployment plugin (https://devcenter.heroku.com/articles/labs-preboot/).
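
For anyone who wants to reproduce the signal translation outside Unicorn, here is a minimal sketch in plain Ruby. It assumes a POSIX system where `fork` is available. The helper name `run_term_to_quit_demo` and the pipe messages are made up for illustration and are not part of Unicorn or Heroku. The child traps TERM and re-sends QUIT to itself, the same way the `before_fork` block does, while a separate QUIT handler stands in for Unicorn's graceful shutdown.

```ruby
# Hypothetical demo helper: not part of Unicorn or Heroku.
# Forks a child that translates TERM into QUIT, then reports what happened.
def run_term_to_quit_demo
  reader, writer = IO.pipe

  pid = fork do
    reader.close

    # Stand-in for a graceful shutdown handler.
    Signal.trap('QUIT') do
      writer.syswrite("QUIT handled\n") # syswrite avoids buffered IO in trap context
      exit!(0)
    end

    # Same idea as the before_fork block: turn TERM into QUIT for this process.
    Signal.trap('TERM') do
      Process.kill('QUIT', Process.pid)
    end

    writer.syswrite("ready\n")
    loop { sleep 1 } # idle until a signal arrives
  end

  writer.close
  reader.gets                 # wait until the child has installed its traps
  Process.kill('TERM', pid)   # what the platform sends on shutdown
  message = reader.gets       # written by the QUIT handler
  _, status = Process.wait2(pid)
  reader.close

  [message && message.chomp, status.exitstatus]
end

message, exit_code = run_term_to_quit_demo
puts "child said: #{message.inspect}, exit code: #{exit_code}"
```

This only shows that the TERM handler can hand off to the QUIT handler inside a single process. It says nothing about how long a real Unicorn master takes to finish after QUIT.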


I think your custom signal handling is what's causing the timeouts here.

EDIT: I'm getting downvoted for disagreeing with Heroku's documentation and I'd like to address this.

Configuring your Unicorn application to catch and swallow the TERM signal is the most likely cause of your application hanging and not shutting down correctly.

Heroku seems to argue that catching and transforming a TERM signal into a QUIT signal is the right behavior to turn a hard shutdown into a graceful shutdown.

However, doing this seems to introduce the risk of no shutdown at all in some cases - the root of this bug. Users experiencing hanging dynos running Unicorn should consider the evidence and make their own decision based on first principles, not just documentation.
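
To make the "swallowed TERM" failure mode concrete, here is a rough sketch in plain Ruby (POSIX only, since it uses `fork`). The helper name `stop_child_swallowing_term` is hypothetical, and the 0.5 second grace period is just a stand-in for Heroku's 10 second window. This is not how Heroku's process manager is implemented, only an illustration of the pattern: a process that ignores TERM and never receives QUIT has to be stopped with KILL.

```ruby
# Hypothetical demo helper: not part of Unicorn or Heroku.
# Starts a child that swallows TERM, waits a short grace period,
# then escalates to KILL the way a supervisor would.
def stop_child_swallowing_term(grace_seconds: 0.5)
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    Signal.trap('TERM') { } # swallow TERM and do nothing, like the after_fork trap
    writer.syswrite("ready\n")
    loop { sleep 1 }        # keep running; nobody ever sends QUIT in this demo
  end

  writer.close
  reader.gets               # wait until the trap is installed
  reader.close
  Process.kill('TERM', pid)

  # Poll for exit until the grace period runs out.
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace_seconds
  status = nil
  while status.nil? && Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    result = Process.wait2(pid, Process::WNOHANG)
    status = result[1] if result
    sleep 0.05 unless status
  end

  # Still alive after the grace period: escalate to KILL.
  unless status
    Process.kill('KILL', pid)
    _, status = Process.wait2(pid)
  end

  status
end

status = stop_child_swallowing_term
puts "signaled: #{status.signaled?}, signal: #{status.termsig.inspect}"
```

A shell reports a process killed by signal N as exit status 128 + N. KILL is signal 9, which matches the `status 137` line in the question's logs.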