Http keep-alive in the Modern age

因此,根据 haxy 的作者所说,谁对 http://http://http://http://http://http://http://http://http://http://http://http://http://http://http://http://http://http://http://http

保持活跃是为了减少 CPU 的使用量而发明的 当 CPU 为100时在服务器上的使用情况 但是没有说的是 持久连接消耗一个 大量的内存而不能使用 除了客户 打开它们。在2009年的今天,CPU 是 非常便宜,内存仍然有限 到几个千兆字节的架构 或价格。如果一个网站需要 保持活力,有一个真正的问题。 高加载站点通常禁用 保持活力,以支持最大限度 同时客户的数量 没有生存的真正负面影响 是一个稍微增加的潜伏期 获取对象 上的并发连接数 非保留部位,以补偿 这个。

(来自 http://haproxy.1wt.eu/)

这和其他人的经历一致吗?如果没有维持生命,结果现在几乎看不出来了吗?(可能值得注意的是,对于响应性非常好的应用程序,使用 websockets 等——无论保持活动状态如何,连接都保持“打开”状态)。 对于远离服务器的人来说,这种效果更好吗? 或者在加载页面时,有许多工件需要从同一个主机加载?(我认为像 CSS、图像和 JS 这样的东西越来越多地来自缓存友好的 CDN)。

有什么想法吗?

(不确定这是不是 serverfault.com 的东西,但我不会交叉发布,直到有人告诉我把它移到那里)。

27318 次浏览

my understanding was that it had little to do with CPU, but the latency in opening of repeated sockets to the other side of the world. even if you have infinite bandwidth, connect latency will slow down the whole process. amplified if your page has dozens of objects. even a persistent connection has a request/response latency but its reduced when you have 2 sockets as on average, one should be streaming data while the other could be blocking. Also, a router is never going to assume a socket connects before letting you write to it. It needs the full round trip handshake. again, i dont claim to be an expert, but this is how i always saw it. what would really be cool is a fully ASYNC protocol (no, not a fully sick protocol).

Hey since I'm the author of this citation, I'll respond :-)

There are two big issues on large sites : concurrent connections and latency. Concurrent connection are caused by slow clients which take ages to download contents, and by idle connection states. Those idle connection states are caused by connection reuse to fetch multiple objects, known as keep-alive, which is further increased by latency. When the client is very close to the server, it can make an intensive use of the connection and ensure it is almost never idle. However when the sequence ends, nobody cares to quickly close the channel and the connection remains open and unused for a long time. That's the reason why many people suggest using a very low keep-alive timeout. On some servers like Apache, the lowest timeout you can set is one second, and it is often far too much to sustain high loads : if you have 20000 clients in front of you and they fetch on average one object every second, you'll have those 20000 connections permanently established. 20000 concurrent connections on a general purpose server like Apache is huge, will require between 32 and 64 GB of RAM depending on what modules are loaded, and you can probably not hope to go much higher even by adding RAM. In practice, for 20000 clients you may even see 40000 to 60000 concurrent connections on the server because browsers will try to set up 2 to 3 connections if they have many objects to fetch.

If you close the connection after each object, the number of concurrent connections will dramatically drop. Indeed, it will drop by a factor corresponding to the average time to download an object by the time between objects. If you need 50 ms to download an object (a miniature photo, a button, etc...), and you download on average 1 object per second as above, then you'll only have 0.05 connection per client, which is only 1000 concurrent connections for 20000 clients.

Now the time to establish new connections is going to count. Far remote clients will experience an unpleasant latency. In the past, browsers used to use large amounts of concurrent connections when keep-alive was disabled. I remember figures of 4 on MSIE and 8 on Netscape. This would really have divided the average per-object latency by that much. Now that keep-alive is present everywhere, we're not seeing that high numbers anymore, because doing so further increases the load on remote servers, and browsers take care of protecting the Internet's infrastructure.

This means that with todays browsers, it's harder to get the non-keep-alive services as much responsive as the keep-alive ones. Also, some browsers (eg: Opera) use heuristics to try to use pipelinining. Pipelining is an efficient way of using keep-alive, because it almost eliminates latency by sending multiple requests without waiting for a response. I have tried it on a page with 100 small photos, and the first access is about twice as fast as without keep-alive, but the next access is about 8 times as fast, because the responses are so small that only latency counts (only "304" responses).

I'd say that ideally we should have some tunables in the browsers to make them keep the connections alive between fetched objects, and immediately drop it when the page is complete. But we're not seeing that unfortunately.

For this reason, some sites which need to install general purpose servers such as Apache on the front side and which have to support large amounts of clients generally have to disable keep-alive. And to force browsers to increase the number of connections, they use multiple domain names so that downloads can be parallelized. It's particularly problematic on sites making intensive use of SSL because the connection setup is even higher as there is one additional round trip.

What is more commonly observed nowadays is that such sites prefer to install light frontends such as haproxy or nginx, which have no problem handling tens to hundreds of thousands of concurrent connections, they enable keep-alive on the client side, and disable it on the Apache side. On this side, the cost of establishing a connection is almost null in terms of CPU, and not noticeable at all in terms of time. That way this provides the best of both worlds : low latency due to keep-alive with very low timeouts on the client side, and low number of connections on the server side. Everyone is happy :-)

Some commercial products further improve this by reusing connections between the front load balancer and the server and multiplexing all client connections over them. When the servers are close to the LB, the gain is not much higher than previous solution, but it will often require adaptations on the application to ensure there is no risk of session crossing between users due to the unexpected sharing of a connection between multiple users. In theory this should never happen. Reality is much different :-)

Very long keep-alives can be useful if you're using an "origin pull" CDN such as CloudFront or CloudFlare. In fact, this can work out to be faster than no CDN, even if you're serving completely dynamic content.

If you have long keep alives such that each PoP basically has a permanent connection to your server, then the first time users visit your site, they can do a fast TCP handshake with their local PoP instead of a slow handshake with you. (Light itself takes around 100ms to go half-way around the world via fiber, and establishing a TCP connection requires three packets to be passed back and forth. SSL requires three round-trips.)

In the years since this was written (and posted here on stackoverflow) we now have servers such as nginx which are rising in popularity.

nginx for example can hold open 10,000 keep-alive connections in a single process with only 2.5 MB (megabytes) of RAM. In fact it's easy to hold open multiple thousands of connections with very little RAM, and the only limits you'll hit will be other limits such as the number of open file handles or TCP connections.

Keep-alive was a problem not because of any problem with the keep-alive spec itself but because of Apache's process-based scaling model and of keep-alives hacked into a server whose architecture wasn't designed to accommodate it.

Especially problematic is Apache Prefork + mod_php + keep-alives. This is a model where every single connection will continue to occupy all the RAM that a PHP process occupies, even if it's completely idle and only remains open as a keep-alive. This is not scalable. But servers don't have to be designed this way - there's no particular reason a server needs to keep every keep-alive connection in a separate process (especially not when every such process has a full PHP interpreter). PHP-FPM and an event-based server processing model such as that in nginx solve the problem elegantly.

Update 2015:

SPDY and HTTP/2 replace HTTP's keep-alive functionality with something even better: the ability not only to keep alive a connection and make multiple requests and responses over it, but for them to be multiplexed, so the responses can be sent in any order, and in parallel, rather than only in the order they were requested. This prevents slow responses blocking faster ones and removes the temptation for browsers to hold open multiple parallel connections to a single server. These technologies further highlight the inadequacies of the mod_php approach and the benefits of something like an event-based (or at the very least, multi-threaded) web server coupled separately with something like PHP-FPM.