TCP vs UDP on video stream

I just got home from my exam in network programming, and one of the questions they asked was: "If you are going to stream video, would you use TCP or UDP? Explain for both stored video and live video streams." To this question they simply expected the short answer of TCP for stored video and UDP for live video, but I thought about it on my way home: is it necessarily better to use UDP for streaming live video? I mean, if you have the bandwidth for it, and say you are streaming a soccer match or a concert, do you really need to use UDP?

Let's assume that while you are streaming this concert or whatever using TCP, you start losing packets (something bad happened in some network between you and the sender), and for a whole minute you don't receive any packets. The video stream will pause, and after the minute has passed packets start to get through again (IP found a new route for you). What would then happen is that TCP retransmits the minute you lost and then continues sending you the live stream. As an assumption, the bandwidth is higher than the bitrate of the stream and the ping is not too high, so within a short amount of time the minute you lost comes to act as a buffer for the stream; that way, if packet loss happens again, you won't notice.

Now, I can think of some applications where this would not be a good idea, for instance video conferencing, where you need to always be at the end of the stream because delay during a video chat is horrible. But during a soccer match, or a concert, what does it matter if you are a single minute behind the stream? Plus, you are guaranteed to get all the data, and it would be better to have it saved without any errors for later viewing.

This brings me to my question: are there any drawbacks to using TCP for live streaming that I don't know about? Or should it really be that if you have the bandwidth for it, you should go for TCP, given that it is "nicer" to the network (flow control)?


Usually a video stream is somewhat fault tolerant. So if some packages get lost (due to some router along the way being overloaded, for example), then it will still be able to display the content, but with reduced quality.

If your live stream was using TCP/IP, then it would be forced to wait for those dropped packages before it could continue processing newer data.

That's doubly bad:

  • old data will be re-transmitted (that's probably for a frame that was already displayed and is therefore worthless) and
  • new data can't arrive until after the old data has been re-transmitted

If your goal is to display as up-to-date information as possible (and for a live-stream you usually want to be up-to-date, even if your frames look a bit worse), then TCP will work against you.

For a recorded stream the situation is slightly different: you'll probably be buffering a lot more (possibly several minutes!) and would rather have data re-transmitted than have artifacts due to lost packages. In this case TCP is a good match (this could still be implemented over UDP, of course, but TCP doesn't have as many drawbacks as in the live-stream case).
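To make the contrast concrete, here is a minimal Python sketch of a loss-tolerant UDP receiver; the 4-byte sequence-number framing and port 5004 are just assumptions for illustration, not part of any real protocol. A gap in the sequence numbers is simply noted and playback moves on, whereas a TCP receiver would be stalled until the retransmission arrived.

    import socket
    import struct

    # Assumed datagram layout: 4-byte big-endian sequence number + video payload.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 5004))

    expected_seq = None
    while True:
        datagram, _addr = sock.recvfrom(2048)
        seq = struct.unpack("!I", datagram[:4])[0]
        payload = datagram[4:]

        if expected_seq is not None and seq != expected_seq:
            # Packets were lost (or reordered): note the gap and move on.
            # A real player would conceal the error; it never stalls waiting
            # for a retransmission the way a TCP receiver must.
            print(f"gap: expected {expected_seq}, got {seq}")

        expected_seq = seq + 1
        # hand `payload` to the decoder here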

For video streaming, bandwidth is likely the constraint on the system. Using multicast you can greatly reduce the amount of upstream bandwidth used. With UDP you can easily multicast your packets to all connected terminals. You could also use a reliable multicast protocol such as Pragmatic General Multicast (PGM); I don't know much about it, and I don't think it is widely used.
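For illustration, here is a minimal Python sketch of UDP multicast; the group address 239.0.0.1 and port 5004 are arbitrary example values. One send reaches every receiver that has joined the group, which is what keeps the sender's upstream bandwidth flat no matter how many terminals are watching.

    import socket
    import struct

    GROUP, PORT = "239.0.0.1", 5004   # arbitrary example group and port

    # Receiver: join the multicast group, then read like any UDP socket.
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    rx.bind(("", PORT))
    membership = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)

    # Sender: a single send() is delivered to every joined receiver.
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)   # stay on the local network
    tx.sendto(b"video chunk", (GROUP, PORT))

    chunk, _addr = rx.recvfrom(2048)
    print(len(chunk), "bytes received via multicast")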

Drawbacks of using TCP for live video:

  1. As you mentioned, TCP buffers the unacknowledged segments for every client. In some cases this is undesirable, such as TCP streaming for very popular live events: your list of simultaneous clients (and their buffering requirements) is large in this case (a rough memory estimate appears after this list). Pre-recorded video-casts typically don't have as much of a problem with this because viewers tend to stagger their replay activity.

  2. TCP's delivery guarantees are a blocking function which isn't helpful in interactive conversations. Assume your network connection drops for 15 seconds. When we miss part of a conversation, we naturally ask the person to repeat (or the other party will proactively repeat if it seems like you missed something). UDP doesn't care if you missed part of a conversation for the last 15 seconds; it keeps working as if nothing happened. On the other hand, the app could be designed for TCP to replay the last 15 seconds (and the person on the other end may not want or know about that). Such a replay by TCP aggravates the problem, and makes it more difficult to stay in sync with other parties in the conversation. Comparing TCP and UDP’s behavior in the face of packet loss, one could say that it’s easier for UDP to stay in sync with the state of an interactive conversation.

  3. IP multicast significantly reduces video bandwidth requirements for large audiences; multicast requires UDP (and is incompatible with TCP). Note that multicast is generally restricted to private networks; multicast over the internet is not common. I would also point out that operating multicast networks is more complicated than operating typical unicast networks.
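As a rough illustration of point 1, a back-of-the-envelope estimate; the client count and per-socket buffer size are assumed example numbers, not measurements.

    # Per-connection kernel send buffers add up quickly when a popular
    # live event is streamed over unicast TCP.
    clients = 100_000              # assumed number of simultaneous viewers
    send_buffer = 64 * 1024        # assumed kernel send buffer per TCP connection, bytes

    total_gib = clients * send_buffer / 2**30
    print(f"~{total_gib:.1f} GiB of send-buffer memory on the server")   # ~6.1 GiB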

FYI, please don't use the word "packages" when describing networks. Networks send "packets".

but during a soccer-match, or a concert what does it matter if you are a single minute behind the stream?

To some soccer fans, quite a bit. It has been remarked that delays of even a few seconds, introduced into digital video streams by encoding (or whatever), can be very annoying when, during high-profile events such as World Cup matches, you can hear the cheers and groans from the guys next door (who are watching an undelayed analog program) before you get to see the game moves that caused them.

I think that to someone caring a lot about sports (and those are the biggest group of paying customers for digital TV, at least here in Germany), being a minute behind in a live video stream would be completely unacceptable (As in, they'd switch to your competitor where this doesn't happen).

It depends. How critical is the content you are streaming? If it is critical, use TCP. This may cause issues with bandwidth, video quality (you might have to use a lower quality to cope with latency), and latency itself. But if you need the content guaranteed to get there, use it.

Otherwise UDP should be fine if the stream is not critical and would be preferred because UDP tends to have less overhead.
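To put a rough number on "less overhead" (the payload size below is just an assumed example, and IP and link-layer headers are ignored):

    payload = 1200                     # assumed media payload per packet, in bytes
    udp_header, tcp_header = 8, 20     # minimum header sizes; TCP options add more

    print(f"UDP overhead: {udp_header / (payload + udp_header):.2%}")   # ~0.66%
    print(f"TCP overhead: {tcp_header / (payload + tcp_header):.2%}")   # ~1.64%
    # TCP also generates acknowledgement traffic in the reverse direction.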

I recommend you look at the new p2p live protocol, BitTorrent Live.

As for streaming, it's better to use UDP: first because it lowers the load on the servers, but mostly because you can send packets with multicast, which is simpler than sending them to each connected client.

One of the biggest problems with delivering live events on the Internet is 'scale', and TCP doesn't scale well. For example, when you are beaming a live football match (as opposed to an on-demand movie playback), the number of people watching can easily be 1000 times greater. In such a scenario, using TCP is a death sentence for the CDNs (content delivery networks).

There are a couple of main reasons why TCP doesn't scale well:

  1. One of the largest tradeoffs of TCP is the variability of the throughput achievable between the sender and the receiver. When streaming video over the Internet, the video packets must traverse multiple routers, and each of those routers is connected with links of different speeds. The TCP algorithm starts with a small window and grows it until packet loss is detected; packet loss is treated as a sign of congestion, and TCP responds by drastically reducing the window size. This in turn reduces the effective throughput immediately. Now imagine a network where the TCP transmission crosses 6-7 router hops to the client (a very normal scenario): if any of the intermediate routers loses a packet, TCP for that connection reduces the transmission rate. The traffic flow therefore follows a sawtooth shape, constantly ramping up and dropping back, which keeps the effective throughput much lower than best-effort UDP.

  2. As you may already know, TCP is an acknowledgement-based protocol. Say, for example, that the sender is 50 ms away (i.e. the one-way latency between the two points). That means a packet takes 50 ms to reach the receiver and the acknowledgement takes another 50 ms to come back, i.e. 100 ms per round trip; the maximum throughput of the connection is therefore bounded by the round-trip time in a way that UDP-based transmission is not (a rough calculation appears after this list).

  3. TCP doesn't support multicast, or the emerging multicast standard AMT (Automatic Multicast Tunneling). This means CDNs don't have the opportunity to reduce network traffic by replicating packets when many clients are watching the same content. That by itself is a big enough reason for CDNs (like Akamai or Level3) not to go with TCP for live streams.
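A rough calculation for point 2 above, using assumed example numbers (a classic 64 KiB window and the 100 ms round trip from that example):

    # Back-of-the-envelope: a single TCP connection cannot move data faster
    # than (window size) / (round-trip time), regardless of the link speed.
    window_bytes = 64 * 1024     # assumed 64 KiB window
    rtt_seconds = 0.100          # 50 ms each way, as in the example above

    ceiling_bps = window_bytes * 8 / rtt_seconds
    print(f"throughput ceiling: {ceiling_bps / 1e6:.1f} Mbit/s")   # ~5.2 Mbit/s

    # Every loss event additionally shrinks the congestion window, so the
    # achievable rate oscillates below this ceiling (the sawtooth from point 1).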

While reading the TCP/UDP debate I noticed a logical flaw. A TCP packet loss causing a one-minute delay that's converted into a one-minute buffer can't be equated with UDP dropping a full minute while experiencing the same loss. A fairer comparison is as follows.

TCP experiences a packet loss. The video is stopped while TCP resends packets in an attempt to deliver a mathematically perfect stream. The video is delayed for one minute and picks up where it left off once the missing packet reaches its destination. We all wait, but we know we won't miss a single pixel.

UDP experiences a packet loss. For a second during the video stream a corner of the screen gets a little blurry. No one notices and the show goes on without looking for the lost packets.

Anything that streams gains the most benefit from UDP. The packet loss that causes a one-minute delay with TCP would not cause a one-minute delay with UDP. Considering that most systems use multiple-resolution streams that simply go blocky when starved for packets, it makes even more sense to use UDP.

UDP FTW when streaming.

If the bandwidth is far higher than the bitrate, I would recommend TCP for unicast live video streaming.

Case 1: consecutive packets are lost for a duration of several seconds. The live video will stop on the client side whatever the transport layer is (TCP or UDP). When the network recovers:

  • if TCP is used, the client video player can choose to restart the stream at the first lost packet (timeshift) OR to drop all the late packets and restart the video stream with no timeshift;
  • if UDP is used, there is no choice on the client side; the video restarts live without any timeshift.

=> TCP equal or better.

Case 2: some packets are randomly and frequently lost on the network.

  • if TCP is used, these packets are immediately retransmitted and, with a correct jitter buffer, there is no impact on video stream quality or latency;
  • if UDP is used, the video quality will be poor.

=> TCP much better.
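For illustration, a minimal playout (jitter) buffer sketch in Python; the 20 ms packet interval and 200 ms buffer depth are assumed example values. As long as a retransmitted packet arrives before its playout deadline, the viewer never notices the loss.

    import time

    PACKET_INTERVAL = 0.020   # assumed: each packet carries 20 ms of media
    PLAYOUT_DELAY = 0.200     # assumed: the player delays playback by 200 ms

    buffer = {}               # seq -> payload, held until its playout deadline
    start = None              # arrival time of the first packet

    def on_packet(seq, payload):
        """Store an incoming packet (original or retransmitted)."""
        global start
        if start is None:
            start = time.monotonic()
        buffer[seq] = payload

    def pop_if_due(seq):
        """Return the payload for `seq` once its playout time has arrived;
        None means it is either not due yet or never arrived (the player
        then conceals the loss instead of stalling)."""
        deadline = start + PLAYOUT_DELAY + seq * PACKET_INTERVAL
        if time.monotonic() < deadline:
            return None
        return buffer.pop(seq, None)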

Besides all the other reasons, UDP can use multicast. Supporting thousands of TCP connections that all carry the same data wastes bandwidth. However, there is another important reason for using TCP.

TCP can much more easily pass through firewalls and NATs. Depending on your NAT and operator, you may not even be able to receive a UDP stream due to problems with UDP hole punching.

All the 'use UDP' answers assume an open network and a 'stuff it in as much as you can' approach. Good for old-style closed-garden dedicated audio/video networks, which are a vanishing breed.

In the real world, your transmission will go through firewalls (which will drop multicast and sometimes UDP), and the network is shared with other, more important ($$$) apps, so you want bandwidth abusers to be reined in by TCP's window scaling.

This is the thing: it is more a matter of content than a matter of time. The TCP protocol requires that a packet that was not delivered be checked, verified and redelivered. UDP has no such requirement. So if you send a file that consists of millions of packets using UDP, like a video, and some of the packets go missing in delivery, they will most likely go unmissed.

There are some use cases suitable to UDP transport and others suitable to TCP transport.

The use case also dictates the encoding settings for the video: when broadcasting a soccer match the focus is on quality, while for a video conference the focus is on latency.

When using multicast to deliver video to your customers then UDP is used.

The requirement for multicast is expensive networking hardware between the broadcasting server and the customer. In practice this means that if your company owns the network infrastructure, you can use UDP and multicast for live video streaming. Even then, quality-of-service is also implemented to mark the video packets and prioritize them so that no packet loss happens.
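A minimal sketch of that packet marking, assuming a Linux host; DSCP 46 ("Expedited Forwarding") and the destination address are just example values that the network gear would be configured to prioritize.

    import socket

    DSCP_EF = 46   # "Expedited Forwarding", commonly used for real-time media

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The DSCP value occupies the upper six bits of the IP TOS byte.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF << 2)
    sock.sendto(b"video chunk", ("198.51.100.10", 5004))   # placeholder address/port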

Multicast simplifies the broadcasting software because the network hardware handles distributing the packets to customers. Customers subscribe to multicast channels and the network reconfigures itself to route the packets to each new subscriber. By default all channels are available to all customers and can be optimally routed.

This workflow makes the authorization process difficult, because the network hardware does not differentiate subscribed users from other users. The solution is to encrypt the video content and enable decryption in the player software only while the subscription is valid.

A unicast (TCP) workflow allows the server to check the client's credentials and allow only valid subscriptions, or even only a certain number of simultaneous connections.

Multicast is not enabled over the internet.

For delivering video over the internet, TCP must be used. When UDP is used, developers end up re-implementing packet re-transmission anyway, e.g. the BitTorrent Live p2p protocol.

"If you use TCP, the OS must buffer the unacknowledged segments for every client. This is undesirable, particularly in the case of live events".

This buffer must exist in some form. The same is true for the jitter buffer on the player side. It is called the "socket buffer", and the server software can tell when this buffer is full and discard the appropriate video frames for live streams. It is better to use the unicast/TCP method because the server software can implement proper frame-dropping logic. Random missing packets in the UDP case just create a bad user experience, like in this video: http://tinypic.com/r/2qn89xz/9
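Here is a minimal sketch of that frame-dropping logic; the function name and the use of select() are my own illustration, not taken from any particular server. Before writing a frame, the server checks whether the client's socket is writable; if the send buffer is full, it drops the frame instead of stalling the live stream.

    import select
    import socket

    def send_or_drop(client: socket.socket, frame: bytes) -> bool:
        """Send one encoded frame to an already-accepted TCP client, or drop
        it when the kernel send buffer is full (i.e. the client is lagging)."""
        _r, writable, _x = select.select([], [client], [], 0)   # poll writability, don't wait
        if not writable:
            # Send buffer is full: discard this frame (ideally a non-reference
            # frame) rather than stalling the whole live stream.
            return False
        # There is room in the buffer. A production server would still guard
        # against partial writes and bound how long this call may block.
        client.sendall(frame)
        return True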

"IP multicast significantly reduces video bandwidth requirements for large audiences"

This is true for private networks; multicast is not enabled over the internet.

"Note that if TCP loses too many packets, the connection dies; thus, UDP gives you much more control for this application since UDP doesn't care about network transport layer drops."

UDP also doesn't care about dropping entire frames or groups of frames, so it does not give any more control over the user experience.

"Usually a video stream is somewhat fault tolerant"

Encoded video is not fault tolerant. When it is transmitted over an unreliable transport, forward error correction is added to the video container. A good example is the MPEG-TS container used in satellite video broadcasting, which carries several audio, video, EPG, etc. streams. This is necessary because a satellite link is not duplex communication, meaning the receiver can't request the re-transmission of lost packets.

When duplex communication is available, it is always better to re-transmit data only to the clients that experienced packet loss than to include the overhead of forward error correction in the stream sent to all clients.
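As a toy illustration of that trade-off, here is the simplest possible forward-error-correction scheme: one XOR parity packet per group of equal-sized packets, which lets a receiver rebuild any single missing packet without a retransmission. Real broadcast systems use far stronger codes (e.g. Reed-Solomon in DVB/MPEG-TS).

    from functools import reduce

    def xor_packets(packets):
        """XOR a list of equal-sized byte strings together."""
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), packets)

    def make_parity(group):
        """Build one parity packet for a group of data packets."""
        return xor_packets(group)

    def recover_missing(received, parity):
        """Rebuild the single packet missing from `received` using the parity packet."""
        return xor_packets(received + [parity])

    group = [b"pkt0pkt0", b"pkt1pkt1", b"pkt2pkt2"]      # equal-sized packets
    parity = make_parity(group)
    assert recover_missing([group[0], group[2]], parity) == group[1]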

In any case lost packets are unacceptable. Dropped frames are ok in exceptional cases when bandwidth is hindered.

The result of missing packets is visual artifacts.

Some decoders can break on streams missing packets in critical places.