What is the mask in a WebSocket frame?

I am working on a WebSocket implementation and do not understand the purpose of the mask in a frame.

Could somebody explain what it does and why it is recommended?

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
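
For readers implementing the layout in the diagram, here is a minimal Python sketch of decoding the header fields. The function name `parse_frame_header` and the returned dictionary keys are my own choices for illustration, not part of any library, and the sketch does no validation beyond length checks.

```python
import struct


def parse_frame_header(data: bytes) -> dict:
    """Decode the header of one WebSocket frame as laid out in the diagram."""
    if len(data) < 2:
        raise ValueError("a frame header is at least 2 bytes long")

    first, second = data[0], data[1]
    fin = bool(first & 0x80)       # FIN bit
    opcode = first & 0x0F          # 4-bit opcode
    masked = bool(second & 0x80)   # MASK bit
    length = second & 0x7F         # 7-bit payload length
    offset = 2

    if length == 126:
        # The next 2 bytes hold the real length (unsigned, network byte order).
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:
        # The next 8 bytes hold the real length.
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8

    key = None
    if masked:
        key = data[offset:offset + 4]
        if len(key) < 4:
            raise ValueError("truncated masking key")
        offset += 4

    return {
        "fin": fin,
        "opcode": opcode,
        "masked": masked,
        "payload_len": length,
        "masking_key": key,
        "header_len": offset,  # the payload starts at this index
    }


# Masked single-frame text message "Hello" (example bytes from RFC 6455, Section 5.7).
header = parse_frame_header(bytes.fromhex("818537fa213d7f9f4d5158"))
print(header)
```

The payload itself starts at `header_len` and still has to be unmasked with `masking_key` when the MASK bit is set.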

WebSockets are defined in RFC 6455, which states in Section 5.3:

The unpredictability of the masking key is essential to prevent authors of malicious applications from selecting the bytes that appear on the wire.

In a blog entry about Websockets I found the following explanation:

masking-key (32 bits): if the mask bit is set (and trust me, it is if you write for the server side) you can read for unsigned bytes here which are used to xor the payload with. It's used to ensure that shitty proxies cannot be abused by attackers from the client side.

But the clearest answer I found was in a mailing list archive, where John Tamplin states:

Basically, WebSockets is unique in that you need to protect the network infrastructure, even if you have hostile code running in the client, full hostile control of the server, and the only piece you can trust is the client browser. By having the browser generate a random mask for each frame, the hostile client code cannot choose the byte patterns that appear on the wire and use that to attack vulnerable network infrastructure.

As kmkaplan stated, the attack vector is described in Section 10.3 of the RFC.
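This is a measure to prevent proxy cache poisoning attacks [1]. It introduces randomness: the client XORs each payload byte with the corresponding byte of a random 4-byte masking key, cycling through the key (payload byte i uses key byte i mod 4).

By the way: it isn't just recommended. It is obligatory for every frame a client sends to a server; a server must not mask the frames it sends.

[1] See Huang, Lin-Shung, et al. "Talking to yourself for fun and profit." Proceedings of W2SP (2011).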
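
The XOR step can be sketched in a few lines of Python. The helper name `apply_mask` is mine; the key and the expected bytes are taken from the masked "Hello" example in RFC 6455, Section 5.7. Because XOR is its own inverse, the same function both masks and unmasks.

```python
def apply_mask(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the 4-byte key, cycling through the key."""
    if len(key) != 4:
        raise ValueError("the masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))


key = bytes.fromhex("37fa213d")
masked = apply_mask(b"Hello", key)
print(masked.hex())             # 7f9f4d5158
print(apply_mask(masked, key))  # b'Hello'
```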

From this article:

Masking of WebSocket traffic from client to server is required because of the unlikely chance that malicious code could cause some broken proxies to do the wrong thing and use this as an attack of some kind. Nobody has proved that this could actually happen, but since the fact that it could happen was reason enough for browser vendors to get twitchy, masking was added to remove the possibility of it being used as an attack.

So, assuming attackers were able to compromise both the JavaScript code executed in a browser and the backend server, masking is designed to prevent the sequence of bytes sent between these two endpoints from being crafted in a way that could disrupt any broken proxies between them (broken here means proxies that might attempt to interpret a WebSocket stream as HTTP when they shouldn't).

The browser (and not the JavaScript code in the browser) has the final say on the randomly generated mask used to send the message, which is why the attackers cannot know in advance what stream of bytes the proxy will see.
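
To make the client side concrete, here is a rough Python sketch of how a client could build a masked text frame with a fresh key per frame. The function `build_client_text_frame` is hypothetical, and its `key` parameter exists only so the output can be checked against the RFC 6455 example; in a browser, script code never gets to pick the key.

```python
import secrets
import struct
from typing import Optional


def build_client_text_frame(text: str, key: Optional[bytes] = None) -> bytes:
    """Build a single-frame, masked text message as a client would send it."""
    payload = text.encode("utf-8")
    if key is None:
        # A fresh, unpredictable 4-byte key for every frame.
        key = secrets.token_bytes(4)

    header = bytearray([0x81])  # FIN=1, opcode=0x1 (text)
    n = len(payload)
    if n < 126:
        header.append(0x80 | n)            # MASK=1, 7-bit length
    elif n < 65536:
        header.append(0x80 | 126)          # MASK=1, 16-bit length follows
        header += struct.pack("!H", n)
    else:
        header.append(0x80 | 127)          # MASK=1, 64-bit length follows
        header += struct.pack("!Q", n)

    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + key + masked


# With the key from the RFC example, the frame matches the RFC bytes.
frame = build_client_text_frame("Hello", bytes.fromhex("37fa213d"))
print(frame.hex())  # 818537fa213d7f9f4d5158
```

Called without a key, the same text produces different payload bytes on nearly every call, which is the property the answers above rely on.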

Note that the mask is redundant if your WebSocket stream is encrypted (as it should be), although the protocol still requires clients to apply it. From an article by the author of Python's Flask:

Why is there masking at all? Because apparently there is enough broken infrastructure out there that lets the upgrade header go through and then handles the rest of the connection as a second HTTP request which it then stuffs into the cache. I have no words for this. In any case, the defense against that is basically a strong 32bit random number as masking key. Or you know… use TLS and don't use shitty proxies.

I struggled to understand the purpose of the WebSocket mask until I encountered the following two resources, which summarize it clearly.

From the book High Performance Browser Networking:

The payload of all client-initiated frames is masked using the value specified in the frame header: this prevents malicious scripts executing on the client from performing a cache poisoning attack against intermediaries that may not understand the WebSocket protocol.

Since the WebSocket protocol is not always understood by intermediaries (e.g. transparent proxies), a malicious script can take advantage of it and create traffic that causes cache poisoning in these intermediaries.

But how?

The article Talking to Yourself for Fun and Profit (http://www.adambarth.com/papers/2011/huang-chen-barth-rescorla-jackson.pdf) further explains how a cache poisoning attack works:

  1. The attacker’s Java applet opens a raw socket connection to attacker.com:80 (as before, the attacker can also use a SWF to mount a similar attack by hosting an appropriate policy file to authorize this request).
  2. The attacker’s Java applet sends a sequence of bytes over the socket crafted with a forged Host header as follows: GET /script.js HTTP/1.1 Host: target.com
  3. The transparent proxy treats the sequence of bytes as an HTTP request and routes the request based on the original destination IP, that is to the attacker’s server.
  4. The attacker’s server replies with malicious script file with an HTTP Expires header far in the future (to instruct the proxy to cache the response for as long as possible).
  5. Because the proxy caches based on the Host header, the proxy stores the malicious script file in its cache as http://target.com/script.js, not as http://attacker.com/script.js.
  6. In the future, whenever any client requests http://target.com/script.js via the proxy, the proxy will serve the cached copy of the malicious script.


The article also further explains how WebSockets come into the picture in a cache-poisoning attack:

Consider an intermediary examining packets exchanged between the browser and the attacker’s server. As above, the client requests WebSockets and the server agrees. At this point, the client can send any traffic it wants on the channel. Unfortunately, the intermediary does not know about WebSockets, so the initial WebSockets handshake just looks like a standard HTTP request/response pair, with the request being terminated, as usual, by an empty line. Thus, the client program can inject new data which looks like an HTTP request and the proxy may treat it as such. So, for instance, he might inject the following sequence of bytes: GET /sensitive-document HTTP/1.1 Host: target.com

When the intermediary examines these bytes, it might conclude that these bytes represent a second HTTP request over the same socket. If the intermediary is a transparent proxy, the intermediary might route the request or cache the response according to the forged Host header.

In the above example, the malicious script took advantage of the WebSocket not being understood by the intermediary and "poisoned" its cache. Next time someone asks for sensitive-document from target.com they will receive the attacker's version of it. Imagine the scale of the attack if that document is for google-analytics.

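To conclude, forcing a mask on the payload makes this poisoning impractical. Because the browser picks a fresh random masking key for each frame, the script cannot control the bytes that appear on the wire, so the intermediary never sees the HTTP request the attacker intended.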
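
As a small illustration of that conclusion, the following Python sketch masks the forged request from the example with two keys. Both keys are arbitrary fixed values chosen so the result is reproducible; a real browser would draw a new random key for every frame. The helper `wire_bytes` is a made-up name for the payload bytes as they would appear on the wire.

```python
forged = b"GET /sensitive-document HTTP/1.1\r\nHost: target.com\r\n\r\n"


def wire_bytes(payload: bytes, key: bytes) -> bytes:
    """Payload bytes as sent on the wire: XOR with the cycling 4-byte key."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


key_a = bytes.fromhex("37fa213d")  # stand-in for one frame's key
key_b = bytes.fromhex("a1b2c3d4")  # stand-in for another frame's key

on_wire_a = wire_bytes(forged, key_a)
on_wire_b = wire_bytes(forged, key_b)

print(on_wire_a[:8].hex())
print(on_wire_b[:8].hex())
print(on_wire_a.startswith(b"GET"))        # False
print(wire_bytes(on_wire_a, key_a) == forged)  # True: only the key holder recovers it
```

A proxy that wrongly parses the stream as HTTP sees bytes like these rather than a request line, and the script cannot choose its payload to compensate because it does not know the key in advance.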