HATEOAS: 绝对 URL 还是相对 URL？

小开

The only real difference would seem to be that it's easier for clients if they are consuming absolute URIs instead of having to construct them from the relative version. Of course, that difference would be enough to sway me to do the absolute version.

小开

It depends on who is writing the client code. If you are writing the client and server then it doesn't make much difference. You will either suffer the pain of building the URLs on the client or on the server.

However, if you are building the server and you expect other people to write client code then they will love you much more if you provide complete URIs. Resolving relative URIs can be a bit tricky. First how you resolve them depends on the media-type returned. HTML has the base tag, XML can have xml:base tags in every nested element, Atom feeds could have a base in the feed and a different base in the content. If you don't provide your client with explicit information about the base URI then they have to get the base URI from the request URI, or maybe from the Content-Location header! And watch out for that trailing slash. The base URI is determined by ignoring all characters to the right of the last slash. This means that trailing slash is now very significant when resolving relative URIs.

The only other issue that does require a small mention is document size. If you are returning a large list of items where each item may have multiple links, using absolute URLs can add a significant amount of bytes to your entity if you do not compress the entity. This is a performance issue and you need to decide if it is significant on a case by case basis.

小开

As your application scales, you may wish to do load balancing, fail-over, etc. If you return absolute URIs then your client-side apps will follow your evolving configuration of servers.

小开

~~One drawback of using absolute URIs is that the api cannot be proxied.~~

Take it back... not true. You should go for a full URL including the domain.

小开

最佳答案

There is a subtle conceptual ambiguity when people say "relative URI".

By RFC3986's definition, a generic URI contains:

  URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]


hier-part   = "//" authority path-abempty
/ path-absolute
/ path-rootless
/ path-empty


foo://example.com:8042/over/there?name=ferret#nose
\_/   \______________/\_________/ \_________/ \__/
|           |            |            |        |
scheme     authority       path        query   fragment

The tricky thing is, when scheme and authority are omitted, the "path" part itself can be either an absolute path (starts with /) or a "rootless" relative path. Examples:

An absolute URI or a full URI: "http://example.com:8042/over/there?name=ferret"
And this is a relative uri, with absolute path: /over/there
And this is a relative uri, with relative path: here or ./here or ../here or etc.

So, if the question was "whether a server should produce relative path in restful response", the answer is "No" and the detail reason is available here. I think most people (include me) against "relative URI" are actually against "relative path".

And in practice, most server-side MVC framework can easily generate relative URI with absolute path such as /absolute/path/to/the/controller, and the question becomes "whether the server implementation should prefix a scheme://hostname:port in front of the absolute path". Like the OP's question. I am not quite sure about this one.

On the one hand, I still think server returning a full uri is recommended. However, the server should never hardcode the hostname:port thing inside source code like this (otherwise I would rather fallback to relative uri with absolute path). Solution is server-side always obtaining that prefix from HTTP request's "Host" header. Not sure whether this works for every situations though.

On the other hand, it seems not very troublesome for the client to concatenate the http://example.com:8042 and the absolute path. After all, the client already know that scheme and domain name when it send the request to the server right?

All in all, I would say, recommend to use absolute URI, possibly fallback to relative URI with absolute path, never use relative path.

小开

You should always use the full URL. It acts as the unique identifier for the resource since URLs are all required to be unique.

I would also argue that you should be consistent. Since the Location HTTP header expects a full URL based on the HTTP specification, the full URL is sent back in the Location header to the client when a new resource is created. It would be strange for you to provide a full URL in the Location header and then relative URIs in the links within your response body.

小开

An important consideration in large API results is the extra network overhead of including the full URI repeatedly. Believe it or not, gzip does not entirely solve this issue (not sure why). We were shocked at how much space the full URI took up when there were hundreds of links included in a result.

小开

Using RayLou's trichotomy my organization has opted for favoring (2). The primary reason is to avoid XSS (Cross-Site Scripting) attacks. The issue is, if an attacker can inject their own URL root into the response coming back from the server, then subsequent user requests (such as an authentication request with username and password) can be forwarded to the attacker's own server*.

Some have brought up the issue of being able to redirect requests to other servers for load balancing, but (while that is not my area of expertise) I would wager that there are better ways to enable load balancing without having to explicitly redirect clients to different hosts.

*please let me know if there any flaws in this line of reasoning. The goal, of course, is not to prevent all attacks, but at least one avenue of attack.

小开

Regarding the pros, I see the reduction in bytes to be transmitted at the expense of extra handling required by a client for the (absolute) path. If you are desperate to save every byte, even after trying content-encoding as gzip, proper use of caching headers, usage of etags and conditional requests on the client, then this may be necessary in the end, but I expect much higher returns on your efforts elsewere.

Regarding the cons, I see a loss of control regarding how you can direct the flow of clients between resources in the future (load balancing, A/B testing, ...), and I would consider it a bad practice regarding managing a web API. The URL you provide is no longer basically opaque for the client (see Tim Berners-Lee Axioms of Web Architecture on URI opacity). In the end, you become responsible to keep clients happy regarding their creative usage of your API, even if it is only regarding the structure of your URL space. Should you ever need to allow for a explicitly defined URL modification, consider the usage of URI templates as used in the Hypertext Application Language.