Nginx proxy for Amazon S3 resources

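I'm working on some WPO (web performance optimization) tasks, and PageSpeed suggested that I leverage browser caching. I've managed to improve this for some static files on my Nginx server, but the image files stored on Amazon S3 are still missing the caching headers.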

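I've read about an approach that updates each file in S3 to include some header metadata (Expires and Cache-Control). I don't think that's a good approach: I have thousands of files, so it isn't feasible for me.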

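I think the most convenient approach is to configure my Nginx 1.6.0 server to proxy the S3 files. I've read some articles on this, but I know next to nothing about server configuration, so I took a few examples from these sites: https://gist.github.com/benjaminbarbe/1961db5ffbaad57eff12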

I added this location block inside the server block of my nginx configuration file:

# inside server block
location /mybucket.s3.amazonaws.com/ {
    proxy_http_version     1.1;
    proxy_set_header       Host mybucket.s3.amazonaws.com;
    proxy_set_header       Authorization '';
    proxy_hide_header      x-amz-id-2;
    proxy_hide_header      x-amz-request-id;
    proxy_hide_header      Set-Cookie;
    proxy_ignore_headers   "Set-Cookie";
    proxy_buffering        off;
    proxy_intercept_errors on;
    proxy_pass             http://mybucket.s3.amazonaws.com;
}

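Sure enough, this doesn't work for me: the headers are not included in the response. My first guess is that the request doesn't match the location.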

Accept-Ranges:bytes
Content-Length:90810
Content-Type:image/jpeg
Date:Fri, 23 Jun 2017 04:53:56 GMT
ETag:"4fd0be549fbcaf9b47c18a15146cdf16"
Last-Modified:Tue, 09 Jun 2015 09:47:13 GMT
Server:AmazonS3
x-amz-id-2:cKsq1qRra74DqVsTewh3P3sgzVUoPR8aAT2NFCuwA+JjCdDZfk7/7x/C0WPjBa51GEb4C8LyAIc=
x-amz-request-id:94EADB4EDD3DE1C1

Without knowing which modules your Nginx is compiled with, there are two ways to add Expires and Cache-Control headers to all files.

Nginx S3 proxy

This is what you asked about: using Nginx to add Expires and Cache-Control headers to S3 files.

Nginx needs the set-misc-nginx-module to act as an S3 proxy and to change or add Expires and Cache-Control headers on the fly. This is a standard, complete guide from compilation to usage; this is a great guide to nginx-extras on Ubuntu Server; and this is a full guide with a WordPress example.

There are more S3 modules for extra features. Without those modules, Nginx will not recognize their directives and the config test (nginx -t) will fail. The set-misc-nginx-module is the minimum you need. A better example of what you want is in this GitHub gist.

Since not every Nginx build includes these modules and the setup is somewhat involved, here is also a way to set the Expires and Cache-Control headers for all files in an Amazon S3 bucket.

Amazon S3 Bucket Expires and Cache-Control Header

You can also set Expires and Cache-Control headers for all objects in an AWS S3 bucket with a script or from the command line. There are several free libraries and scripts for this on GitHub, such as this one, as well as Bucket Explorer, Amazon's own tool, and these two Amazon docs. With the AWS CLI cp command, it looks like this:

aws s3 cp s3://mybucket/ s3://mybucket/ --recursive --metadata-directive REPLACE \
--expires 2027-09-01T00:00:00Z --acl public-read --cache-control max-age=2000000,public

From an architectural standpoint, what you're trying to do is the wrong way to go about it:

  • Amazon S3 is presumably optimised to be a highly available cache; by putting a hand-rolled proxying layer on top of it, you're merely adding an unnecessary extra delay and a major point of failure, while losing the benefits S3 itself provides.

  • Your performance analysis with regard to the number of files is incorrect. If you have thousands of files on S3, the correct solution would be to write a one-time script to change the requisite attributes on S3, instead of hand-rolling a proxying mechanism that you don't fully understand and that would be executed many times over (ad nauseam). Proxying would likely be a band-aid, and in reality will likely decrease performance, not increase it (even if a stateless automated tool tells you otherwise). Not to mention that it would also be an unnecessary resource drain, and may contribute to actual performance issues and heisenbugs down the line.


That said, if you still want to proxy the files and add the headers, the correct way to do so in nginx is the expires directive.

E.g., you may place expires max; before or after your proxy_pass directive within the appropriate location.

The expires directive automatically takes care of setting a correct Cache-Control header for you, too; but you could also use the add_header directive should you wish to add any custom response headers manually.
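A minimal sketch of the expires approach. The bucket name mybucket and the /s3/ prefix are hypothetical placeholders, not values from the question; the commented-out lines show the add_header alternative.

```nginx
# Hypothetical names: "mybucket" and the /s3/ prefix are placeholders.
location /s3/ {
    proxy_set_header Host mybucket.s3.amazonaws.com;
    proxy_pass       http://mybucket.s3.amazonaws.com/;

    # Sets "Expires" to Thu, 31 Dec 2037 23:55:55 GMT and "Cache-Control"
    # to 10 years; only applied to the response codes listed in the nginx
    # docs (e.g. 200, 206, 304 and redirects), not to 404/5xx.
    expires max;

    # Alternative: set the header yourself. add_header appends rather than
    # replaces, so hide any Cache-Control the S3 object already carries to
    # avoid sending two of them.
    # proxy_hide_header Cache-Control;
    # add_header        Cache-Control "public, max-age=31536000";
}
```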

Your approach of proxying S3 files via Nginx makes a lot of sense. It solves a number of problems and comes with extra benefits such as masking URLs, proxy caching, and faster transfers by offloading SSL/TLS. You have it almost right; let me show what is left to make it perfect.

For the sample queries I use the S3 bucket and image URL mentioned in a public comment on the original question.

We start by inspecting the headers of the Amazon S3 file:

curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg


HTTP/1.1 200 OK
Date: Sun, 25 Jun 2017 17:49:10 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Content-Length: 378843
Server: AmazonS3

We can see that Cache-Control is missing, but the headers needed for a conditional GET are already there. When we send the ETag/Last-Modified values back (that's how a browser's client-side cache works), we get HTTP 304 with no body and no Content-Length. In other words, the client (curl in our case) queries the resource saying that no data transfer is required unless the file has been modified on the server:

curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg \
--header 'If-None-Match: "37a907fc5dd7cfd0c428af78f09e95a9"'


HTTP/1.1 304 Not Modified
Date: Sun, 25 Jun 2017 17:53:33 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Server: AmazonS3


curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg \
--header "If-Modified-Since: Wed, 21 Jun 2017 07:42:31 GMT"


HTTP/1.1 304 Not Modified
Date: Sun, 25 Jun 2017 18:17:34 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Server: AmazonS3

"PageSpeed suggested to leverage browser caching" that means Cache=control is missing. Nginx as proxy for S3 files solves not only problem with missing headers but also saves traffic using Nginx proxy cache.

I use macOS, but the Nginx configuration works the same way on Linux without modification. Step by step:

1. Install Nginx

brew update && brew install nginx

2. Set up Nginx to proxy the S3 bucket (see the configuration below)

3. Request the file via Nginx. Note the Server header: we now see Nginx rather than Amazon S3:

curl -I http://localhost:8080/s3/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg


HTTP/1.1 200 OK
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:30:26 GMT
Content-Type: binary/octet-stream
Content-Length: 378843
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Accept-Ranges: bytes
Cache-Control: max-age=31536000


4. Request the file through the Nginx proxy with a conditional GET:

curl -I http://localhost:8080/s3/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg \
--header 'If-None-Match: "37a907fc5dd7cfd0c428af78f09e95a9"'


HTTP/1.1 304 Not Modified
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:32:16 GMT
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Cache-Control: max-age=31536000


5. Request the file through the Nginx proxy cache. Note the X-Cache-Status header: its value is MISS until the first request has warmed up the cache, and HIT afterwards:

curl -I http://localhost:8080/s3_cached/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg


HTTP/1.1 200 OK
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:40:45 GMT
Content-Type: binary/octet-stream
Content-Length: 378843
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Cache-Control: max-age=31536000
X-Cache-Status: HIT
Accept-Ranges: bytes

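If you'd rather watch hit rates in the log than inspect each response with curl, the same $upstream_cache_status variable can be written to the access log. A sketch only: the format name s3_cache_log is hypothetical, log_format is only allowed in the http block, and this access_log line would replace the plain access_log /dev/stdout; line in the configuration below (two access_log lines at the same level would both log).

```nginx
# In the http {} block. "s3_cache_log" is an arbitrary, hypothetical name.
# $upstream_cache_status is MISS, HIT, EXPIRED, STALE, UPDATING,
# REVALIDATED or BYPASS, and empty for locations without proxy_cache.
log_format s3_cache_log '$remote_addr [$time_local] "$request" '
                        '$status $body_bytes_sent cache=$upstream_cache_status';
access_log /dev/stdout s3_cache_log;
```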

Based on the official Nginx documentation, here is an Nginx S3 configuration with optimised caching settings, which uses the following options:

  • proxy_cache_revalidate instructs NGINX to use conditional GET requests when refreshing content from the origin servers
  • the updating parameter to the proxy_cache_use_stale directive instructs NGINX to deliver stale content when clients request an item while an update to it is being downloaded from the origin server, instead of forwarding repeated requests to the server
  • with proxy_cache_lock enabled, if multiple clients request a file that is not current in the cache (a MISS), only the first of those requests is allowed through to the origin server
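Pulled out of the full configuration, the three options look like this. proxy_cache_lock_timeout is an extra directive that the configuration here does not set; it is shown at its nginx default purely for illustration.

```nginx
# Fragment for a location that already has "proxy_cache s3_cache;".

# Refresh expired entries with If-Modified-Since / If-None-Match, so an
# unchanged S3 object costs a 304 instead of a full download.
proxy_cache_revalidate   on;

# While one request is refreshing an expired entry, serve the stale copy
# to other clients instead of sending them all to S3.
proxy_cache_use_stale    updating;

# On a MISS, only the first request for a given key goes to S3; the rest
# wait for it to populate the cache.
proxy_cache_lock         on;
# Default value. Once it expires, waiting requests go to S3 themselves,
# and (since nginx 1.7.8) those responses are not cached.
proxy_cache_lock_timeout 5s;
```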

Nginx configuration:

worker_processes  1;
# Run in the foreground, handy when testing from a terminal
daemon off;

error_log  /dev/stdout info;
pid        /usr/local/var/nginx/nginx.pid;

events {
    worker_connections  1024;
}

http {
    default_type       text/html;
    access_log         /dev/stdout;
    sendfile           on;
    keepalive_timeout  65;

    # On-disk cache used by the /s3_cached/ location below
    proxy_cache_path   /tmp/ levels=1:2 keys_zone=s3_cache:10m max_size=500m
                       inactive=60m use_temp_path=off;

    server {
        listen 8080;

        # Plain proxy: adds Cache-Control, nothing is cached by Nginx
        location /s3/ {
            proxy_http_version     1.1;
            proxy_set_header       Connection "";
            proxy_set_header       Authorization '';
            proxy_set_header       Host yanpy.dev.s3.amazonaws.com;
            proxy_hide_header      x-amz-id-2;
            proxy_hide_header      x-amz-request-id;
            proxy_hide_header      x-amz-meta-server-side-encryption;
            proxy_hide_header      x-amz-server-side-encryption;
            proxy_hide_header      Set-Cookie;
            proxy_ignore_headers   Set-Cookie;
            proxy_intercept_errors on;
            add_header             Cache-Control max-age=31536000;
            # Trailing slash: /s3/img/x.jpg is forwarded to S3 as /img/x.jpg
            proxy_pass             http://yanpy.dev.s3.amazonaws.com/;
        }

        # Same proxy, plus Nginx-side caching in the s3_cache zone
        location /s3_cached/ {
            proxy_cache            s3_cache;
            proxy_http_version     1.1;
            proxy_set_header       Connection "";
            proxy_set_header       Authorization '';
            proxy_set_header       Host yanpy.dev.s3.amazonaws.com;
            proxy_hide_header      x-amz-id-2;
            proxy_hide_header      x-amz-request-id;
            proxy_hide_header      x-amz-meta-server-side-encryption;
            proxy_hide_header      x-amz-server-side-encryption;
            proxy_hide_header      Set-Cookie;
            proxy_ignore_headers   Set-Cookie;
            proxy_cache_revalidate on;
            proxy_intercept_errors on;
            proxy_cache_use_stale  error timeout updating http_500 http_502 http_503 http_504;
            proxy_cache_lock       on;
            proxy_cache_valid      200 304 60m;
            add_header             Cache-Control max-age=31536000;
            add_header             X-Cache-Status $upstream_cache_status;
            proxy_pass             http://yanpy.dev.s3.amazonaws.com/;
        }
    }
}
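One caveat with this configuration in open-source nginx: a host name written literally in proxy_pass is resolved once, when the configuration is loaded, while S3 endpoint addresses can change over time. A common workaround is to put the host in a variable so it is resolved at request time through a resolver. The sketch below assumes this; the resolver address 8.8.8.8 and the variable name $s3_bucket are placeholders, not part of the original setup.

```nginx
# Inside the server {} block (resolver is also valid in http {}).
# Placeholder DNS server: use your own VPC/ISP resolver instead.
resolver 8.8.8.8 valid=300s;

location /s3/ {
    # Hypothetical variable; a variable in proxy_pass makes nginx resolve
    # the host per request via the resolver above.
    set $s3_bucket yanpy.dev.s3.amazonaws.com;

    # With a variable and no URI part in proxy_pass, nginx forwards the
    # (rewritten) request URI as-is, so strip the /s3/ prefix here.
    rewrite ^/s3/(.*)$ /$1 break;

    proxy_http_version 1.1;
    proxy_set_header   Connection "";
    proxy_set_header   Authorization '';
    proxy_set_header   Host $s3_bucket;
    add_header         Cache-Control max-age=31536000;
    proxy_pass         http://$s3_bucket;
}
```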