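How stable is s3fs for mounting an Amazon S3 bucket as a local directory?

How stable is s3fs for mounting an Amazon S3 bucket as a local directory in Linux? Is it suitable for high-demand production environments?

Are there any better or similar solutions?

Update: Would it be better to use EBS and mount it via NFS to all the other AMIs?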

There's a good article on s3fs here; after reading it, I resorted to an EBS share.

It highlights a few important considerations when using s3fs, mostly related to the inherent limitations of S3:

  • no file can be over 5GB
  • you can't partially update a file, so changing a single byte re-uploads the entire file
  • operations on many small files are very efficient (each is a separate S3 object, after all), but operations on large files are very inefficient
  • although S3 supports partial/chunked downloads, s3fs doesn't take advantage of this, so if you want to read just one byte of a 1GB file, you have to download the entire GB
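To make the cost of that whole-object behaviour concrete, here is a rough sketch in Python. It only models bytes transferred under the behaviour described in the list above (whole-file download on read, whole-file re-upload on write). The function names are made up for illustration, and nothing here talks to s3fs or S3.

```python
GIB = 1024 ** 3


def bytes_moved_whole_object(file_size: int, bytes_touched: int) -> int:
    """Bytes transferred when any read or write moves the entire object."""
    if bytes_touched <= 0:
        return 0
    return file_size


def bytes_moved_ranged(file_size: int, bytes_touched: int) -> int:
    """Bytes transferred if only the touched range were moved."""
    return max(0, min(bytes_touched, file_size))


def amplification(file_size: int, bytes_touched: int) -> float:
    """How many times more data the whole-object model moves."""
    ranged = bytes_moved_ranged(file_size, bytes_touched)
    if ranged == 0:
        return 0.0
    return bytes_moved_whole_object(file_size, bytes_touched) / ranged


if __name__ == "__main__":
    # Touching one byte of a 1 GiB file vs. reading a whole 5 MiB photo.
    print(amplification(GIB, 1))
    print(amplification(5 * 1024 ** 2, 5 * 1024 ** 2))
```

Under this model a one-byte change to a 1 GiB file moves the full gibibyte, while a whole-file read or write of a photo has no overhead at all, which is why the workload matters so much.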

Whether s3fs is a feasible option therefore depends on what you are storing. If you're storing, say, photos, where you always write or read an entire file and never change a file incrementally, then it's fine. Although one may ask: if that's what you're doing, why not just use S3's API directly?

If you're talking about application data (say, database files or log files) where you want to make small incremental changes, then it's a definite no. S3 just doesn't work that way; you can't incrementally change a file.

The article mentioned above does talk about a similar application, s3backer, which gets around the performance issues by implementing a virtual filesystem over S3. It has a few issues of its own, though:

  • high risk of data corruption, due to the delayed writes
  • block sizes that are too small (e.g., the 4K default) can add significant extra costs (e.g., $130 for 50GB worth of storage with 4K blocks)
  • block sizes that are too large can add significant data transfer and storage fees
  • memory usage can be prohibitive: by default it caches 1000 blocks.
    With the default 4K block size that's not an issue, but most users
    will probably want to increase the block size.
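The block-size trade-off in the list above is mostly arithmetic, so a small Python sketch may help. It assumes each block is written once as a separate PUT request, and it uses a price of $0.01 per 1,000 PUT requests purely as an illustrative assumption (check current S3 pricing). The function names are hypothetical and are not part of s3backer.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def block_count(volume_bytes: int, block_bytes: int) -> int:
    """Number of blocks needed to cover the volume (ceiling division)."""
    return -(-volume_bytes // block_bytes)


def write_request_cost(volume_bytes: int, block_bytes: int,
                       usd_per_1000_puts: float = 0.01) -> float:
    """Request charges for writing every block once.

    usd_per_1000_puts is an assumed price, not a quoted one.
    """
    return block_count(volume_bytes, block_bytes) * usd_per_1000_puts / 1000


def cache_memory_bytes(block_bytes: int, cached_blocks: int = 1000) -> int:
    """Memory used by a cache holding cached_blocks whole blocks."""
    return block_bytes * cached_blocks


if __name__ == "__main__":
    print(block_count(50 * GIB, 4 * KIB))
    print(round(write_request_cost(50 * GIB, 4 * KIB), 2))
    print(cache_memory_bytes(4 * KIB), cache_memory_bytes(1 * MIB))
```

With these assumptions, 50GB in 4K blocks is about 13.1 million requests, or roughly $131, which is in line with the $130 figure quoted above if that figure refers to request charges. The same 1000-block cache grows from about 4MB with 4K blocks to about 1GB with 1MB blocks.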

I resorted to an EBS-mounted drive shared from an EC2 instance. But you should know that, although it is the most performant option, an EBS-backed NFS share has one big problem of its own: it is a single point of failure. If the machine that's sharing the EBS volume goes down, you lose access on all machines that use the share.

This is a risk I was able to live with and was the option I chose in the end. I hope this helps.

This is an old question so I'll share my experience over the past year with S3FS.

Initially, it had a number of bugs and memory leaks (I had a cron job to restart it every 2 hours), but with the latest release, 1.73, it's been very stable.

The best thing about S3FS is that you have one less thing to worry about, and you get some performance benefits for free.

Most of your S3 requests are going to be PUT (~5%) and GET (~95%). If you don't need any post-processing (thumbnail generation, for example), you shouldn't be hitting your web server in the first place and should be uploading directly to S3 (using CORS).

If you are hitting the server, that probably means you need to do some post-processing on images. With the S3 API, you'll be uploading to the server, then uploading to S3. If the user wants to crop, you'll need to download the file from S3 to the server again, crop it, and then upload it to S3. With S3FS and local caching turned on, this orchestration is taken care of for you and saves downloading files from S3.

On caching, if you are caching to an ephemeral drive on EC2, you get the performance benefits that come with it and can purge your cache without having to worry about anything. Unless you run out of disk space, you should have no reason to purge your cache. This makes traversal operations like searching and filtering much easier.
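The local-caching idea described above can be sketched as a read-through cache in Python. This is only an illustration of the general pattern, not how S3FS implements its cache. `ReadThroughCache` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever downloads an object from S3 by key.

```python
import hashlib
import os
import tempfile
from typing import Callable


class ReadThroughCache:
    """Serve objects from a local directory, fetching each key at most once."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes]):
        self.cache_dir = cache_dir
        self.fetch = fetch  # hypothetical downloader, e.g. a wrapper around an S3 GET
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash the key so that "/" in object keys cannot escape the cache dir.
        name = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, name)

    def read(self, key: str) -> bytes:
        path = self._path(key)
        if os.path.exists(path):
            # Cache hit: no remote request is made.
            with open(path, "rb") as f:
                return f.read()
        # Cache miss: download once, then keep a local copy.
        data = self.fetch(key)
        with open(path, "wb") as f:
            f.write(data)
        return data


if __name__ == "__main__":
    downloads = []

    def fake_fetch(key: str) -> bytes:
        downloads.append(key)
        return b"image-bytes"

    with tempfile.TemporaryDirectory() as cache_dir:
        cache = ReadThroughCache(cache_dir, fake_fetch)
        cache.read("photos/a.jpg")
        cache.read("photos/a.jpg")  # served from local disk
        print(len(downloads))
```

Because the cached copies can always be fetched again, deleting the cache directory loses nothing, which matches the point about purging a cache on an ephemeral drive.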

The one thing I do wish it had is full sync with S3 (rsync style). That would make it an enterprise version of Dropbox or Google Drive for S3, but without having to contend with the quotas and fees that come with them.