Faster s3 bucket duplication

I have been trying to find a better command line tool for duplicating buckets than s3cmd. s3cmd can duplicate buckets without having to download and upload each file. The command I normally run to duplicate buckets with s3cmd is:

s3cmd cp -r --acl-public s3://bucket1 s3://bucket2

This works, but it is very slow, as it copies each file through the API one at a time. If s3cmd could run in parallel mode, I'd be very happy.

Are there any other options available, as command line tools or code, for duplicating buckets that are faster than s3cmd?

Edit: Looks like s3cmd-modification is exactly what I'm looking for. Too bad it does not work. Any other options?

I don't know of any other S3 command line tools but if nothing comes up here, it might be easiest to write your own.

Pick whatever language and Amazon SDK/Toolkit you prefer. Then you just need to list/retrieve the source bucket contents and copy each file (in parallel, obviously).

Looking at the source for s3cmd-modification (and I admit I know nothing about Python), it looks like they have not parallelised the bucket-to-bucket code, but perhaps you could use the standard upload/download parallel code as a starting point.
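
For what it's worth, a minimal do-it-yourself sketch along those lines could look like the following, using boto3 (the AWS SDK for Python) and a thread pool. The bucket names, ACL and worker count are placeholders, not part of the original answer:

    import boto3
    from concurrent.futures import ThreadPoolExecutor

    SRC, DST = "bucket1", "bucket2"   # placeholder bucket names
    s3 = boto3.client("s3")

    def copy_key(key):
        # copy_object is a server-side copy: no data passes through this machine.
        s3.copy_object(Bucket=DST, Key=key,
                       CopySource={"Bucket": SRC, "Key": key},
                       ACL="public-read")  # mirrors s3cmd's --acl-public

    # List every key in the source bucket, then copy them in parallel.
    keys = (obj["Key"]
            for page in s3.get_paginator("list_objects_v2").paginate(Bucket=SRC)
            for obj in page.get("Contents", []))

    with ThreadPoolExecutor(max_workers=32) as pool:  # tune the worker count
        list(pool.map(copy_key, keys))

Note that copy_object only handles objects up to 5 GB; larger objects need a multipart copy.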

As this is Google's first hit on the subject, adding extra information.

'Cyno' made a newer version of s3cmd-modification, which now supports parallel bucket-to-bucket syncing. Exactly what I was waiting for as well.

The pull request is at https://github.com/pcorliss/s3cmd-modification/pull/2, and his version is at https://github.com/pearltrees/s3cmd-modification

If you don't mind using the AWS console, you can:

  1. Select all of the files/folders in the first bucket
  2. Click Actions > Copy
  3. Create a new bucket and select it
  4. Click Actions > Paste

It's still fairly slow, but you can leave it alone and let it do its thing.

AWS CLI seems to do the job perfectly, and has the bonus of being an officially supported tool.

aws s3 sync s3://mybucket s3://backup-mybucket

http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

Supports concurrent transfers by default. See http://docs.aws.amazon.com/cli/latest/topic/s3-config.html#max-concurrent-requests

To transfer a huge number of small files quickly, run the command from an EC2 instance to reduce latency, and increase max_concurrent_requests to offset the latency that remains. For example:

aws configure set default.s3.max_concurrent_requests 200

I have tried cloning two buckets using the AWS web console, s3cmd and the AWS CLI. Although these methods work most of the time, they are painfully slow.

Then I found s3s3mirror: a specialized tool for syncing two S3 buckets. It's multi-threaded and a lot faster than the other approaches I had tried. I quickly moved gigabytes of data from one AWS region to another.

Check it out at https://github.com/cobbzilla/s3s3mirror, or download a Docker container from https://registry.hub.docker.com/u/pmoust/s3s3mirror/
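
If I'm reading the project's README correctly, basic usage is a single command (the bucket names below are placeholders):

    s3s3mirror.sh source-bucket destination-bucket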

For an ad hoc solution, use the AWS CLI (aws s3 sync) to sync between buckets.

aws s3 sync speed depends on:
- the latency of an API call to the S3 endpoint
- the number of API calls made concurrently

To increase sync speed:
- run aws s3 sync from an EC2 instance (c3.large on FreeBSD is OK ;-) )
- update ~/.aws/config (see the sample config after this list) with:
  - max_concurrent_requests = 128
  - max_queue_size = 8096
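
For reference, those two settings go into ~/.aws/config roughly like this, using the nested s3 settings format documented for the AWS CLI (the values are the ones from this answer):

    [default]
    s3 =
      max_concurrent_requests = 128
      max_queue_size = 8096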

With this config and instance type I was able to sync a bucket (309 GB, 72K files, us-east-1) within 474 seconds.

For a more generic solution, consider AWS Data Pipeline or S3 cross-region replication.

A simple aws s3 cp s3://[original-bucket] s3://[backup-bucket] --recursive works well (assuming you have the AWS CLI set up).

Extending deadwards' answer: in 2021, copying objects from one bucket to another in the AWS console took no more than 2 minutes for 1.2 GB of data.

  1. Create a bucket: enter the bucket name, choose the region, and copy settings from the existing bucket. Create the bucket.
  2. Once the bucket is created, go to the source bucket you want to copy the files from.
  3. Select all (or choose just the desired files and folders), then Actions > Copy.
  4. For the destination, browse to the bucket the files and folders should be copied to.
  5. Once you click the Copy button, all the files and folders are copied within a minute or two.

If you have AWS console access, use AWS CloudShell and run the command below:

aws s3 sync s3://mybucket s3://backup-mybucket

No need to install the AWS CLI or any other tools.

The command is taken from the top answer above. CloudShell will keep your command running smoothly even if you lose your connection, and it is faster too, since the transfer goes straight AWS-to-AWS with no local machine in between.