Faster s3 bucket duplication

I have been trying to find a better command line tool for duplicating buckets than s3cmd. s3cmd can duplicate buckets without having to download and upload each file. The command I normally run to duplicate buckets with s3cmd is:

s3cmd cp -r --acl-public s3://bucket1 s3://bucket2

This works, but it is very slow, as it copies each file through the API one at a time. If s3cmd could run in parallel mode, I'd be very happy.

Are there any other options available, as command line tools or code, for duplicating buckets that are faster than s3cmd?

Edit: Looks like s3cmd-modification is exactly what I'm looking for. Too bad it does not work. Any other options?

I don't know of any other S3 command line tools but if nothing comes up here, it might be easiest to write your own.

Pick whatever language and Amazon SDK/Toolkit you prefer. Then you just need to list/retrieve the source bucket contents and copy each file (in parallel, obviously).

Looking at the source for s3cmd-modification (and I admit I know nothing about Python), it looks like they have not parallelised the bucket-to-bucket code, but perhaps you could use the standard upload/download parallel code as a starting point.
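
For what it's worth, a minimal do-it-yourself sketch along those lines could look like the following, using boto3 (the AWS SDK for Python) and a thread pool. The bucket names, ACL and worker count are placeholders, not part of the original answer:

    import boto3
    from concurrent.futures import ThreadPoolExecutor

    SRC, DST = "bucket1", "bucket2"   # placeholder bucket names
    s3 = boto3.client("s3")

    def copy_key(key):
        # copy_object is a server-side copy: no data passes through this machine.
        s3.copy_object(Bucket=DST, Key=key,
                       CopySource={"Bucket": SRC, "Key": key},
                       ACL="public-read")  # mirrors s3cmd's --acl-public

    # List every key in the source bucket, then copy them in parallel.
    keys = (obj["Key"]
            for page in s3.get_paginator("list_objects_v2").paginate(Bucket=SRC)
            for obj in page.get("Contents", []))

    with ThreadPoolExecutor(max_workers=32) as pool:  # tune the worker count
        list(pool.map(copy_key, keys))

Note that copy_object only handles objects up to 5 GB; larger objects need a multipart copy.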

As this is Google's first hit on the subject, adding extra information.

'Cyno' made a newer version of s3cmd-modification, which now supports parallel bucket-to-bucket syncing. Exactly what I was waiting for as well.

The pull request is at https://github.com/pcorliss/s3cmd-modification/pull/2, and his version is at https://github.com/pearltrees/s3cmd-modification

If you don't mind using the AWS console, you can:

  1. Select all of the files/folders in the first bucket
  2. Click Actions > Copy
  3. Create a new bucket and select it
  4. Click Actions > Paste

It's still fairly slow, but you can leave it alone and let it do its thing.

AWS CLI seems to do the job perfectly, and has the bonus of being an officially supported tool.

aws s3 sync s3://mybucket s3://backup-mybucket

http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

Supports concurrent transfers by default. See http://docs.aws.amazon.com/cli/latest/topic/s3-config.html#max-concurrent-requests

To transfer a huge number of small files quickly, run the command from an EC2 instance to reduce latency, and increase max_concurrent_requests to offset the latency that remains. For example:

aws configure set default.s3.max_concurrent_requests 200

I have tried cloning two buckets using the AWS web console, s3cmd and the AWS CLI. Although these methods work most of the time, they are painfully slow.

Then I found s3s3mirror: a specialized tool for syncing two S3 buckets. It's multi-threaded and a lot faster than the other approaches I had tried. I quickly moved gigabytes of data from one AWS region to another.

Check it out at https://github.com/cobbzilla/s3s3mirror, or download a Docker container from https://registry.hub.docker.com/u/pmoust/s3s3mirror/
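
If I'm reading the project's README correctly, basic usage is a single command (the bucket names below are placeholders):

    s3s3mirror.sh source-bucket destination-bucket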

For an ad hoc solution, use the AWS CLI (aws s3 sync) to sync between buckets.

aws s3 sync speed depends on:
- the latency of an API call to the S3 endpoint
- the number of API calls made concurrently

To increase sync speed:
- run aws s3 sync from an EC2 instance (c3.large on FreeBSD is OK ;-) )
- update ~/.aws/config (see the sample config after this list) with:
  - max_concurrent_requests = 128
  - max_queue_size = 8096
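
For reference, those two settings go into ~/.aws/config roughly like this, using the nested s3 settings format documented for the AWS CLI (the values are the ones from this answer):

    [default]
    s3 =
      max_concurrent_requests = 128
      max_queue_size = 8096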

With this config and instance type I was able to sync a bucket (309 GB, 72K files, us-east-1) within 474 seconds.

For a more generic solution, consider AWS Data Pipeline or S3 cross-region replication.

A simple aws s3 cp s3://[original-bucket] s3://[backup-bucket] --recursive works well (assuming you have the AWS CLI set up).

Extending deadwards' answer: in 2021, copying objects from one bucket to another in the AWS console took no more than 2 minutes for 1.2 GB of data.

  1. Create a bucket: enter the bucket name, choose the region, and copy settings from the existing bucket. Create the bucket.
  2. Once the bucket is created, go to the source bucket you want to copy the files from.
  3. Select all (or choose just the desired files and folders), then Actions > Copy.
  4. For the destination, browse to the bucket the files and folders should be copied to.
  5. Once you click the Copy button, all the files and folders are copied within a minute or two.

If you have AWS console access, use AWS CloudShell and run the command below:

aws s3 sync s3://mybucket s3://backup-mybucket

No need to install the AWS CLI or any other tools.

The command is taken from the top answer above. CloudShell will keep your command running smoothly even if you lose your connection, and it is faster too, since the transfer goes straight AWS-to-AWS with no local machine in between.