How do I delete a file from an Amazon S3 bucket?

I need to write Python code that deletes a given file from an Amazon S3 bucket. I can connect to the bucket and store files in it, but how can I delete a file?


Through which interface? Using the REST interface, you just send a DELETE request:

DELETE /ObjectName HTTP/1.1
Host: BucketName.s3.amazonaws.com
Date: date
Content-Length: length
Authorization: signatureValue
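
If you want to send that DELETE request yourself from Python, one way (just a sketch, assuming boto3 is installed and credentials are configured; the bucket and key names below are placeholders) is to have boto3 presign the delete_object call and issue the HTTP DELETE with requests:

import boto3
import requests

# Presign a delete_object call, then send the raw HTTP DELETE yourself.
# 'my-bucket' and 'path/to/file.txt' are placeholder names.
client = boto3.client('s3')
url = client.generate_presigned_url(
    'delete_object',
    Params={'Bucket': 'my-bucket', 'Key': 'path/to/file.txt'},
    ExpiresIn=60,
)
resp = requests.delete(url)
print(resp.status_code)  # S3 returns 204 No Content on a successful delete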

Via the SOAP interface:

<DeleteObject xmlns="http://doc.s3.amazonaws.com/2006-03-01">
  <Bucket>quotes</Bucket>
  <Key>Nelson</Key>
  <AWSAccessKeyId>1D9FVRAYCP1VJEXAMPLE=</AWSAccessKeyId>
  <Timestamp>2006-03-01T12:00:00.183Z</Timestamp>
  <Signature>Iuyz3d3P0aTou39dzbqaEXAMPLE=</Signature>
</DeleteObject>

If you're using a Python library such as boto, it should expose a "delete" feature, for example delete_key().
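
For instance, with the old boto 2 API that could look roughly like this (a sketch only; the bucket and key names are placeholders, and credentials are assumed to come from the environment):

import boto

conn = boto.connect_s3()                # picks up credentials from the environment
bucket = conn.get_bucket('my-bucket')   # placeholder bucket name
bucket.delete_key('path/to/file.txt')   # delete a single key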

For now I've solved this with the Linux utility s3cmd:

delFile = 's3cmd -c /home/project/.s3cfg del s3://images/anon-images/small/' + filename
os.system(delFile)
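
If you go the s3cmd route, a variant using subprocess instead of os.system avoids interpolating the filename into a shell string (a sketch only, reusing the same paths as above):

import subprocess

# Pass the key as a separate argument instead of building a shell command string.
subprocess.run(
    ['s3cmd', '-c', '/home/project/.s3cfg', 'del',
     's3://images/anon-images/small/' + filename],
    check=True,
)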

Found another way to do it:

from boto.s3.connection import S3Connection, Bucket, Key

conn = S3Connection(AWS_ACCESS_KEY, AWS_SECRET_KEY)
b = Bucket(conn, S3_BUCKET_NAME)

k = Key(b)
k.key = 'images/my-images/' + filename

b.delete_key(k)

I'm surprised there isn't a way as simple as this:

from boto.s3.connection import S3Connection, Bucket, Key

conn = S3Connection(AWS_ACCESS_KEY, AWS_SECRET_KEY)
bucket = Bucket(conn, S3_BUCKET_NAME)
k = Key(bucket=bucket, name=path_to_file)
k.delete()

Using the boto3 SDK for Python (and assuming AWS credentials are configured), the following deletes the specified object from a bucket:

import boto3

client = boto3.client('s3')
client.delete_object(Bucket='mybucketname', Key='myfile.whatever')
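
Note that delete_object reports success even if the key never existed. If you want to confirm the object is actually gone, one option (a sketch, reusing the same client) is to call head_object afterwards and treat a ClientError as "not found":

from botocore.exceptions import ClientError

try:
    client.head_object(Bucket='mybucketname', Key='myfile.whatever')
    print('object still exists')
except ClientError:
    # head_object raises ClientError (404) once the object is gone
    print('object deleted')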

With boto3 (currently version 1.4.4), use S3.Object.delete():

import boto3

s3 = boto3.resource('s3')
s3.Object('your-bucket', 'your-key').delete()

This worked for me; give it a try.

import boto
import sys
from boto.s3.key import Key
import boto.s3.connection

AWS_ACCESS_KEY_ID = '<access_key>'
AWS_SECRET_ACCESS_KEY = '<secret_access_key>'
Bucketname = 'bucket_name'

conn = boto.s3.connect_to_region('us-east-2',
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    is_secure=True,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
bucket = conn.get_bucket(Bucketname)

k = Key(bucket)
k.key = 'filename to delete'
bucket.delete_key(k)

You can do it with the aws cli (https://aws.amazon.com/cli/) and some unix commands.

This aws cli command should work:

aws s3 rm s3://<your_bucket_name> --exclude "*" --include "<your_regex>"

If you want to include sub-folders, add the --recursive flag.

Or with unix commands:

aws s3 ls s3://<your_bucket_name>/ | awk '{print $4}' | xargs -I% <your_os_shell> -c 'aws s3 rm s3://<your_bucket_name>/%'

Explanation:

  1. List all files in the bucket
  2. Take the 4th field of the listing (the file name); you can pipe it through another linux command here to match your pattern
  3. Run the delete with the aws cli (a boto3 sketch of the same idea follows below)
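
For reference, here is a rough boto3 equivalent of that shell pipeline (assuming credentials are already configured; the bucket name and regex below are placeholders):

import re
import boto3

s3 = boto3.client('s3')
bucket = 'your_bucket_name'
pattern = re.compile(r'your_regex')

# List the keys page by page and delete the ones matching the pattern,
# mirroring the ls | awk | xargs pipeline above.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get('Contents', []):
        if pattern.search(obj['Key']):
            s3.delete_object(Bucket=bucket, Key=obj['Key'])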

Look for an up-to-date approach, since Boto3 changes from time to time. I used my_bucket.delete_objects():

import boto3
from boto3.session import Session

session = Session(aws_access_key_id='your_key_id',
                  aws_secret_access_key='your_secret_key')

# s3_client = session.client('s3')
s3_resource = session.resource('s3')
my_bucket = s3_resource.Bucket("your_bucket_name")

response = my_bucket.delete_objects(
    Delete={
        'Objects': [
            {
                'Key': "your_file_name_key"   # the name of your file
            }
        ]
    }
)
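
When deleting several keys in one call, it can be worth inspecting the response: per the boto3 documentation, it lists the keys that were removed under 'Deleted' and any failures under 'Errors'. A small sketch:

for deleted in response.get('Deleted', []):
    print('deleted:', deleted['Key'])
for error in response.get('Errors', []):
    print('failed:', error['Key'], error.get('Code'), error.get('Message'))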


If you're trying to delete files from your own local-host console, you can try running this python script, assuming you have already configured your access key ID and secret key on the system:

import boto3

# my custom session
aws_m = boto3.session.Session(profile_name="your-profile-name-on-local-host")
client = aws_m.client('s3')

# list bucket objects before deleting
response = client.list_objects(
    Bucket='your-bucket-name'
)
for x in response.get("Contents", []):
    print(x.get("Key", None))

# delete a bucket object
response = client.delete_object(
    Bucket='your-bucket-name',
    Key='mydocs.txt'
)

# list bucket objects after deleting
response = client.list_objects(
    Bucket='your-bucket-name'
)
for x in response.get("Contents", []):
    print(x.get("Key", None))

Welcome to 2020; here is the Python/Django answer:

from django.conf import settings
import boto3
s3 = boto3.client('s3')
s3.delete_object(Bucket=settings.AWS_STORAGE_BUCKET_NAME, Key=f"media/{item.file.name}")

It took me forever to find the answer, and it was this simple.

The code below works for me (the example is based on a Django model, but you can use the code from the delete method on its own).

import boto3
from boto3.session import Session
from django.conf import settings
# imports for the model fields used below (TaggableManager comes from django-taggit)
from django.db import models
from django.utils import timezone
from taggit.managers import TaggableManager


class Video(models.Model):
    title = models.CharField(max_length=500)
    description = models.TextField(default="")
    creation_date = models.DateTimeField(default=timezone.now)
    videofile = models.FileField(upload_to='videos/', null=True, verbose_name="")
    tags = TaggableManager()

    actions = ['delete']

    def __str__(self):
        return self.title + ": " + str(self.videofile)

    def delete(self, *args, **kwargs):
        session = Session(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        s3_resource = session.resource('s3')
        s3_bucket = s3_resource.Bucket(settings.AWS_STORAGE_BUCKET_NAME)

        file_path = "media/" + str(self.videofile)
        response = s3_bucket.delete_objects(
            Delete={
                'Objects': [
                    {
                        'Key': file_path
                    }
                ]
            })
        super(Video, self).delete(*args, **kwargs)

The simplest way to do it:

import boto3

s3 = boto3.resource("s3")
bucket_source = {
    'Bucket': "my-bucket",
    'Key': "file_path_in_bucket"
}
s3.meta.client.delete_object(**bucket_source)

Here is a code snippet you can use to delete an object from a bucket:

import boto3, botocore
from botocore.exceptions import ClientError

s3 = boto3.resource("s3", aws_access_key_id='Your-Access-Key', aws_secret_access_key='Your-Secret-Key')
s3.Object('Bucket-Name', 'file-name as key').delete()

Please try this code:

import boto3
s3 = boto3.client('s3')
s3.delete_object(Bucket="s3bucketname", Key="s3filepath")

Use the S3FileSystem.rm function from s3fs.

You can delete a single file or several files at once:

import s3fs
file_system = s3fs.S3FileSystem()

file_system.rm('s3://my-bucket/foo.txt')  # single file

files = ['s3://my-bucket/bar.txt', 's3://my-bucket/baz.txt']
file_system.rm(files)  # several files

If you want to delete all the files in an S3 bucket in the simplest way, a couple of lines of code is all it takes:

import boto3

s3 = boto3.resource('s3', aws_access_key_id='XXX', aws_secret_access_key='XXX')
bucket = s3.Bucket('your_bucket_name')
bucket.objects.delete()
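
If you only want to empty a "folder" rather than the whole bucket, the same collection API takes a prefix filter (a sketch; the prefix is a placeholder):

# Delete only the objects under a given prefix instead of the whole bucket.
bucket.objects.filter(Prefix='some/prefix/').delete()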

Update for 2021: I had a hard time with this, but it is as simple as the following.

def delete_object(self, request):
    s3 = boto3.resource('s3',
        aws_access_key_id=AWS_UPLOAD_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_UPLOAD_SECRET_KEY,
    )
    s3.Object('your-bucket', 'your-key').delete()

Make sure you add the credentials to the boto3 resource.
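
Alternatively, you can leave the keys out of the code entirely and let boto3 resolve credentials from its default chain (environment variables, ~/.aws/credentials, or an IAM role), which avoids hard-coding secrets; a minimal sketch:

import boto3

# Credentials come from boto3's default chain rather than being passed in code.
s3 = boto3.resource('s3')
s3.Object('your-bucket', 'your-key').delete()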

This is how I did it:

"""
This is a module which contains all classes related to AWS S3.
"""
"""
awshelper.py
-------

This module contains the AWS class
"""

try:
    import re
    import os
    import json
    import boto3
    import datetime
    import uuid
    import math
    from boto3.s3.transfer import TransferConfig
    import threading
    import sys

    from tqdm import tqdm
except Exception as e:
    print("Error : {} ".format(e))

DATA = {
    "AWS_ACCESS_KEY": "XXXXXXXXXXXX",
    "AWS_SECRET_KEY": "XXXXXXXXXXXXX",
    "AWS_REGION_NAME": "us-east-1",
    "BUCKET": "XXXXXXXXXXXXXXXXXXXX",
}

for key, value in DATA.items():
    os.environ[key] = str(value)


class Size:
    @staticmethod
    def convert_size(size_bytes):
        if size_bytes == 0:
            return "0B"
        size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
        i = int(math.floor(math.log(size_bytes, 1024)))
        p = math.pow(1024, i)
        s = round(size_bytes / p, 2)
        return "%s %s" % (s, size_name[i])


class ProgressPercentage(object):
    def __init__(self, filename, filesize):
        self._filename = filename
        self._size = filesize
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        def convertSize(size):
            if size == 0:
                return '0B'
            size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
            i = int(math.floor(math.log(size, 1024)))
            p = math.pow(1024, i)
            s = round(size / p, 2)
            return '%.2f %s' % (s, size_name[i])

        # To simplify, assume this is hooked up to a single filename
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            sys.stdout.write(
                "\r%s  %s / %s  (%.2f%%)        " % (
                    self._filename, convertSize(self._seen_so_far), convertSize(self._size),
                    percentage))
            sys.stdout.flush()


class ProgressPercentageUpload(object):

    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # To simplify, assume this is hooked up to a single filename
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            sys.stdout.write(
                "\r%s  %s / %s  (%.2f%%)" % (
                    self._filename, self._seen_so_far, self._size,
                    percentage))
            sys.stdout.flush()


class AWSS3(object):

    """Helper class which adds functionality on top of boto3."""

    def __init__(self, bucket, aws_access_key_id, aws_secret_access_key, region_name):
        self.BucketName = bucket
        self.client = boto3.client(
            "s3",
            aws_access_key_id=aws_access_key_id,
            aws_secret_access_key=aws_secret_access_key,
            region_name=region_name,
        )

    def get_length(self, Key):
        response = self.client.head_object(Bucket=self.BucketName, Key=Key)
        size = response["ContentLength"]
        return {"bytes": size, "size": Size.convert_size(size)}

    def put_files(self, Response=None, Key=None):
        """
        Put the file on S3.
        :return: Bool
        """
        try:
            response = self.client.put_object(
                ACL="private", Body=Response, Bucket=self.BucketName, Key=Key
            )
            return "ok"
        except Exception as e:
            print("Error : {} ".format(e))
            return "error"

    def item_exists(self, Key):
        """Given a key, check whether the item exists on AWS S3."""
        try:
            response_new = self.client.get_object(Bucket=self.BucketName, Key=str(Key))
            return True
        except Exception as e:
            return False

    def get_item(self, Key):
        """Gets the bytes data from AWS S3."""
        try:
            response_new = self.client.get_object(Bucket=self.BucketName, Key=str(Key))
            return response_new["Body"].read()
        except Exception as e:
            print("Error :{}".format(e))
            return False

    def find_one_update(self, data=None, key=None):
        """
        Check if the key is on S3; if it is, return the data from S3,
        otherwise store it on S3 and return it.
        """
        flag = self.item_exists(Key=key)

        if flag:
            data = self.get_item(Key=key)
            return data
        else:
            self.put_files(Key=key, Response=data)
            return data

    def delete_object(self, Key):
        response = self.client.delete_object(Bucket=self.BucketName, Key=Key)
        return response

    def get_all_keys(self, Prefix="", max_page_number=100):
        """
        :param Prefix: Prefix string
        :return: Keys List
        """
        try:
            paginator = self.client.get_paginator("list_objects_v2")
            pages = paginator.paginate(Bucket=self.BucketName, Prefix=Prefix)

            tmp = []

            for page_no, page in enumerate(pages):
                if page_no > max_page_number:
                    break
                print("page_no : {}".format(page_no))
                for obj in page["Contents"]:
                    tmp.append(obj["Key"])

            return tmp
        except Exception as e:
            return []

    def print_tree(self):
        keys = self.get_all_keys()
        for key in keys:
            print(key)
        return None

    def find_one_similar_key(self, searchTerm=""):
        keys = self.get_all_keys()
        return [key for key in keys if re.search(searchTerm, key)]

    def __repr__(self):
        return "AWS S3 Helper class "

    def download_file_locally(self, key, filename):
        try:
            response = self.client.download_file(
                Bucket=self.BucketName,
                Filename=filename,
                Key=key,
                Callback=ProgressPercentage(filename,
                    (self.client.head_object(Bucket=self.BucketName,
                                             Key=key))["ContentLength"]),
                Config=TransferConfig(
                    max_concurrency=10,
                    use_threads=True,
                )
            )
            return True
        except Exception as e:
            print("Error Download file : {}".format(e))
            return False

    def upload_files_from_local(self, file_name, key):
        try:
            response = self.client.upload_file(
                Filename=file_name,
                Bucket=self.BucketName,
                Key=key,
                Callback=ProgressPercentageUpload(file_name),
                Config=TransferConfig(
                    max_concurrency=10,
                    use_threads=True,
                ))
            return True
        except Exception as e:
            print("Error upload : {} ".format(e))
            return False


def batch_objects_delete_threadded(batch_size=50, max_page_size=100):
    helper_qa = AWSS3(
        aws_access_key_id=os.getenv("AWS_ACCESS_KEY"),
        aws_secret_access_key=os.getenv("AWS_SECRET_KEY"),
        region_name=os.getenv("AWS_REGION_NAME"),
        bucket=os.getenv("BUCKET"),
    )

    keys = helper_qa.get_all_keys(Prefix="database=XXXXXXXXXXX/", max_page_number=max_page_size)
    MainThreads = [threading.Thread(target=helper_qa.delete_object, args=(key,)) for key in keys]

    print("Length: keys : {} ".format(len(keys)))
    for thread in tqdm(range(0, len(MainThreads), batch_size)):
        for t in MainThreads[thread: thread + batch_size]:
            t.start()
        for t in MainThreads[thread: thread + batch_size]:
            t.join()


# ==========================================
start = datetime.datetime.now()
batch_objects_delete_threadded()
end = datetime.datetime.now()
print("Execution Time : {} ".format(end - start))
# ==========================================

cloudpathlib wraps boto3 in a pathlib interface, so tasks like this are as easy as working with local files.

First, make sure you are authenticated properly by setting up an ~/.aws/credentials file or the environment variables.

Then, use the unlink method just as you would in pathlib.

from cloudpathlib import CloudPath


# create a few files to work with
cl1 = CloudPath("s3://test-bucket/so/test_dir/f1.txt")
cl2 = CloudPath("s3://test-bucket/so/test_dir/f2.txt")
cl3 = CloudPath("s3://test-bucket/so/test_dir/f3.txt")


# write content to these files
cl1.write_text("hello file 1")
cl2.write_text("hello file 2")
cl3.write_text("hello file 3")


# show these file exist on S3
list(CloudPath("s3://test-bucket/so/test_dir/").iterdir())
#> [ S3Path('s3://test-bucket/so/test_dir/f1.txt'),
#>   S3Path('s3://test-bucket/so/test_dir/f2.txt'),
#>   S3Path('s3://test-bucket/so/test_dir/f3.txt')]


# remove a single file with `unlink`
cl1.unlink()


list(CloudPath("s3://test-bucket/so/test_dir/").iterdir())
#> [ S3Path('s3://test-bucket/so/test_dir/f2.txt'),
#>   S3Path('s3://test-bucket/so/test_dir/f3.txt')]


# remove a directory with `rmtree`
CloudPath("s3://test-bucket/so/test_dir/").rmtree()


# no more files
list(CloudPath("s3://test-bucket/so/").iterdir())
#> []

Delete files from a folder in S3:

import boto3

client = boto3.client('s3')
response = client.list_objects(
    Bucket='bucket_name',
    Prefix='folder_name/'
)
obj_list = []
for data in response.get('Contents', []):
    print('res', data.get('Key'))
    obj_list.append({'Key': data.get('Key')})
if obj_list:
    response = client.delete_objects(
        Bucket='bucket_name',
        Delete={'Objects': obj_list}
    )
print('response', response)