How do I delete or purge old files on S3?

Is there an existing solution for deleting files older than x days?


Amazon recently introduced Object Expiration:

Amazon S3 Announces Object Expiration

Amazon S3 has launched a new Object Expiration feature that allows you to schedule the removal of objects after a predefined time period. Using Object Expiration to schedule periodic removal of objects eliminates the need for you to identify objects for deletion and submit delete requests to Amazon S3.

You define Object Expiration rules for a set of objects in your bucket. Each rule lets you specify a prefix and an expiration period in days. The prefix (e.g. logs/) identifies the objects subject to the rule, and the expiration period is the number of days from the creation date (i.e. the object's age) after which matching objects are queued for removal. You will not be charged for storage once an object has passed its expiration date.
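The prefix-plus-days rule described above can be expressed programmatically as well. Below is a minimal sketch of such a rule in the shape boto3's `put_bucket_lifecycle_configuration` expects; the rule ID, bucket name, and the `logs/` prefix are illustrative, not taken from the question:

```python
# A lifecycle configuration with a single expiration rule: objects under
# "logs/" are queued for deletion 30 days after creation.
lifecycle_config = {
    "Rules": [
        {
            "ID": "expire-old-logs",          # illustrative rule name
            "Filter": {"Prefix": "logs/"},    # which objects the rule covers
            "Status": "Enabled",
            "Expiration": {"Days": 30},       # age, in days, before removal
        }
    ]
}

# Applying it would look like this (requires valid AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle_config)
```

Once the rule is in place, S3 handles the deletions itself; no client-side sweep is needed.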

Here is some information on how to do it:

http://docs.amazonwebservices.com/amazons3/latest/dev/objectexpiration.html

Hope this helps.

You can use AWS S3 Lifecycle Rules to expire and delete files. All you have to do is select the bucket, click the "Add lifecycle rule" button, and configure it; AWS takes care of the rest for you.

You can follow Joe's blog post below for step-by-step instructions; it is actually quite simple:

https://www.joe0.com/2017/05/24/amazon-s3-how-to-delete-files-older-than-x-days/

Hope it helps!

You can use the following PowerShell script to delete objects older than x days.

[CmdletBinding()]
Param(
    [Parameter(Mandatory=$True)]
    [string]$BUCKET_NAME,            # Name of the bucket

    [Parameter(Mandatory=$True)]
    [string]$OBJ_PATH,               # Key prefix of the S3 objects (directory path)

    [Parameter(Mandatory=$True)]
    [string]$EXPIRY_DAYS             # Number of days after which objects expire
)

$CURRENT_DATE = Get-Date
$OBJECTS = Get-S3Object -BucketName $BUCKET_NAME -KeyPrefix $OBJ_PATH
Foreach ($OBJ in $OBJECTS) {
    If ($OBJ.Key -ne $OBJ_PATH) {
        # Delete only objects whose age exceeds the expiry period
        # (note: -gt, not -le, or the script would delete the *newest* files)
        If (($CURRENT_DATE - $OBJ.LastModified).Days -gt $EXPIRY_DAYS) {
            Write-Host "Deleting object:" $OBJ.Key
            Remove-S3Object -BucketName $BUCKET_NAME -Key $OBJ.Key -Force
        }
    }
}

Here is how you can implement it with a CloudFormation template:

JenkinsArtifactsBucket:
  Type: "AWS::S3::Bucket"
  Properties:
    BucketName: !Sub "jenkins-artifacts"
    LifecycleConfiguration:
      Rules:
        - Id: "remove-old-artifacts"
          ExpirationInDays: 3
          NoncurrentVersionExpirationInDays: 3
          Status: Enabled

This creates a lifecycle rule, as explained by @Ravi Bhatt.

Read more about it here: https://docs.aws.amazon.com/awscloudformation/latest/userguide/aws-properties-s3-bucket-lifecycleconfig-rule.html

How object lifecycle management works: https://docs.aws.amazon.com/amazons3/latest/dev/object-lifecycle-mgmt.html
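The expiry decision those docs describe boils down to two checks: a rule applies to an object when the key starts with the rule's prefix, and the object is queued for deletion once its age in days exceeds the rule's expiration period. A plain-Python sketch of that logic (the function name and sample keys are illustrative, not part of any S3 API):

```python
from datetime import datetime, timezone, timedelta

def matches_expiration_rule(key, last_modified, prefix, expiration_days, now):
    """Illustrative check: is this object covered by the rule and past its age limit?"""
    if not key.startswith(prefix):
        return False                       # rule does not apply to this key
    age_days = (now - last_modified).days  # object age in whole days
    return age_days > expiration_days

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
old_log = now - timedelta(days=45)
fresh_log = now - timedelta(days=2)

print(matches_expiration_rule("logs/app.log", old_log, "logs/", 30, now))    # True: old enough
print(matches_expiration_rule("logs/app.log", fresh_log, "logs/", 30, now))  # False: too new
print(matches_expiration_rule("data/app.log", old_log, "logs/", 30, now))    # False: wrong prefix
```

S3 evaluates this server-side, so objects are removed without any request from you.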

Here is a Python script for deleting files older than N days:

from boto3 import client
from botocore.exceptions import ClientError
from datetime import datetime, timezone
import argparse


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--access_key_id', required=True)
    parser.add_argument('--secret_access_key', required=True)
    parser.add_argument('--delete_after_retention_days', required=False, default=15)
    parser.add_argument('--bucket', required=True)
    parser.add_argument('--prefix', required=False, default="")
    parser.add_argument('--endpoint', required=True)

    args = parser.parse_args()

    access_key_id = args.access_key_id
    secret_access_key = args.secret_access_key
    delete_after_retention_days = int(args.delete_after_retention_days)
    bucket = args.bucket
    prefix = args.prefix
    endpoint = args.endpoint

    # get current date
    today = datetime.now(timezone.utc)

    try:
        # create a connection to the S3-compatible endpoint (e.g. Wasabi);
        # note the keyword arguments are aws_access_key_id / aws_secret_access_key
        s3_client = client(
            's3',
            endpoint_url=endpoint,
            aws_access_key_id=access_key_id,
            aws_secret_access_key=secret_access_key)
    except Exception as e:
        raise e

    try:
        # list all the buckets under the account to validate the credentials
        list_buckets = s3_client.list_buckets()
    except ClientError:
        # invalid access keys
        raise Exception("Invalid Access or Secret key")

    # create a paginator over all object versions
    object_response_paginator = s3_client.get_paginator('list_object_versions')
    if len(prefix) > 0:
        operation_parameters = {'Bucket': bucket, 'Prefix': prefix}
    else:
        operation_parameters = {'Bucket': bucket}

    # instantiate temp variables
    delete_list = []
    count_current = 0
    count_non_current = 0

    print("$ Paginating bucket " + bucket)
    for object_response_itr in object_response_paginator.paginate(**operation_parameters):
        for version in object_response_itr.get('Versions', []):
            if version["IsLatest"]:
                count_current += 1
            else:
                count_non_current += 1
            # queue any version older than the retention period for deletion
            if (today - version['LastModified']).days > delete_after_retention_days:
                delete_list.append({'Key': version['Key'], 'VersionId': version['VersionId']})

    # print object counts
    print("-" * 20)
    print("$ Before deleting objects")
    print("$ current objects: " + str(count_current))
    print("$ non-current objects: " + str(count_non_current))
    print("-" * 20)

    # delete objects 1000 at a time (the API limit per delete_objects request)
    print("$ Deleting objects from bucket " + bucket)
    for i in range(0, len(delete_list), 1000):
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={
                'Objects': delete_list[i:i + 1000],
                'Quiet': True
            }
        )
        print(response)

    # reset counts
    count_current = 0
    count_non_current = 0

    # paginate and recount
    print("$ Paginating bucket " + bucket)
    for object_response_itr in object_response_paginator.paginate(Bucket=bucket):
        for version in object_response_itr.get('Versions', []):
            if version["IsLatest"]:
                count_current += 1
            else:
                count_non_current += 1

    # print object counts
    print("-" * 20)
    print("$ After deleting objects")
    print("$ current objects: " + str(count_current))
    print("$ non-current objects: " + str(count_non_current))
    print("-" * 20)
    print("$ task complete")

Here is how I run it:

python s3_cleanup.py --access_key_id="access-key" --secret_access_key="secret-key-here" --endpoint="https://s3.us-west-1.wasabisys.com" --bucket="ondemand-downloads" --prefix="" --delete_after_retention_days=5

Use the prefix parameter if you only want to delete files from a specific folder.
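To illustrate how the prefix narrows the candidate set, here is a small sketch; the key names are made up, and in the script above the filtering actually happens server-side via the Prefix parameter passed to the list_object_versions paginator:

```python
# Sample object keys in a bucket (illustrative)
keys = [
    "metadata.json",
    "videos/2020/clip-a.mp4",
    "videos/2020/clip-b.mp4",
    "images/logo.png",
]

# With prefix "videos/", only keys under that "folder" are candidates
# for the age check and deletion.
prefix = "videos/"
candidates = [k for k in keys if k.startswith(prefix)]
print(candidates)  # ['videos/2020/clip-a.mp4', 'videos/2020/clip-b.mp4']
```

An empty prefix (the script's default) matches every key, i.e. the whole bucket.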