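Check if a key exists in a bucket in S3 using boto3

I would like to know if a key exists in boto3. I can loop over the bucket contents and check whether each key matches.

But that seems longer than necessary and overkill. The official Boto3 docs explicitly state how to do this.

Maybe I am missing the obvious. Can anybody point me to how I can achieve this?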


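Boto 2's boto.s3.key.Key object used to have an exists method that checked whether the key existed on S3 by doing a HEAD request and looking at the result, but it seems that method no longer exists. You have to do it yourself: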

import boto3
import botocore

s3 = boto3.resource('s3')

try:
    s3.Object('my-bucket', 'dootdoot.jpg').load()
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        # The object does not exist.
        ...
    else:
        # Something else has gone wrong.
        raise
else:
    # The object does exist.
    ...

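load() does a HEAD request for a single key, which is fast even if the object in question is large or the bucket holds many objects.

Of course, you might be checking whether the object exists because you plan to use it. If that is the case, you can forget about load() and call get() or download_file() directly, then handle the error case there.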
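
If the goal is to read the object anyway, the existence check and the read can be a single call. The following is a minimal sketch, assuming a boto3 S3 client is passed in; `make_client` and `read_if_exists` are hypothetical helper names, not part of boto3:

```python
def make_client():
    """Create a default S3 client (boto3 is imported lazily, only when called)."""
    import boto3
    return boto3.client("s3")


def read_if_exists(client, bucket, key):
    """Return the object's bytes, or None if the key does not exist."""
    try:
        response = client.get_object(Bucket=bucket, Key=key)
    except client.exceptions.NoSuchKey:
        # Only the "missing key" case is swallowed; other errors propagate.
        return None
    return response["Body"].read()
```

Usage would look like `data = read_if_exists(make_client(), 'my-bucket', 'dootdoot.jpg')`. Note that without permission to list the bucket, S3 may answer with an access-denied error rather than NoSuchKey for a missing key, and this sketch lets that error propagate.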

I'm not a big fan of using exceptions for control flow. This is an alternative approach that works in boto3:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
key = 'dootdoot.jpg'
# The prefix filter may also return longer keys, so compare each key exactly.
objs = list(bucket.objects.filter(Prefix=key))
if any(w.key == key for w in objs):
    print("Exists!")
else:
    print("Doesn't exist")

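Take a look at

bucket.get_key(
    key_name,
    headers=None,
    version_id=None,
    response_headers=None,
    validate=True
)

Check to see if a particular key exists within the bucket. This method uses a HEAD request to check for the existence of the key. Returns: an instance of a Key object or None

(from the Boto S3 docs; note that get_key belongs to the older Boto 2 API, not boto3)

You can just call bucket.get_key(keyname) and check whether the returned object is None.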

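In Boto3, if you are checking for either a folder (prefix) or a file using list_objects, you can use the presence of 'Contents' in the response dict as a check for whether the object exists. It is another way to avoid the try/except catches, as @EvilPuppetMaster suggests.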

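import boto3

def prefix_exists(bucket, prefix):
    client = boto3.client('s3')
    results = client.list_objects(Bucket=bucket, Prefix=prefix)
    return 'Contents' in results  # True if any key starts with the prefix

prefix_exists('my-bucket', 'dootdoot.jpg')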
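
One caveat with a bare prefix listing: Prefix='dootdoot.jpg' also matches keys such as 'dootdoot.jpg.bak', so the presence of 'Contents' does not prove that the exact key exists. A minimal sketch of an exact-match variant, assuming a boto3 S3 client is passed in and that the first page of results is enough; `exact_key_exists` is a hypothetical name:

```python
def exact_key_exists(client, bucket, key):
    """True only if `key` itself appears among the keys listed under that prefix."""
    response = client.list_objects_v2(Bucket=bucket, Prefix=key)
    # 'Contents' is absent when nothing matches, hence the default of [].
    return any(obj["Key"] == key for obj in response.get("Contents", []))
```

This keeps the no-exceptions style of the answer while avoiding false positives from longer keys that share the prefix.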

The easiest way I found (and probably the most efficient) is this:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')
try:
    s3.head_object(Bucket='bucket_name', Key='file_path')
except ClientError:
    # Not found
    pass
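
Catching every ClientError treats a permissions problem the same as a missing key. The following is a minimal sketch that separates the two cases, assuming a boto3 S3 client is passed in; `object_exists` is a hypothetical name, and the exception class is taken from `client.exceptions` so the helper needs no import of its own:

```python
def object_exists(client, bucket, key):
    """Return True or False for existence; re-raise anything that is not a 404."""
    try:
        client.head_object(Bucket=bucket, Key=key)
    except client.exceptions.ClientError as err:
        if err.response["Error"]["Code"] == "404":
            return False
        # For example 403: existence is unknown, so do not guess.
        raise
    return True
```

With a real client this would be called as `object_exists(boto3.client('s3'), 'bucket_name', 'file_path')`.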

Not only with the client, but with a bucket too:

import boto3
import botocore

bucket = boto3.resource('s3', region_name='eu-west-1').Bucket('my-bucket')

try:
    bucket.Object('my-file').get()
except botocore.exceptions.ClientError as ex:
    if ex.response['Error']['Code'] == 'NoSuchKey':
        print('NoSuchKey')

If you have fewer than 1000 objects in a directory or bucket, you can get a set of them and then check whether a given key is in that set:

files_in_dir = {d['Key'].split('/')[-1] for d in s3_client.list_objects_v2(
    Bucket='mybucket',
    Prefix='my/dir').get('Contents') or []}

Such code works even if my/dir does not exist.

http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
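
For prefixes that may hold more than 1000 keys, a single list_objects_v2 call returns only the first page. A minimal sketch using the client's paginator, assuming a boto3 S3 client is passed in; `iter_keys` and `key_in_prefix` are hypothetical names:

```python
def iter_keys(client, bucket, prefix=""):
    """Yield every key under `prefix`, following pagination across pages."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matches has no 'Contents' entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def key_in_prefix(client, bucket, prefix, key):
    """True if `key` appears anywhere in the (possibly multi-page) listing."""
    return any(k == key for k in iter_keys(client, bucket, prefix))
```

Because `iter_keys` is a generator, `key_in_prefix` stops requesting pages as soon as the key is found.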

import boto3
from botocore.client import Config

S3_REGION = "eu-central-1"
bucket = "mybucket1"
name = "objectname"


def key_exists(bucket, name):
    client = boto3.client('s3', region_name=S3_REGION,
                          config=Config(signature_version='s3v4'))
    # Named `response` to avoid shadowing the built-in `list`.
    response = client.list_objects_v2(Bucket=bucket, Prefix=name)
    for obj in response.get('Contents', []):
        if obj['Key'] == name:
            return True
    return False

There is a simple way to check whether a file exists in an S3 bucket. We do not need to use exceptions for this.

import boto3

# aws_access_key_id and aws_secret_access_key are assumed to be defined elsewhere.
session = boto3.Session(aws_access_key_id, aws_secret_access_key)
s3 = session.client('s3')

object_name = 'filename'
bucket = 'bucketname'
obj_status = s3.list_objects(Bucket=bucket, Prefix=object_name)
if obj_status.get('Contents'):
    print("File exists")
else:
    print("File does not exist")

import boto3
from botocore.exceptions import ClientError

client = boto3.client('s3')
s3_key = 'Your file without bucket name e.g. abc/bcd.txt'
bucket = 'your bucket name'
try:
    # head_object raises ClientError when the key is missing, so the
    # "does not exist" branch has to be an except clause.
    client.head_object(Bucket=bucket, Key=s3_key)
    print("File exists - s3://%s/%s" % (bucket, s3_key))
except ClientError:
    print("File does not exist - s3://%s/%s" % (bucket, s3_key))

FWIW, here are the very simple functions that I am using

import os

import boto3


def get_resource(config: dict = None):
    """Loads the s3 resource.

    Expects AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to be in the environment
    or in a config dictionary.
    Looks in the environment first."""
    config = config or {}
    s3 = boto3.resource(
        's3',
        aws_access_key_id=os.environ.get(
            "AWS_ACCESS_KEY_ID", config.get("AWS_ACCESS_KEY_ID")),
        aws_secret_access_key=os.environ.get(
            "AWS_SECRET_ACCESS_KEY", config.get("AWS_SECRET_ACCESS_KEY")))
    return s3


def get_bucket(s3, s3_uri: str):
    """Get the bucket from the resource.
    A thin wrapper, use with caution.

    Example usage:

    >> bucket = get_bucket(get_resource(), s3_uri_prod)"""
    return s3.Bucket(s3_uri)


def isfile_s3(bucket, key: str) -> bool:
    """Returns T/F whether the file exists."""
    objs = list(bucket.objects.filter(Prefix=key))
    return len(objs) == 1 and objs[0].key == key


def isdir_s3(bucket, key: str) -> bool:
    """Returns T/F whether the directory exists."""
    objs = list(bucket.objects.filter(Prefix=key))
    return len(objs) > 1
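
Two edge cases are worth keeping in mind with isdir_s3 as written: a "directory" holding exactly one object yields False, and a prefix such as 'photos' also matches keys under 'photos-old/'. A minimal sketch of a stricter variant, assuming the same boto3 Bucket resource is passed in; `has_objects_under` is a hypothetical name:

```python
def has_objects_under(bucket, prefix):
    """True if at least one object key starts with `prefix/`."""
    # Normalise so that "photos" does not match "photos-old/..." by accident.
    dir_prefix = prefix.rstrip("/") + "/"
    # limit(1) stops after the first match instead of listing everything.
    return any(True for _ in bucket.objects.filter(Prefix=dir_prefix).limit(1))
```

It would be called the same way as the functions above, for example `has_objects_under(get_bucket(get_resource(), 'my-bucket'), 'photos')`, where the bucket name is a placeholder.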

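With boto3, ObjectSummary can be used to check whether an object exists.

Contains the summary of an object stored in an Amazon S3 bucket. This object doesn't contain the object's full metadata or any of its contents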

import boto3
from botocore.exceptions import ClientError


def path_exists(path, bucket_name):
    """Check to see if an object exists on S3"""
    s3 = boto3.resource('s3')
    try:
        s3.ObjectSummary(bucket_name=bucket_name, key=path).load()
    except ClientError as e:
        if e.response['Error']['Code'] == "404":
            return False
        else:
            raise
    return True


# 'my-bucket' is a placeholder bucket name.
path_exists('path/to/file.html', 'my-bucket')

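ObjectSummary.load

Calls s3.Client.head_object to update the attributes of the ObjectSummary resource.

This shows that you can use ObjectSummary instead of Object if you do not plan to use get(). The load() function does not retrieve the object; it only obtains the summary.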

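If you are looking for a key that is equivalent to a directory, then you might want this approach

import boto3

session = boto3.session.Session()
resource = session.resource("s3")
bucket = resource.Bucket('mybucket')

key = 'dir-like-or-file-like-key'
objects = [o for o in bucket.objects.filter(Prefix=key).limit(1)]
has_key = len(objects) > 0

This works for a parent key, for a key that equates to a file, or for a key that does not exist. I tried the favoured approach above and it failed on parent keys.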

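I noticed that just to catch the exception with botocore.exceptions.ClientError we need to install botocore, and botocore takes up 36M of disk space. This matters particularly if we use AWS Lambda functions. If we catch the plain Exception instead, we can skip the extra library!

  • I am validating that the file extension is '.csv'
  • This will not throw an exception if the bucket does not exist!
  • This will not throw an exception if the bucket exists but the object does not!
  • This throws an exception if the bucket is empty!
  • This throws an exception if the bucket has no permissions!

The code looks like this. Please share your thoughts: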

import boto3
import traceback


def download4mS3(s3bucket, s3Path, localPath):
    s3 = boto3.resource('s3')

    print('Looking for the csv data file ending with .csv in bucket: ' + s3bucket + ' path: ' + s3Path)
    if s3Path.endswith('.csv') and s3Path != '':
        try:
            s3.Bucket(s3bucket).download_file(s3Path, localPath)
        except Exception as e:
            print(e)
            print(traceback.format_exc())
            # Not every exception carries a response dict, so look it up defensively.
            error_code = getattr(e, 'response', {}).get('Error', {}).get('Code')
            if error_code == "404":
                print("Downloading the file from: [", s3Path, "] failed")
                exit(12)
            else:
                raise
        print("Downloading the file from: [", s3Path, "] succeeded")
    else:
        print("csv file not found in : [", s3Path, "]")
        exit(12)

Try this simple approach

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket_name')  # just Bucket name
file_name = 'A/B/filename.txt'       # full file path
obj = list(bucket.objects.filter(Prefix=file_name))
if len(obj) > 0:
    print("Exists")
else:
    print("Not Exists")

Here is a solution that works for me. One caveat is that I know the exact format of the key ahead of time, so I am only listing the single file

import boto3


# The s3 base class to interact with S3
class S3(object):
    def __init__(self):
        self.s3_client = boto3.client('s3')

    def check_if_object_exists(self, s3_bucket, s3_key):
        response = self.s3_client.list_objects(
            Bucket=s3_bucket,
            Prefix=s3_key
        )
        return 'ETag' in str(response)


if __name__ == '__main__':
    # Placeholder values; replace with your own bucket and key.
    bucket = 'my-bucket'
    key = 'path/to/object'
    s3 = S3()
    if s3.check_if_object_exists(bucket, key):
        print("Found S3 object.")
    else:
        print("No object found.")

You can use S3Fs, which is essentially a wrapper around boto3 that exposes typical file-system style operations:

import s3fs
s3 = s3fs.S3FileSystem()
s3.exists('myfile.txt')

You can use Boto3 for this.

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
objs = list(bucket.objects.filter(Prefix=key))
if len(objs) > 0:
    print("key exists!!")
else:
    print("key doesn't exist!")

Here, key is the path whose existence you want to check.

It's really simple with the get() method

import botocore
from boto3.session import Session

session = Session(aws_access_key_id='AWS_ACCESS_KEY',
                  aws_secret_access_key='AWS_SECRET_ACCESS_KEY')
s3 = session.resource('s3')
bucket_s3 = s3.Bucket('bucket_name')


def not_exist(file_key):
    try:
        file_details = bucket_s3.Object(file_key).get()
        # print(file_details)  # This line prints the file details
        return False
    except botocore.exceptions.ClientError as e:
        # Alternatively, check e.response['ResponseMetadata']['HTTPStatusCode'] == 404
        if e.response['Error']['Code'] == "NoSuchKey":
            return True
        # For any other error it is hard to tell whether the key exists, so
        # change this to True, False, or raise depending on your requirement.
        return False


print(not_exist('hello_world.txt'))

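Following this thread, can someone conclude which is the most efficient way to check whether an object exists in S3?

I think head_object might win, as it only checks the metadata, which is lighter than fetching the actual object itself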

Assuming you just want to check whether a key exists (instead of quietly overwriting it), do this check first. It will also check for errors:

import boto3


def key_exists(mykey, mybucket):
    s3_client = boto3.client('s3')
    try:
        response = s3_client.list_objects_v2(Bucket=mybucket, Prefix=mykey)
        for obj in response['Contents']:
            if mykey == obj['Key']:
                return 'exists'
        return False  # no keys match
    except KeyError:
        return False  # no keys found
    except Exception as e:
        # Handle or log other exceptions such as bucket doesn't exist
        return e


key_check = key_exists('someprefix/myfile-abc123', 'my-bucket-name')
if key_check:
    if key_check == 'exists':
        print("key exists!")
    else:
        # key_check holds the exception returned by key_exists
        print(f"S3 ERROR: {key_check}")
else:
    print("safe to put new bucket object")
    # try:
    #     resp = s3_client.put_object(Body="Your string or file-like object",
    #                                 Bucket=mybucket,Key=mykey)
    # ...check resp success and ClientError exception for errors...

This can check both a prefix and a key, and it fetches at most 1 key.

import boto3


def prefix_exits(bucket, prefix):
    s3_client = boto3.client('s3')
    res = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return 'Contents' in res

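Using objects.filter and checking the resulting list is by far the fastest way to check whether a file exists in an S3 bucket.

Use this concise one-liner; it is less intrusive when you have to drop it into an existing project without modifying much of the code.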

s3_file_exists = lambda filename: bool(list(bucket.objects.filter(Prefix=filename)))

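The above function assumes the bucket variable has already been declared.

You can extend the lambda to support additional parameters, for example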

s3_file_exists = lambda filename, bucket: bool(list(bucket.objects.filter(Prefix=filename)))

You can use awswrangler to do it in one line.

awswrangler.s3.does_object_exist(path_of_object_to_check)

https://aws-data-wrangler.readthedocs.io/en/stable/stubs/awswrangler.s3.does_object_exist.html

The does_object_exist method uses the head_object method of the S3 client and checks whether a ClientError is raised. If the error code is 404, it returns False.