boto3.resource is a high-level, object-oriented wrapper around boto3.client.
It lets you bind an object to a specific resource (a bucket, for example) and then work with that resource, and the resources under it, without passing the original resource ID again.
import boto3
s3 = boto3.resource("s3")
bucket = s3.Bucket('mybucket')
# bucket is now "attached" to the S3 bucket named "mybucket"
print(bucket)
# s3.Bucket(name='mybucket')
print(dir(bucket))
# shows every attribute and action available on the bucket object
On the other hand, boto3.client is low level: there is no "entry-class object", so you must explicitly specify the exact resource for every action you perform.
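As a rough illustration of that difference, the sketch below does two reads with the low-level client; note that Bucket and Key have to be repeated in every call. The helper names object_size and read_object are hypothetical, and the bucket and key used in main() are placeholders.

```python
def object_size(client, bucket_name, key):
    """Return the object's size in bytes; the bucket is named explicitly."""
    response = client.head_object(Bucket=bucket_name, Key=key)
    return response["ContentLength"]


def read_object(client, bucket_name, key):
    """Return the object's bytes; the bucket has to be named again."""
    response = client.get_object(Bucket=bucket_name, Key=key)
    return response["Body"].read()


def main():
    # Imported here so the helpers above can be exercised without AWS access.
    import boto3

    client = boto3.client("s3")
    # "mybucket" and "some/key.txt" are placeholder names.
    print(object_size(client, "mybucket", "some/key.txt"))
    print(read_object(client, "mybucket", "some/key.txt")[:100])
```

Calling main() runs it against a real bucket, assuming credentials are configured.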
Which one to use depends on individual needs. However, boto3.resource doesn't wrap all of the boto3.client functionality, so sometimes you need to call boto3.client directly, or use the resource's meta.client attribute (for example s3.meta.client), to get the job done.
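A minimal sketch of reaching the client through a resource, assuming an S3 resource object. get_bucket_location is used here only as an example of a client call; bucket_region is a hypothetical helper name and "mybucket" is a placeholder.

```python
def bucket_region(s3_resource, bucket_name):
    """Look up a bucket's region via the client behind the resource."""
    client = s3_resource.meta.client  # the low-level client the resource wraps
    response = client.get_bucket_location(Bucket=bucket_name)
    # S3 reports None as the location of buckets in us-east-1.
    return response.get("LocationConstraint") or "us-east-1"


def main():
    # Imported here so bucket_region can be exercised without AWS access.
    import boto3

    s3 = boto3.resource("s3")
    print(bucket_region(s3, "mybucket"))  # "mybucket" is a placeholder
```

This way a single boto3.resource("s3") object covers both styles, and no second client has to be created.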
If possible, use client over resource, especially when listing S3 objects and then reading basic information about those objects.
For 10,000 objects, client calls S3 10,000/1000 = 10 times and returns a lot of information about each object in each call.
Resource, I assume, calls S3 10,000 times (or maybe the same number of times as client?), but if you then take an object and try to do something with it, that is probably another call to S3, which made resource about 20x slower than client in my test.
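The arithmetic behind those numbers, as a small sketch. The 1000-key page size is the maximum one list_objects_v2 call returns; worst_case_calls models the assumption above (one extra request per object), which is a guess and not measured behaviour. Both function names are made up for illustration.

```python
import math

PAGE_SIZE = 1000  # maximum number of keys one list_objects_v2 call returns


def list_calls(num_objects, page_size=PAGE_SIZE):
    """Number of list requests needed to enumerate num_objects keys."""
    return math.ceil(num_objects / page_size)


def worst_case_calls(num_objects, page_size=PAGE_SIZE):
    """List requests plus one extra request per object (assumed worst case)."""
    return list_calls(num_objects, page_size) + num_objects


print(list_calls(10_000))        # 10
print(worst_case_calls(10_000))  # 10010
```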
My test gave the following results.
import boto3

s3 = boto3.resource("s3")
s3bucket = s3.Bucket(myBucket)  # myBucket: the bucket name (str)
s3obj_list = s3bucket.objects.filter(Prefix=key_prefix)  # lazy collection
tmp_list = [s3obj.key for s3obj in s3obj_list]
(tmp_list = [s3obj for s3obj in s3obj_list] gives the same ~9 min result)
Listing 150,000 files this way took ~9 minutes. If s3obj_list really is pulling 1000 files per call and buffering them, then s3obj.key is probably not part of that buffer and triggers another call.
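One way to test the buffering guess is to set the page size on the collection explicitly and time the listing. This is only a sketch: list_keys_via_resource and timed are hypothetical helpers, the bucket name and prefix in main() are placeholders, and it relies on the page_size() method that boto3 resource collections provide.

```python
import time


def list_keys_via_resource(bucket, key_prefix, page_size=1000):
    """Collect keys from a resource Bucket, asking for page_size keys per request."""
    collection = bucket.objects.filter(Prefix=key_prefix).page_size(page_size)
    return [summary.key for summary in collection]


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def main():
    # Imported here so the helpers above can be exercised without AWS access.
    import boto3

    bucket = boto3.resource("s3").Bucket("mybucket")  # placeholder name
    keys, seconds = timed(list_keys_via_resource, bucket, "some/prefix/")
    print(f"{len(keys)} keys in {seconds:.1f}s")
```

If the time barely changes between page_size=1000 and a much smaller value, the listing requests are probably not what dominates.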
...some sort of loop that also sets ContinuationToken...
response = client.list_objects_v2(
    Bucket=bucket,
    Prefix=prefix,
    ContinuationToken=response["NextContinuationToken"],
)
...
Client took ~30 seconds to list the same 150,000 files.
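For reference, a complete version of such a loop might look like the sketch below. It is not the exact code used in the test above; list_keys is a hypothetical helper, and client is assumed to be a boto3.client("s3") object.

```python
def list_keys(client, bucket, prefix):
    """Return every key under prefix, following continuation tokens."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page holds no matching keys.
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            return keys
        # Ask for the page that follows the one just received.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

The first request is sent without a ContinuationToken; the token is only added once a truncated response has supplied one.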
I don't know whether resource buffers 1000 files at a time, but if it doesn't, that is a problem.
I also don't know whether resource can buffer the information attached to each object, but if it can't, that is another problem.
I also don't know whether using pagination could make client faster or easier to use.
If anyone knows the answers to the three questions above, please share them. I'd be very interested to know.
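On the pagination question: boto3 clients provide paginators that handle the continuation token for you. A minimal sketch, assuming client is a boto3.client("s3") object; list_keys_paginated is a hypothetical helper name. It should make the loop easier to write, but whether it is any faster has not been measured here.

```python
def list_keys_paginated(client, bucket, prefix):
    """Return every key under prefix using the client's built-in paginator."""
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    # Each page is one list_objects_v2 response; the paginator passes the tokens.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```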