在 Dynamodb 中扫描和查询的区别是什么? 什么时候使用扫描/查询？

小开

You are having dynamodb table partition key/primary key as customer_country. If you use query, customer_country is the mandatory field to make query operation. All the filters can be made only items that belongs to customer_country.

If you perform table scan the filter will be performed on all partition key/primary key. First it fetched all data and apply filter after fetching from table.

eg:

here customer_country is the partition key/primary key and id is the sort_key

-----------------------------------


customer_country | name   | id


-----------------------------------
VV               | Tom    | 1


VV               | Jack   | 2


VV               | Mary   | 4


BB               | Nancy  | 5


BB               | Lom    | 6


BB               | XX     | 7


CC               | YY     | 8


CC               | ZZ     | 9


------------------------------------

If you perform query operation it applies only on customer_country value. The value should only be equal operator (=).
So only items equal to that partition key/primary key value are fetched.
If you perform scan operation it fetches all items in that table and filter out data after it takes that data.

Note: Don't perform scan operation it exceeds your RCU.

小开

In terms of performance, I think it's good practice to design your table for applications to use Query instead of Scan. Because a scan operation always scan the entire table before it filters out the desired values, which means it takes more time and space to process data operations such as read, write and delete. For more information, please refer to the official document

小开

When creating a Dynamodb table select Primary Keys and Local Secondary Indexes (LSIs) so that a Query operation returns the items you want.

Query operations only support an equal operator evaluation of the Primary Key, but conditional (=, <, <=, >, >=, Between, Begin) on the Sort Key.

Scan operations are generally slower and more expensive as the operation has to iterate through each item in your table to get the items you are requesting.

Example:

Table: CustomerId, AccountType, Country, LastPurchase


Primary Key: CustomerId + AccountType

In this example, you can use a Query operation to get:

A CustomerId with a conditional filter on AccountType

A Scan operation would need to be used to return:

All Customers with a specific AccountType
Items based on conditional filters by Country, ie All Customers from USA
Items based on conditional filters by LastPurchase, ie All Customers that made a purchase in the last month

To avoid scan operations on frequently used operations create a Local Secondary Index (LSI) or Global Secondary Index (GSI).

Example:

Table: CustomerId, AccountType, Country, LastPurchase


Primary Key: CustomerId + AccountType
GSI: AccountType + CustomerId
LSI: CustomerId + LastPurchase

In this example a Query operation can allow you to get:

A CustomerId with a conditional filter on AccountType
[GSI] A conditional filter on CustomerIds for a specific AccountType
[LSI] A CustomerId with a conditional filter on LastPurchase

小开

Query is much better than Scan - performence wise. scan, as it's name imply, will scan the whole table. But you must be well aware of the table key, sort key, indexes and and related sort indexes in order to know that you can use the Query. if you filter your query using:

key
key & key sort
index
index and it's related sort key

use Query! otherwise use scan which is more flexible about which columns you can filter.

you can NOT Query if:

more that 2 fields in the filter (e.g. key, sort and index)
sort key only (of primary key or index)
regular fields (not key, index or sort)
mixed index and sort (index1 with sort of index2)\
...

a good explaination: https://medium.com/@amos.shahar/dynamodb-query-vs-scan-sql-syntax-and-join-tables-part-1-371288a7cb8f

小开

Its similar as in the relational database.

Get query you are using a primary key in where condition, The computation complexity is log(n) as the most of key structure is binary tree.

while scan query you have to scan whole table then apply filter on every single row to find the right result. The performance is O(n). Its much slower if your table is big.

In short, Try to use query if you know primary key. only scan for only the worst case.

Also, think about the global secondary index to support a different kind of queries on different keys to gain performance objective