在 Dynamodb 中扫描和查询的区别是什么? 什么时候使用扫描/查询?

DynamoDB 文档中指定的查询操作:

查询操作只搜索主键属性值,并支持关键属性值上的一个比较运算符子集来细化搜索过程。

和扫描操作:

扫描操作扫描整个表。您可以指定应用于结果的筛选器,以便在完成扫描后精炼返回给您的值。

根据性能和成本,哪一个是最好的?

71055 次浏览

You are having dynamodb table partition key/primary key as customer_country. If you use query, customer_country is the mandatory field to make query operation. All the filters can be made only items that belongs to customer_country.

If you perform table scan the filter will be performed on all partition key/primary key. First it fetched all data and apply filter after fetching from table.

eg:

here customer_country is the partition key/primary key and id is the sort_key

-----------------------------------


customer_country | name   | id


-----------------------------------
VV               | Tom    | 1


VV               | Jack   | 2


VV               | Mary   | 4


BB               | Nancy  | 5


BB               | Lom    | 6


BB               | XX     | 7


CC               | YY     | 8


CC               | ZZ     | 9


------------------------------------
  • If you perform query operation it applies only on customer_country value. The value should only be equal operator (=).

  • So only items equal to that partition key/primary key value are fetched.

  • If you perform scan operation it fetches all items in that table and filter out data after it takes that data.

Note: Don't perform scan operation it exceeds your RCU.

In terms of performance, I think it's good practice to design your table for applications to use Query instead of Scan. Because a scan operation always scan the entire table before it filters out the desired values, which means it takes more time and space to process data operations such as read, write and delete. For more information, please refer to the official document

When creating a Dynamodb table select Primary Keys and Local Secondary Indexes (LSIs) so that a Query operation returns the items you want.

Query operations only support an equal operator evaluation of the Primary Key, but conditional (=, <, <=, >, >=, Between, Begin) on the Sort Key.

Scan operations are generally slower and more expensive as the operation has to iterate through each item in your table to get the items you are requesting.

Example:

Table: CustomerId, AccountType, Country, LastPurchase


Primary Key: CustomerId + AccountType

In this example, you can use a Query operation to get:

  1. A CustomerId with a conditional filter on AccountType

A Scan operation would need to be used to return:

  1. All Customers with a specific AccountType
  2. Items based on conditional filters by Country, ie All Customers from USA
  3. Items based on conditional filters by LastPurchase, ie All Customers that made a purchase in the last month

To avoid scan operations on frequently used operations create a Local Secondary Index (LSI) or Global Secondary Index (GSI).

Example:

Table: CustomerId, AccountType, Country, LastPurchase


Primary Key: CustomerId + AccountType
GSI: AccountType + CustomerId
LSI: CustomerId + LastPurchase

In this example a Query operation can allow you to get:

  1. A CustomerId with a conditional filter on AccountType
  2. [GSI] A conditional filter on CustomerIds for a specific AccountType
  3. [LSI] A CustomerId with a conditional filter on LastPurchase

Query is much better than Scan - performence wise. scan, as it's name imply, will scan the whole table. But you must be well aware of the table key, sort key, indexes and and related sort indexes in order to know that you can use the Query. if you filter your query using:

  • key
  • key & key sort
  • index
  • index and it's related sort key

use Query! otherwise use scan which is more flexible about which columns you can filter.

you can NOT Query if:

  • more that 2 fields in the filter (e.g. key, sort and index)
  • sort key only (of primary key or index)
  • regular fields (not key, index or sort)
  • mixed index and sort (index1 with sort of index2)\
  • ...

a good explaination: https://medium.com/@amos.shahar/dynamodb-query-vs-scan-sql-syntax-and-join-tables-part-1-371288a7cb8f

Its similar as in the relational database.

Get query you are using a primary key in where condition, The computation complexity is log(n) as the most of key structure is binary tree.

while scan query you have to scan whole table then apply filter on every single row to find the right result. The performance is O(n). Its much slower if your table is big.

In short, Try to use query if you know primary key. only scan for only the worst case.

Also, think about the global secondary index to support a different kind of queries on different keys to gain performance objective