最佳答案
我使用一个包含500000个条目的表在 HSQLDB
服务器上执行一些测试。该表没有索引。有5000个不同的业务密钥。我需要他们的名单。
我自然而然地从 DISTINCT
查询开始:
SELECT DISTINCT business_key
FROM memory
WHERE concept <> 'case' OR
attrib <> 'status' OR
value <> 'closed';
大概需要90秒! ! !
然后我尝试使用 GROUP BY
:
SELECT business_key
FROM memory
WHERE concept <> 'case' OR
attrib <> 'status' OR
value <> 'closed';
GROUP BY business_key
只需要一秒钟! ! !
试图找出差异,我运行 EXLAIN PLAN FOR
,但它似乎提供了相同的信息,为两个查询。
EXLAIN PLAN FOR DISTINCT ...
isAggregated=[false]
columns=[
COLUMN: PUBLIC.MEMORY.BUSINESS_KEY
]
[range variable 1
join type=INNER
table=MEMORY
alias=M
access=FULL SCAN
condition = [ index=SYS_IDX_SYS_PK_10057_10058
other condition=[
OR arg_left=[
OR arg_left=[
NOT_EQUAL arg_left=[
COLUMN: PUBLIC.MEMORY.CONCEPT] arg_right=[
VALUE = case, TYPE = CHARACTER]] arg_right=[
NOT_EQUAL arg_left=[
COLUMN: PUBLIC.MEMORY.ATTRIB] arg_right=[
VALUE = status, TYPE = CHARACTER]]] arg_right=[
NOT_EQUAL arg_left=[
COLUMN: PUBLIC.MEMORY.VALUE] arg_right=[
VALUE = closed, TYPE = CHARACTER]]]
]
]]
PARAMETERS=[]
SUBQUERIES[]
Object References
PUBLIC.MEMORY
PUBLIC.MEMORY.CONCEPT
PUBLIC.MEMORY.ATTRIB
PUBLIC.MEMORY.VALUE
PUBLIC.MEMORY.BUSINESS_KEY
Read Locks
PUBLIC.MEMORY
WriteLocks
EXLAIN PLAN FOR SELECT ... GROUP BY ...
isDistinctSelect=[false]
isGrouped=[true]
isAggregated=[false]
columns=[
COLUMN: PUBLIC.MEMORY.BUSINESS_KEY
]
[range variable 1
join type=INNER
table=MEMORY
alias=M
access=FULL SCAN
condition = [ index=SYS_IDX_SYS_PK_10057_10058
other condition=[
OR arg_left=[
OR arg_left=[
NOT_EQUAL arg_left=[
COLUMN: PUBLIC.MEMORY.CONCEPT] arg_right=[
VALUE = case, TYPE = CHARACTER]] arg_right=[
NOT_EQUAL arg_left=[
COLUMN: PUBLIC.MEMORY.ATTRIB] arg_right=[
VALUE = status, TYPE = CHARACTER]]] arg_right=[
NOT_EQUAL arg_left=[
COLUMN: PUBLIC.MEMORY.VALUE] arg_right=[
VALUE = closed, TYPE = CHARACTER]]]
]
]]
groupColumns=[
COLUMN: PUBLIC.MEMORY.BUSINESS_KEY]
PARAMETERS=[]
SUBQUERIES[]
Object References
PUBLIC.MEMORY
PUBLIC.MEMORY.CONCEPT
PUBLIC.MEMORY.ATTRIB
PUBLIC.MEMORY.VALUE
PUBLIC.MEMORY.BUSINESS_KEY
Read Locks
PUBLIC.MEMORY
WriteLocks
我做了额外的测试。HSQLDB
中有500000条记录,其中包含所有不同的业务密钥,因此 DISTINCT
的性能现在更好了——3秒,而 GROUP BY
只用了大约9秒。
在 MySQL
中,两个查询的结果相同:
MySQL: 500000行-5000个不同的业务键:
两个查询都是0.5秒
MySQL: 500000行-所有不同的业务键:
SELECT DISTINCT ...
-11秒
SELECT ... GROUP BY business_key
-13秒
所以这个问题只和 HSQLDB
有关。
如果有人能解释一下为什么会有如此巨大的差异,我将非常感激。