哪个 SQL 查询更快? 过滤连接条件还是 Where 子句?

比较这两个查询。将筛选器放在连接条件或 WHERE子句中更快吗。我一直觉得它在连接标准上更快,因为它在最短的时间内减少了结果集,但我不确定。

我将构建一些测试来查看,但是我也想得到一些关于哪些内容更容易阅读的意见。

问题1

SELECT      *
FROM        TableA a
INNER JOIN  TableXRef x
ON  a.ID = x.TableAID
INNER JOIN  TableB b
ON  x.TableBID = b.ID
WHERE       a.ID = 1            /* <-- Filter here? */

问题2

SELECT      *
FROM        TableA a
INNER JOIN  TableXRef x
ON  a.ID = x.TableAID
AND a.ID = 1            /* <-- Or filter here? */
INNER JOIN  TableB b
ON  x.TableBID = b.ID

剪辑

我运行了一些测试,结果显示它实际上非常接近,但是 WHERE子句实际上稍微快一些!=)

我完全同意在 WHERE子句上应用过滤器更有意义,我只是对性能影响感到好奇。

标准: 143016 ms
已过时间加入准则: 143256ms

测试

SET NOCOUNT ON;


DECLARE @num    INT,
@iter   INT


SELECT  @num    = 1000, -- Number of records in TableA and TableB, the cross table is populated with a CROSS JOIN from A to B
@iter   = 1000  -- Number of select iterations to perform


DECLARE @a TABLE (
id INT
)


DECLARE @b TABLE (
id INT
)


DECLARE @x TABLE (
aid INT,
bid INT
)


DECLARE @num_curr INT
SELECT  @num_curr = 1
        

WHILE (@num_curr <= @num)
BEGIN
INSERT @a (id) SELECT @num_curr
INSERT @b (id) SELECT @num_curr
    

SELECT @num_curr = @num_curr + 1
END


INSERT      @x (aid, bid)
SELECT      a.id,
b.id
FROM        @a a
CROSS JOIN  @b b


/*
TEST
*/
DECLARE @begin_where    DATETIME,
@end_where      DATETIME,
@count_where    INT,
@begin_join     DATETIME,
@end_join       DATETIME,
@count_join     INT,
@curr           INT,
@aid            INT


DECLARE @temp TABLE (
curr    INT,
aid     INT,
bid     INT
)


DELETE FROM @temp


SELECT  @curr   = 0,
@aid    = 50


SELECT  @begin_where = CURRENT_TIMESTAMP
WHILE (@curr < @iter)
BEGIN
INSERT      @temp (curr, aid, bid)
SELECT      @curr,
aid,
bid
FROM        @a a
INNER JOIN  @x x
ON  a.id = x.aid
INNER JOIN  @b b
ON  x.bid = b.id
WHERE       a.id = @aid
        

SELECT @curr = @curr + 1
END
SELECT  @end_where = CURRENT_TIMESTAMP


SELECT  @count_where = COUNT(1) FROM @temp
DELETE FROM @temp


SELECT  @curr = 0
SELECT  @begin_join = CURRENT_TIMESTAMP
WHILE (@curr < @iter)
BEGIN
INSERT      @temp (curr, aid, bid)
SELECT      @curr,
aid,
bid
FROM        @a a
INNER JOIN  @x x
ON  a.id = x.aid
AND a.id = @aid
INNER JOIN  @b b
ON  x.bid = b.id
    

SELECT @curr = @curr + 1
END
SELECT  @end_join = CURRENT_TIMESTAMP


SELECT  @count_join = COUNT(1) FROM @temp
DELETE FROM @temp


SELECT  @count_where AS count_where,
@count_join AS count_join,
DATEDIFF(millisecond, @begin_where, @end_where) AS elapsed_where,
DATEDIFF(millisecond, @begin_join, @end_join) AS elapsed_join
107239 次浏览

我猜是第一个,因为它对数据进行了更具体的过滤。但是 应该看看执行计划和任何优化一样,因为它可以根据数据的大小、服务器硬件等而有很大的不同。

Is it faster? Try it and see.

哪个更容易读?第一个对我来说看起来更“正确”,因为移动的条件实际上与 join 无关。

对于任何值一分钱的查询优化器... ... 它们都是相同的。

在性能方面,它们是相同的(并且产生相同的计划)

逻辑上,如果用 LEFT JOIN替换 INNER JOIN,应该使操作仍然有意义。

在你的情况下,这将是这样的:

SELECT  *
FROM    TableA a
LEFT JOIN
TableXRef x
ON      x.TableAID = a.ID
AND a.ID = 1
LEFT JOIN
TableB b
ON      x.TableBID = b.ID

或者这样:

SELECT  *
FROM    TableA a
LEFT JOIN
TableXRef x
ON      x.TableAID = a.ID
LEFT JOIN
TableB b
ON      b.id = x.TableBID
WHERE   a.id = 1

前一个查询不会返回除 1以外的 a.id的任何实际匹配,因此后一种语法(使用 WHERE)在逻辑上更加一致。

As far as the two methods go.

  • JOIN/ON 用于连接表
  • WHERE 用于筛选结果

虽然你可以用不同的方式使用它们,但对我来说,它总是像一种气味。

当性能有问题的时候就处理它,然后你就可以研究这样的“优化”。

这种联接的位置不太可能成为性能的决定性因素。我并不十分熟悉 tsql 的执行计划,但是它们很可能会针对类似的计划自动进行优化。

规则 # 0: 运行一些基准测试,然后看看!要想知道哪个更快,唯一的办法就是试一试。使用 SQL 分析器很容易执行这些类型的基准测试。

Also, examine the execution plan for the query written with a JOIN and with a WHERE clause to see what differences stand out.

最后,正如其他人所说的那样,任何像样的优化器都应该对这两个优化器进行相同的处理,包括内置在 SQLServer 中的优化器。

For inner joins it doesn't matter where you put your criteria. The SQL compiler will transform both into an execution plan in which the filtering occurs below the join (ie. as if the filter expressions appears is in the join condition).

外部联接是另一回事,因为过滤器的位置改变了查询的语义。

在后面他们是一样的。我们知道这一点,因为如果对每个查询执行 explain analyze,计划结果是相同的。举个例子:

# explain analyze select e.* from event e join result r on e.id = r.event_id and r.team_2_score=24;


QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Hash Join  (cost=27.09..38.22 rows=7 width=899) (actual time=0.045..0.047 rows=1 loops=1)
Hash Cond: (e.id = r.event_id)
->  Seq Scan on event e  (cost=0.00..10.80 rows=80 width=899) (actual time=0.009..0.010 rows=2 loops=1)
->  Hash  (cost=27.00..27.00 rows=7 width=8) (actual time=0.017..0.017 rows=1 loops=1)
Buckets: 1024  Batches: 1  Memory Usage: 9kB
->  Seq Scan on result r  (cost=0.00..27.00 rows=7 width=8) (actual time=0.006..0.008 rows=1 loops=1)
Filter: (team_2_score = 24)
Rows Removed by Filter: 1
Planning time: 0.182 ms
Execution time: 0.101 ms
(10 rows)


# explain analyze select e.* from event e join result r on e.id = r.event_id where r.team_2_score=24;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Hash Join  (cost=27.09..38.22 rows=7 width=899) (actual time=0.027..0.029 rows=1 loops=1)
Hash Cond: (e.id = r.event_id)
->  Seq Scan on event e  (cost=0.00..10.80 rows=80 width=899) (actual time=0.010..0.011 rows=2 loops=1)
->  Hash  (cost=27.00..27.00 rows=7 width=8) (actual time=0.010..0.010 rows=1 loops=1)
Buckets: 1024  Batches: 1  Memory Usage: 9kB
->  Seq Scan on result r  (cost=0.00..27.00 rows=7 width=8) (actual time=0.006..0.007 rows=1 loops=1)
Filter: (team_2_score = 24)
Rows Removed by Filter: 1
Planning time: 0.140 ms
Execution time: 0.058 ms
(10 rows)

它们具有相同的最小和最大成本以及相同的查询计划。另外,请注意,即使在最上面的查询中,team _ score _ 2也被应用为“ Filter”。