Performance differences between equal (=) and IN with one literal value

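Does the SQL engine behave differently when we use the equals sign versus the IN operator with the same single value? Does the execution time change?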

The first one uses the equality operator:

WHERE column_value = 'All'

The second one uses the IN operator with a single value:

WHERE column_value IN ('All')

Does the SQL engine rewrite IN as = when there is only one value?

Is there any difference here between MySQL and PostgreSQL?


There is no difference between those two statements, and the optimiser will transform the IN to the = when IN has just one element in it.

Though when you have a question like this, just run both statements, look at their execution plans, and compare them. Here you won't find any differences.
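If you have no MySQL or PostgreSQL instance at hand, the same "run both and compare the plans" check can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `items` table, the `idx_items_value` index and the `query_plan` helper are made up for the example, and SQLite's planner is not MySQL's or PostgreSQL's, so use EXPLAIN on your own engine for a real answer.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]

# Hypothetical table and index, created in memory for the comparison only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, column_value TEXT)")
conn.execute("CREATE INDEX idx_items_value ON items (column_value)")
conn.executemany(
    "INSERT INTO items (column_value) VALUES (?)",
    [("All",), ("Some",), ("None",), ("All",)],
)

eq_sql = "SELECT id FROM items WHERE column_value = 'All' ORDER BY id"
in_sql = "SELECT id FROM items WHERE column_value IN ('All') ORDER BY id"

plan_eq = query_plan(conn, eq_sql)
plan_in = query_plan(conn, in_sql)
rows_eq = conn.execute(eq_sql).fetchall()
rows_in = conn.execute(in_sql).fetchall()

print("=  plan:", plan_eq)
print("IN plan:", plan_in)
print("same rows:", rows_eq == rows_in)
```

Printing both plans side by side makes any difference visible at a glance; the row comparison is a sanity check that the two predicates really select the same data.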

After a big search online, I found a document on SQL to support this (I assume it applies to all DBMS):

If there is only one value inside the parenthesis, this commend [sic] is equivalent to,

WHERE "column_name" = 'value1'

Here are the execution plans of both queries in Oracle (most DBMS will process this the same way):

EXPLAIN PLAN FOR
select * from dim_employees t
where t.identity_number = '123456789';


Plan hash value: 2312174735
-----------------------------------------------------
| Id  | Operation                   | Name          |
-----------------------------------------------------
|   0 | SELECT STATEMENT            |               |
|   1 |  TABLE ACCESS BY INDEX ROWID| DIM_EMPLOYEES |
|   2 |   INDEX UNIQUE SCAN         | SYS_C0029838  |
-----------------------------------------------------

And for IN() :

EXPLAIN PLAN FOR
select * from dim_employees t
where t.identity_number in('123456789');


Plan hash value: 2312174735
-----------------------------------------------------
| Id  | Operation                   | Name          |
-----------------------------------------------------
|   0 | SELECT STATEMENT            |               |
|   1 |  TABLE ACCESS BY INDEX ROWID| DIM_EMPLOYEES |
|   2 |   INDEX UNIQUE SCAN         | SYS_C0029838  |
-----------------------------------------------------

As you can see, both are identical. This is on an indexed column. The same goes for an unindexed column (both just do a full table scan).

There is no difference when you use IN with a single value. If you check whether each of the two queries above does a table scan, index scan, or index seek, you will find that the plans are the same.

Is there any difference for the same in MySQL and PostgreSQL?

No, it makes no difference on those two engines (in fact, it is the same for most databases, including SQL Server and Oracle). Both engines will convert IN to =.

For an IN clause with a single value, there is no difference. Below is a demo using an Emps table I have:

select * from emps where empid in (1)
select * from emps where empid=1

Predicate for the first query in the execution plan:

[PerformanceV3].[dbo].[Emps].[empID]=CONVERT_IMPLICIT(int,[@1],0)

Predicate for the second query in the execution plan:

[PerformanceV3].[dbo].[Emps].[empID]=CONVERT_IMPLICIT(int,[@1],0)

If you have multiple values in the IN clause, it's better to convert them to joins.

There is no big difference really, but if your column_value is indexed, the IN operator may not use the index.

I encountered this problem once, so be careful.

Teach a man to fish, etc. Here's how to see for yourself what variations on your queries will do:

mysql> EXPLAIN SELECT * FROM sentence WHERE sentence_lang_id = "AMH"\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sentence
type: ref
possible_keys: sentence_lang_id
key: sentence_lang_id
key_len: 153
ref: const
rows: 442
Extra: Using where

And let's try it the other way:

mysql> EXPLAIN SELECT * FROM sentence WHERE sentence_lang_id in ("AMH")\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sentence
type: ref
possible_keys: sentence_lang_id
key: sentence_lang_id
key_len: 153
ref: const
rows: 442
Extra: Using where

You can read here about how to interpret the results of a MySQL EXPLAIN request. For now, note that we got identical output for both queries: exactly the same "execution plan" is generated. The type row tells us that the query uses a non-unique index (a foreign key, in this case), and the ref row tells us that the query is executed by comparing a constant value against this index.

Just to add a different perspective, one of the main points of an RDBMS is that it will rewrite your query for you and pick the best execution plan for that query and all equivalent ones. This means that as long as two queries are logically identical, they should always generate the same execution plan on a given RDBMS.

That being said, many queries are equivalent (same result set) only because of constraints the database itself is unaware of, so be careful about those cases (e.g. for a flag field with the numbers 1-6, the database doesn't know that <3 is the same as IN (1,2)). But at the end of the day, if you're just thinking about the legibility of AND and OR conditions, it won't make a difference for performance which way you write them.
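The flag example can be made concrete with a small sketch using Python's built-in sqlite3 module. The `tickets` table and the `flags` helper are hypothetical names for this illustration. The point is that `flag < 3` and `flag IN (1, 2)` agree only while the data happens to stay within 1-6, which nothing in this schema enforces, so the database cannot treat the two predicates as interchangeable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: flag is meant to hold 1-6, but no constraint says so.
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, flag INTEGER)")
conn.executemany(
    "INSERT INTO tickets (flag) VALUES (?)", [(n,) for n in range(1, 7)]
)

lt_sql = "SELECT flag FROM tickets WHERE flag < 3 ORDER BY flag"
in_sql = "SELECT flag FROM tickets WHERE flag IN (1, 2) ORDER BY flag"

def flags(sql):
    """Run sql and return the selected flag values as a plain list."""
    return [row[0] for row in conn.execute(sql)]

# With only the values 1-6 present, both predicates select the same rows.
agree_before = flags(lt_sql) == flags(in_sql)

# A value outside the intended range is accepted by the schema.
conn.execute("INSERT INTO tickets (flag) VALUES (0)")
agree_after = flags(lt_sql) == flags(in_sql)

print(agree_before, agree_after)  # prints: True False
```

If the 1-6 range were declared as a CHECK constraint, an optimiser would at least have the information available, though whether a given engine actually uses it for rewriting is a separate question.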