MySQL "IN" operator performance on a (large?) number of values

I have been experimenting with Redis and MongoDB lately, and it seems there are often cases where you end up storing an array of ids in either MongoDB or Redis. I'll stick with Redis for this question, since I am asking about the MySQL IN operator.

I was wondering how performant it is to list a large number (300-3000) of ids inside the IN operator, which would look something like this:

SELECT id, name, price
FROM products
WHERE id IN (1, 2, 3, 4, ...... 3000)

Imagine something as simple as products and categories tables, which you would normally JOIN to get the products of a certain category. In the example above you can see that, for a given category in Redis (category:4:product_ids), I fetch all the product ids for the category with id 4 and place them in the SELECT query above, inside the IN operator.
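For illustration, the flow described above might look roughly like this in PHP. This is only a sketch of my setup, assuming the phpredis extension, the ids stored in a Redis set under the key from above, and placeholder connection details and database name:

<?php
// Sketch: fetch product ids from Redis, then query MySQL with an IN list.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Ids are assumed to live in a Redis set; they come back as strings.
$ids = $redis->sMembers('category:4:product_ids');
$ids = array_map('intval', $ids);              // e.g. [1, 2, ..., 3000]
// (guard for an empty id list omitted in this sketch)

$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// One bound placeholder per id, so the values are not interpolated into the SQL.
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare(
    "SELECT id, name, price FROM products WHERE id IN ($placeholders)"
);
$stmt->execute($ids);
$products = $stmt->fetchAll(PDO::FETCH_ASSOC);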

How performant is that?

Is this an "it depends" situation? Or is there a concrete "this is (un)acceptable" or "fast" or "slow", or should I add a LIMIT 25, or doesn't that help?

SELECT id, name, price
FROM products
WHERE id IN (1, 2, 3, 4, ...... 3000)
LIMIT 25

Or should I trim the array of product ids returned by Redis to 25, and only add 25 ids to the query rather than 3000 and LIMIT-ing it to 25 from inside the query?

SELECT id, name, price
FROM products
WHERE id IN (1, 2, 3, 4, ...... 25)
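If the trimming is done on the application side instead, it could be as simple as slicing the id array before building the query. Continuing the sketch above (so $ids and $pdo are assumed from there):

// Keep only the first 25 ids instead of sending all 3000 and relying on LIMIT.
$ids = array_slice($ids, 0, 25);
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare("SELECT id, name, price FROM products WHERE id IN ($placeholders)");
$stmt->execute($ids);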

Any suggestions/feedback are much appreciated!


IN is fine, and well optimized. Make sure you use it on an indexed field and you're fine.

It's functionally equivalent to:

(x = 1 OR x = 2 OR x = 3 ... OR x = 99)

As far as the DB engine is concerned.

EDIT: Please note this answer was written in 2011; see the comments on this answer for discussion of more recent MySQL features.
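One way to sanity-check this against your own table is to look at the plan MySQL chooses for such a query. A sketch using PDO; connection details are placeholders:

<?php
// Sketch: inspect the plan MySQL picks for an IN list on an indexed column.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
$plan = $pdo->query(
    "EXPLAIN SELECT id, name, price FROM products WHERE id IN (1, 2, 3, 4)"
)->fetchAll(PDO::FETCH_ASSOC);
print_r($plan);
// On an indexed id column this typically shows type = range and key = PRIMARY,
// i.e. the IN list is resolved through the index rather than a full table scan.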

Generally speaking, if the IN list gets too large (for some ill-defined value of 'too large' that is usually in the region of 100 or smaller), it becomes more efficient to use a join, creating a temporary table if need be to hold the numbers.

If the numbers are a dense set (no gaps - which the sample data suggests), then you can do even better with WHERE id BETWEEN 300 AND 3000.

However, presumably there are gaps in the set, at which point it may be better to go with the list of valid values after all. But if the gaps are relatively few in number, you could use:

WHERE id BETWEEN 300 AND 3000 AND id NOT BETWEEN 742 AND 836

Or whatever the gaps are.
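For the dense-but-gappy case, one way to build such a query from an id list is to collapse it into contiguous ranges on the application side. A rough PHP sketch (my own addition, not from this answer; it assumes the ids are integers, so cast them first if they come back from Redis as strings, and PHP 7.4+ for the arrow function):

<?php
// Sketch: collapse a sorted list of ids into contiguous ranges so a dense set
// can be expressed with BETWEEN clauses instead of a huge IN list.
function idsToRanges(array $ids): array {
    if (!$ids) return [];
    sort($ids);
    $ranges = [];
    $start = $prev = array_shift($ids);
    foreach ($ids as $id) {
        if ($id === $prev + 1) { $prev = $id; continue; }
        $ranges[] = [$start, $prev];
        $start = $prev = $id;
    }
    $ranges[] = [$start, $prev];
    return $ranges;
}

// For example, [[300, 741], [837, 3000]] becomes:
// WHERE (id BETWEEN 300 AND 741) OR (id BETWEEN 837 AND 3000)
$where = implode(' OR ', array_map(
    fn($r) => "(id BETWEEN {$r[0]} AND {$r[1]})",
    idsToRanges($ids)
));
$sql = "SELECT id, name, price FROM products WHERE $where";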

When you provide many values for the IN operator, it must first sort them to remove duplicates. At least I suspect that. So it would not be good to provide too many values, as sorting takes N log N time.

My experience has shown that slicing the set of values into smaller subsets and combining the results of all the queries in the application gives the best performance. I admit that I gathered that experience on a different database (Pervasive), but the same may apply to all engines. My count of values per set was 500-1000; fewer or more was significantly slower.
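A sketch of that slicing approach in PHP (my own; the chunk size, connection details and table/column names are placeholders):

<?php
// Sketch: split a big id list into chunks and merge the results in PHP.
// Chunk size of 1000 reflects the 500-1000 range mentioned above.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
$ids = range(1, 3000);                         // placeholder id list
$rows = [];

foreach (array_chunk($ids, 1000) as $chunk) {
    $placeholders = implode(',', array_fill(0, count($chunk), '?'));
    $stmt = $pdo->prepare(
        "SELECT id, name, price FROM products WHERE id IN ($placeholders)"
    );
    $stmt->execute($chunk);
    $rows = array_merge($rows, $stmt->fetchAll(PDO::FETCH_ASSOC));
}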

I have been doing some tests, and as David Fells says in his answer, it is quite well optimized. As a reference, I created an InnoDB table with 1,000,000 rows; a SELECT using the "IN" operator with 500,000 random numbers takes only 2.5 seconds on my Mac, and selecting only every other row takes 0.5 seconds.

The only problem I had is that I had to increase the max_allowed_packet parameter in the my.cnf file. If not, a mysterious "MySQL server has gone away" error is generated.
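If you hit that error, a quick way to check the current limit is something like this (a sketch; DSN and credentials are placeholders):

<?php
// Sketch: check the current packet limit before sending a very long IN list.
$pdo = new PDO('mysql:host=localhost;dbname=testschema', 'root', 'root');
$row = $pdo->query("SHOW VARIABLES LIKE 'max_allowed_packet'")
           ->fetch(PDO::FETCH_ASSOC);
echo "max_allowed_packet = {$row['Value']} bytes\n";
// To raise it, set e.g.  max_allowed_packet = 64M  under [mysqld] in my.cnf
// and restart the server (SET GLOBAL also works with sufficient privileges).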

Here is the PHP code I used to run the test:

$NROWS = 1000000;      // number of rows to insert
$SELECTED = 50;        // ~50% of the rows end up in the IN list (rand(0,99) < 50)
$NROWSINSERT = 15000;  // kept from the original; rows are actually inserted in batches of 100 below

$dsn = "mysql:host=localhost;port=8889;dbname=testschema";
$pdo = new PDO($dsn, "root", "root");
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec("DROP TABLE IF EXISTS `testtable`");
$pdo->exec("CREATE TABLE `testtable` (
    `id` INT NOT NULL,
    `text` VARCHAR(45) NULL,
    PRIMARY KEY (`id`))");

$before = microtime(true);

// Build the rows and, at the same time, the list of ids for the IN clause.
$Values = '';
$SelValues = '(';
$c = 0;
for ($i = 0; $i < $NROWS; $i++) {
    $r = rand(0, 99);
    if ($c > 0) $Values .= ",";
    $Values .= "( $i , 'This is value $i and r= $r')";
    if ($r < $SELECTED) {
        if ($SelValues != "(") $SelValues .= ",";
        $SelValues .= $i;
    }
    $c++;

    // Flush the batch every 100 rows (and on the last iteration).
    if (($c == 100) || (($i == $NROWS - 1) && ($c > 0))) {
        $pdo->exec("INSERT INTO `testtable` VALUES $Values");
        $Values = "";
        $c = 0;
    }
}
$SelValues .= ')';
echo "<br>";

$after = microtime(true);
echo "Insert execution time =" . ($after - $before) . "s<br>";

// Time the preparation of the huge IN query separately from its execution.
$before = microtime(true);
$sql = "SELECT count(*) FROM `testtable` WHERE id IN $SelValues";
$result = $pdo->prepare($sql);
$after = microtime(true);
echo "Prepare execution time =" . ($after - $before) . "s<br>";

$before = microtime(true);

$result->execute();
$c = $result->fetchColumn();

$after = microtime(true);
echo "Random selection = $c Execution time =" . ($after - $before) . "s<br>";

// For comparison: select every other row without an IN list.
$before = microtime(true);

$sql = "SELECT count(*) FROM `testtable` WHERE id % 2 = 1";
$result = $pdo->prepare($sql);
$result->execute();
$c = $result->fetchColumn();

$after = microtime(true);
echo "Pairs = $c Execution time=" . ($after - $before) . "s<br>";

And the results:

Insert execution time =35.2927210331s
Prepare execution time =0.0161771774292s
Random selection = 499102 Execution time =2.40285992622s
Pairs = 500000 Execution time=0.465420007706s

You can create a temporary table where you can put any number of IDs and run a nested query. Example:

CREATE [TEMPORARY] TABLE tmp_IDs (`ID` INT NOT NULL,PRIMARY KEY (`ID`));

and select:

SELECT id, name, price
FROM products
WHERE id IN (SELECT ID FROM tmp_IDs);
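To make that concrete, here is a sketch of my own (not from this answer) of filling tmp_IDs from an application-side id list with PDO, using chunked inserts so no single statement gets too large, and then joining, which is equivalent to the subquery form above. $pdo and $ids are assumed to come from the surrounding application code:

// Sketch: fill the temporary table in batches, then join instead of a long IN list.
$pdo->exec("CREATE TEMPORARY TABLE tmp_IDs (ID INT NOT NULL, PRIMARY KEY (ID))");

foreach (array_chunk($ids, 1000) as $chunk) {
    $placeholders = implode(',', array_fill(0, count($chunk), '(?)'));
    $stmt = $pdo->prepare("INSERT INTO tmp_IDs (ID) VALUES $placeholders");
    $stmt->execute($chunk);
}

// Either the subquery form from the answer, or an explicit join:
$products = $pdo->query(
    "SELECT p.id, p.name, p.price
       FROM products p
       JOIN tmp_IDs t ON t.ID = p.id"
)->fetchAll(PDO::FETCH_ASSOC);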

Using IN with a large parameter set on a large list of records will in fact be slow.

In a case I solved recently, I had two WHERE clauses, one with 2,500 parameters and the other with 3,500 parameters, querying a table of 40 million records.

My query took 5 minutes using the standard WHERE IN. By instead using a subquery for the IN statement (putting the parameters in their own indexed table), I got the query down to TWO seconds.

Worked for both MySQL and Oracle in my experience.