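Which is faster: in_array or isset?

This question is mostly for my own benefit, since I always like to write optimized code that can also run on cheap, slow servers (or servers with a lot of traffic).

I looked around and could not find an answer. I would like to know which of these two examples is faster, keeping in mind that the array's keys are not important in my case (pseudocode, naturally):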

<?php
$a = array();
// Pseudocode: the string stands in for a source that yields one address per iteration.
while ($new_val = 'get over 100k email addresses already lowercased') {
    if (!in_array($new_val, $a)) {
        $a[] = $new_val;
        // do other stuff
    }
}
?>


<?php
$a = array();
// Pseudocode: the string stands in for a source that yields one address per iteration.
while ($new_val = 'get over 100k email addresses already lowercased') {
    if (!isset($a[$new_val])) {
        $a[$new_val] = true;
        // do other stuff
    }
}
?>

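Since the point of the question is not array collisions, I would like to add that if you are worried about colliding inserts for $a[$new_value], you can use $a[md5($new_value)]. It can still cause collisions, but it would avoid a possible DoS attack when reading data from a user-provided file (http://nikic.github.com/2011/12/28/Supercolliding-a-PHP-array.html).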


The second would be faster, as it looks up only that specific array key and does not need to iterate over the array until the value is found (in_array() has to look at every element when the value is not present).

Which is faster: isset() vs in_array()

isset() is faster.

While it should be obvious, isset() only tests a single key, whereas in_array() iterates over the array, testing the value of each element until it finds a match.
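As a minimal sketch of the two approaches from the question, the snippet below deduplicates the same small list both ways and checks that the results agree. The sample addresses and the function names `dedupe_in_array` and `dedupe_isset` are made up for illustration.

```php
<?php
// Deduplicate with in_array(): each check scans the values collected so far.
function dedupe_in_array(array $values): array
{
    $seen = [];
    foreach ($values as $value) {
        if (!in_array($value, $seen, true)) {
            $seen[] = $value;
        }
    }
    return $seen;
}

// Deduplicate with isset(): each check is a single key lookup.
function dedupe_isset(array $values): array
{
    $seen = [];
    foreach ($values as $value) {
        if (!isset($seen[$value])) {
            // The stored value must not be null, or isset() would report false.
            $seen[$value] = true;
        }
    }
    return array_keys($seen);
}

$emails = ['a@example.com', 'b@example.com', 'a@example.com', 'c@example.com'];

var_dump(dedupe_in_array($emails) === dedupe_isset($emails)); // bool(true)
```

Both functions return the unique addresses in first-seen order; only the cost of each membership check differs.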

Rough benchmarking is quite easy using microtime().

Results:

Total time isset():    0.002857
Total time in_array(): 0.017103

Note: Results were similar regardless of whether the key existed or not.

Code:

<?php
$a = array();

// Time 10,000 isset() key lookups.
$start = microtime(true);

for ($i = 0; $i < 10000; ++$i) {
    isset($a['key']);
}

$total_time = microtime(true) - $start;
echo "Total time isset():    ", number_format($total_time, 6), PHP_EOL;

// Time 10,000 in_array() value searches.
$start = microtime(true);

for ($i = 0; $i < 10000; ++$i) {
    in_array('key', $a);
}

$total_time = microtime(true) - $start;
echo "Total time in_array(): ", number_format($total_time, 6), PHP_EOL;

exit;


The answers so far are spot-on. Using isset in this case is faster because

  • It uses an O(1) hash search on the key whereas in_array must check every value until it finds a match.
  • Because isset() is a language construct that compiles to a single opcode, it has less overhead than calling the built-in in_array() function.

These can be demonstrated by using an array with values (10,000 in the test below), forcing in_array to do more searching.

isset:    0.009623
in_array: 1.738441

This builds on Jason's benchmark by filling in some random values and occasionally finding a value that exists in the array. All random, so beware that times will fluctuate.

// Fill the array with up to 10,000 random values, stored as both key and value.
$a = array();
for ($i = 0; $i < 10000; ++$i) {
    $v = rand(1, 1000000);
    $a[$v] = $v;
}
echo "Size: ", count($a), PHP_EOL;

// Time 10,000 isset() lookups on random keys.
$start = microtime(true);

for ($i = 0; $i < 10000; ++$i) {
    isset($a[rand(1, 1000000)]);
}

$total_time = microtime(true) - $start;
echo "isset:    ", number_format($total_time, 6), PHP_EOL;

// Time 10,000 in_array() searches for random values.
$start = microtime(true);

for ($i = 0; $i < 10000; ++$i) {
    in_array(rand(1, 1000000), $a);
}

$total_time = microtime(true) - $start;
echo "in_array: ", number_format($total_time, 6), PHP_EOL;

Using isset() takes advantage of speedier lookup because it uses a hash table, avoiding the need for O(n) searches.

A string key is first hashed using the DJB hash function to determine the bucket of similarly hashed keys in O(1). The bucket is then searched iteratively until the exact key is found, which is O(n) only in the number of keys sharing that bucket, not in the size of the whole array.

Barring any intentional hash collisions, this approach yields much better performance than in_array().
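The two steps described above (hash to a bucket, then scan only that bucket) can be sketched as a toy model written in PHP itself. This is not the engine's actual code, which is implemented in C; the bucket count of 8 and the names `djb_hash`, `bucket_index`, `toy_has` and `toy_insert` are assumptions made for this example.

```php
<?php
// DJB "times 33" string hash, masked to 32 bits (assumes a 64-bit PHP build).
function djb_hash(string $key): int
{
    $hash = 5381;
    for ($i = 0, $len = strlen($key); $i < $len; ++$i) {
        $hash = ($hash * 33 + ord($key[$i])) & 0xFFFFFFFF;
    }
    return $hash;
}

// Step 1: hash the key to choose a bucket. The cost does not depend on
// how many keys are already stored.
function bucket_index(string $key, int $bucketCount = 8): int
{
    return djb_hash($key) % $bucketCount;
}

// Step 2: compare only the keys stored in that one bucket.
function toy_has(array $buckets, string $key): bool
{
    foreach ($buckets[bucket_index($key)] ?? [] as $candidate) {
        if ($candidate === $key) {
            return true;
        }
    }
    return false;
}

// Insert a key unless it is already present in its bucket.
function toy_insert(array &$buckets, string $key): void
{
    if (!toy_has($buckets, $key)) {
        $buckets[bucket_index($key)][] = $key;
    }
}

$buckets = [];
foreach (['a@example.com', 'b@example.com', 'a@example.com'] as $email) {
    toy_insert($buckets, $email);
}

var_dump(toy_has($buckets, 'a@example.com')); // bool(true)
var_dump(toy_has($buckets, 'z@example.com')); // bool(false)
```

If many keys land in the same bucket, the inner loop in `toy_has` grows with them, which is the degenerate case that intentional collisions exploit.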

Note that when using isset() in the way that you've shown, passing the final values to another function requires using array_keys() to create a new array. A memory compromise can be made by storing the data in both the keys and values.
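A short sketch of that trade-off, using made-up sample addresses and the hypothetical variable names `$flags` and `$both`:

```php
<?php
$emails = ['a@example.com', 'b@example.com', 'a@example.com'];

// Variant 1: the values are just `true`, so the list of addresses has to be
// rebuilt from the keys, which allocates a second array.
$flags = [];
foreach ($emails as $email) {
    $flags[$email] = true;
}
$fromKeys = array_keys($flags);

// Variant 2: store each address as both key and value. The array is larger,
// but its values can be iterated directly without calling array_keys().
$both = [];
foreach ($emails as $email) {
    if (!isset($both[$email])) {
        $both[$email] = $email;
    }
}

// array_values() is used here only to compare the two results.
var_dump($fromKeys === array_values($both)); // bool(true)
```

One further difference: PHP converts string keys that look like decimal integers (for example "123") to integer keys, so array_keys() would return them as integers, while the second variant keeps the original strings as values.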

Update

A good way to see how your code design decisions affect runtime performance is to check out the compiled version of your script:

echo isset($arr[123])

compiled vars:  !0 = $arr
line     # *  op                           fetch      ext  return  operands
-----------------------------------------------------------------------------
   1     0  >   ZEND_ISSET_ISEMPTY_DIM_OBJ              2000000  ~0      !0, 123
         1      ECHO                                                 ~0
         2    > RETURN                                               null

echo in_array(123, $arr)

compiled vars:  !0 = $arr
line     # *  op                           fetch      ext  return  operands
-----------------------------------------------------------------------------
   1     0  >   SEND_VAL                                             123
         1      SEND_VAR                                             !0
         2      DO_FCALL                                 2  $0      'in_array'
         3      ECHO                                                 $0
         4    > RETURN                                               null

Not only does in_array() use a relatively inefficient O(n) search, it also needs to be called as a function (DO_FCALL) whereas isset() uses a single opcode (ZEND_ISSET_ISEMPTY_DIM_OBJ) for this.