PHP: 处理未定义数组键的最快方法

在一个非常紧凑的循环中,我需要访问包含数百万个元素的数组中的数万个值。键可以是未定义的: 在这种情况下,返回没有任何错误消息的 NULL 是合法的:

数组键存在: 元素的返回值。 数组键不存在: 返回 null。

我知道多种解决方案:

    if (isset($lookup_table[$key])) {
return $lookup_table[$key];
} else {
return;
}

或者

@return $lookup_table[$key];

或者

error_reporting(0);
$return = $lookup_table[$key];
error_reporting(E_ALL);
return $return;

所有的解决方案都远非最佳:

  • 第一个需要在 B-TREE 中进行2次查找: 一次检查是否存在,另一次检索值。这实际上使运行时增加了一倍。
  • 第二种方法使用错误抑制操作符,因此在该行上产生大量开销。
  • 第三个函数调用错误处理程序(它将检查 error _ report 设置,然后不显示任何内容) ,从而创建一个开销。

我的问题 是,如果我错过了一种避免错误处理的方法,但是却使用了一个 Btree 查找?

回答一些问题:

该数组缓存复杂计算的结果-复杂到要实时完成。 在数十亿个可能的价值中,只有数百万个产生有效的结果。数组类似于1234567 = > 23457,1234999 = > 74361,... 。它被保存到一个几兆字节的 PHP 文件中,并在执行开始时包含 _ once-d。初始加载时间并不重要。 如果没有找到该键,那只是意味着此特定值将不会返回有效的结果。麻烦的是要做到这一点5万以上每秒。

结论

由于没有办法通过单一查找和不进行错误处理来获得值,所以我很难接受单一答案。相反,我支持所有伟大的贡献。

最有价值的投入包括:

  • 使用 array _ key _ vis,因为它比其他方法更快
  • 查看 PHP 的 QuickHash

关于 PHP 如何处理数组有很多困惑。如果检查源代码,您将看到所有数组都是平衡树。构建自己的查找方法在 C 和 C + + 中很常见,但是在 PHP 这样的高级脚本语言中不能执行。

306522 次浏览

There are two typical approaches to this.

  1. Define defaults for an undefined key.
  2. Check for undefined key.

Here is how to perform the first and as little code as possible.

$data = array_merge(array($key=>false),$data);
return $data[$key];

Here is how to perform the second.

return isset($data[$key]) ? $data[$key] : false;

Update

Since PHP 7 you can accomplish this with the null coalesce operator:

return $table[$key] ?? null;

Old answer

First of all, arrays are not implemented as a B-tree, it's a hash table; an array of buckets (indexed via a hash function), each with a linked list of actual values (in case of hash collisions). This means that lookup times depend on how well the hash function has "spread" the values across the buckets, i.e. the number of hash collisions is an important factor.

Technically, this statement is the most correct:

return array_key_exists($key, $table) ? $table[$key] : null;

This introduces a function call and is therefore much slower than the optimized isset(). How much? ~2e3 times slower.

Next up is using a reference to avoid the second lookup:

$tmp = &$lookup_table[$key];


return isset($tmp) ? $tmp : null;

Unfortunately, this modifies the original $lookup_table array if the item does not exist, because references are always made valid by PHP.

That leaves the following method, which is much like your own:

return isset($lookup_table[$key]) ? $lookup_table[$key] : null;

Besides not having the side effect of references, it's also faster in runtime, even when performing the lookup twice.

You could look into dividing your arrays into smaller pieces as one way to mitigate long lookup times.

Just a sudden idea that would have to be tested, but did you try using array_intersect_key() to get the existing values and a array_merge to fill() the rest ? It would remove the need of a loop to access the data. Something like that :

$searched_keys = array ('key1' => null, 'key2' => null); // the list of the keys to find


$exiting_values = array_intersect_key($lookup_table, $searched_keys);
$all_values = array_merge($searched_keys, $exiting_keys);

Please note that I did not tried it performance-wise.

I did some bench marking with the following code:

set_time_limit(100);


$count = 2500000;
$search_index_end = $count * 1.5;
$search_index_start = $count * .5;


$array = array();
for ($i = 0; $i < $count; $i++)
$array[md5($i)] = $i;


$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
$key = md5($i);
$test = isset($array[$key]) ? $array[$key] : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";


$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
$key = md5($i);
$test = array_key_exists($key, $array) ? $array[$key] : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";




$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
$key = md5($i);
$test = @$array[$key];
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";


$error_reporting = error_reporting();
error_reporting(0);
$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
$key = md5($i);
$test = $array[$key];
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";
error_reporting($error_reporting);


$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
$key = md5($i);
$tmp = &$array[$key];
$test = isset($tmp) ? $tmp : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";

and I found that the fastest running test was the one that uses isset($array[$key]) ? $array[$key] : null followed closely by the solution that just disables error reporting.

First, re-organize the data for performance by saving a new array where the data is sorted by the keys, but the new array contains a regular numeric index.

This part will be time consuming, but only done once.

 // first sort the array by it's keys
ksort($data);


// second create a new array with numeric index
$tmp = new array();
foreach($data as $key=>$value)
{
$tmp[] = array('key'=>$key,'value'=>$value);
}
// now save and use this data instead
save_to_file($tmp);

Once that is done it should be quick to find the key using a Binary Search. Later you can use a function like this.

  function findKey($key, $data, $start, $end)
{
if($end < $start)
{
return null;
}


$mid = (int)(($end - $start) / 2) + $start;


if($data[$mid]['key'] > $key)
{
return findKey($key, $data, $start, $mid - 1);
}
else if($data[$mid]['key'] < $key)
{
return findKey($key, $data, $mid + 1, $end);
}


return $data[$mid]['value'];
}

To perform a search for a key you would do this.

 $result = findKey($key, $data, 0, count($data));
if($result === null)
{
// key not found.
}

If the count($data) is done all the time, then you could cache that in the file that you stored the array data.

I suspect this method will be a lot faster in performance then a regular linear search that is repeated against the $data. I can't promise it's faster. Only an octree would be quicker, but the time to build the octree might cancel out the search performance (I've experienced that before). It depends on how much searching in the data you have to do.

This Work for me

\{\{ isset($array['key']) ? $array['key']: 'Default' }}

but this is fast

\{\{ $array['key'] or 'Default' }}

The @ operator and the error_reporting methods will both be slower than using isset. With both of these methods, it modifies the error reporting setting for PHP, but PHP's error handler will still be called. The error handler will check against the error_reporting setting and exit without reporting anything, however this has still taken time.

I prefer using the isset function instead of escaping the error. I made a function to check the key exists and if not returns a default value, in the case of nested arrays you just need to add the other keys in order:

Nested array lookup:

/**
* Lookup array value.
*
* @param array $array
* @param array $keys
* @param $defaultValue
*/
public static function array_key_lookup($array, $keys, $defaultValue)
{
$value = $array;
foreach ($keys as $key) {
if (isset($value[$key])) {
$value = $value[$key];
} else {
$value = $defaultValue;
break;
}
}


return $value;
}

Usage example:

$array = [
'key1' => 'value1',
'key2' => 'value2',
'key3' => [
'key3a' => 'value3a',
'key3b' => 'value3b'
]
];


array_key_lookup($array, ['key3', 'key3a'], 'default')
'value3a'


array_key_lookup($array, ['key2', 'key2a'], 'default')
'default'


array_key_lookup($array, ['key2'], 'default')
'value2'


array_key_lookup($array, ['key5'], 'default')
'default'


Escaping the error:

$value = @$array[$key1][$key2] ?: $defaultValue;