Get the index of an array element faster than O(n)

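Suppose I have a huge array and a value from it. I want to get the index of that value in the array. Is there any way to do this other than calling Array#index? The problem comes from needing to keep a really huge array and call Array#index an enormous number of times.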

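After a couple of tries I found that caching the index inside each element, by storing structs with (value, index) fields instead of the value itself, gives a huge improvement in performance (about 20x).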

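Still, I wonder whether there is a more convenient way of finding the index of an element without caching (or a good caching technique that would boost performance).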


Is there a good reason not to use a hash? Lookups are O(1) vs. O(n) for the array.

Convert the array into a hash. Then look for the key.

array = ['a', 'b', 'c']
hash = Hash[array.map.with_index.to_a]    # => {"a"=>0, "b"=>1, "c"=>2}
hash['b'] # => 1
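
If the array can contain duplicates, `Hash[...]` keeps the index of the last occurrence, because later pairs overwrite earlier ones. Below is a minimal sketch that keeps the first occurrence instead (matching what `Array#index` returns). It assumes the array is not modified after the hash is built; `build_index` is a hypothetical helper name, not part of Ruby.

```ruby
# Hypothetical helper: maps each value to the index of its FIRST occurrence.
# Building the hash is O(n); each later lookup is O(1) on average.
def build_index(array)
  index_of = {}
  array.each_with_index do |value, i|
    # Only record the index the first time a value is seen.
    index_of[value] = i unless index_of.key?(value)
  end
  index_of
end

array = ['a', 'b', 'c', 'b']
index_of = build_index(array)  # build once, reuse for many lookups
index_of['b']                  # => 1
index_of['z']                  # => nil (value not in the array)
```

The hash only pays off when it is built once and queried many times; if the array changes, the hash has to be updated or rebuilt.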

If the array is sorted, you could use a binary search algorithm (O(log n)). For example, extending the Array class with this functionality:

class Array
  # Returns the index of value in a sorted array, or nil if it is not found.
  def b_search(value, lower_index = 0, upper_index = length - 1)
    return if lower_index > upper_index

    midpoint_index = (lower_index + upper_index) / 2
    return midpoint_index if self[midpoint_index] == value

    if value < self[midpoint_index]
      # Continue in the left half, excluding the midpoint.
      b_search(value, lower_index, midpoint_index - 1)
    else
      # Continue in the right half, excluding the midpoint.
      b_search(value, midpoint_index + 1, upper_index)
    end
  end
end
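
The same idea can also be written iteratively as a standalone method, which avoids reopening `Array`. This is only a sketch: `binary_search_index` is a hypothetical name, and it assumes the array is sorted in ascending order and its elements are comparable with `<=>`.

```ruby
# Hypothetical helper: returns the index of `value` in `sorted`, or nil if absent.
# Assumes `sorted` is in ascending order and its elements respond to <=>.
def binary_search_index(sorted, value)
  low = 0
  high = sorted.length - 1
  while low <= high
    mid = (low + high) / 2
    case value <=> sorted[mid]
    when 0  then return mid       # found
    when -1 then high = mid - 1   # value is smaller: search the left half
    else         low = mid + 1    # value is larger: search the right half
    end
  end
  nil
end

sorted = [1, 3, 5, 7, 9, 11]
binary_search_index(sorted, 7)  # => 3
binary_search_index(sorted, 4)  # => nil
```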

Why not use index or rindex?

array = %w( a b c d e)
# get FIRST index of element searched
puts array.index('a')
# get LAST index of element searched
puts array.rindex('a')

index: http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-index

rindex: http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-rindex
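
For reference, here is a short sketch of how the two methods behave with duplicates and missing values. Both scan the array, so each call is still O(n).

```ruby
array = %w(a b c a b)

first = array.index('a')    # => 0, first occurrence
last  = array.rindex('a')   # => 3, last occurrence
none  = array.index('z')    # => nil, not found

# index also accepts a block and returns the first index for which it is true.
after_a = array.index { |s| s > 'a' }  # => 1
```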

Taking a combination of @sawa's answer and the comment listed there you could implement a "quick" index and rindex on the array class.

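class Array
  # Both methods rebuild the hash on every call, which is itself O(n);
  # build it once and reuse it if you need many lookups.

  # Index of the first occurrence of el, or nil if it is not found.
  def quick_index(el)
    # Pairs are reversed so the first occurrence is written last and wins.
    hash = Hash[each_with_index.to_a.reverse]
    hash[el]
  end

  # Index of the last occurrence of el, or nil if it is not found.
  def quick_rindex(el)
    # Later pairs overwrite earlier ones, so the last occurrence wins.
    hash = Hash[each_with_index.to_a]
    hash[el]
  end
end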

Other answers don't take into account the possibility of an entry listed multiple times in an array. This will return a hash where each key is a unique object in the array and each value is an array of indices that corresponds to where the object lives:

a = [1, 2, 3, 1, 2, 3, 4]
=> [1, 2, 3, 1, 2, 3, 4]


indices = a.each_with_index.inject(Hash.new { Array.new }) do |hash, (obj, i)|
hash[obj] += [i]
hash
end
=> { 1 => [0, 3], 2 => [1, 4], 3 => [2, 5], 4 => [6] }

This allows for a quick search for duplicate entries:

indices.select { |k, v| v.size > 1 }
=> { 1 => [0, 3], 2 => [1, 4], 3 => [2, 5] }
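
As an alternative sketch, the same value-to-indices hash can be built with `group_by` over the array's indices. It uses only core `Enumerable` methods; the variable names are just for illustration.

```ruby
a = [1, 2, 3, 1, 2, 3, 4]

# Group every index by the value stored at that index.
indices = a.each_index.group_by { |i| a[i] }
# => {1=>[0, 3], 2=>[1, 4], 3=>[2, 5], 4=>[6]}

# Values that occur more than once, with all their positions.
duplicates = indices.select { |_value, positions| positions.size > 1 }
# => {1=>[0, 3], 2=>[1, 4], 3=>[2, 5]}
```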

If your array has a natural order, use binary search.

Use binary search.

Binary search has O(log n) access time.

Here are the steps for using binary search:

  • What is the ordering of your array? For example, is it sorted by name?
  • Use bsearch to find elements or indices.

Code example

# assume array is sorted by name!

array.bsearch { |each| "Jamie" <=> each.name } # returns the element, or nil if not found
(0...array.size).bsearch { |n| "Jamie" <=> array[n].name } # returns the index, or nil if not found
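
Here is a self-contained sketch of the same calls. `Person` and the sample names are hypothetical, made up for illustration. In find-any mode the block returns a positive number when the target sorts after the current element, zero on a match, and a negative number otherwise, which is what `"Jamie" <=> x.name` does. `Array#bsearch_index` (Ruby 2.3 and later) returns the index directly.

```ruby
# Hypothetical record type for illustration.
Person = Struct.new(:name)

# The array must already be sorted by name for bsearch to work.
people = %w(Alice Bob Jamie Zoe).map { |n| Person.new(n) }

person  = people.bsearch       { |x| "Jamie" <=> x.name }  # the matching element
index   = people.bsearch_index { |x| "Jamie" <=> x.name }  # => 2
missing = people.bsearch_index { |x| "Carol" <=> x.name }  # => nil
```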

"Still, I wonder whether there is a more convenient way of finding the index of an element without caching (or a good caching technique that would boost performance)."

You can use binary search (if your array is ordered and the values you store in the array are comparable in some way). For that to work you need to be able to tell the binary search whether it should be looking "to the left" or "to the right" of the current element. But I believe there is nothing wrong with storing the index at insertion time and then using it if you are getting the element from the same array.
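
Storing the index at insertion time could look roughly like the sketch below. `Entry` is a hypothetical struct name, and the sketch assumes elements are only ever appended; inserting, deleting or reordering elements would leave the stored indices stale.

```ruby
# Hypothetical wrapper that remembers its own position in the array.
Entry = Struct.new(:value, :index)

entries = []
%w(x y z).each do |value|
  # The index is the array length at the moment of insertion.
  entries << Entry.new(value, entries.length)
end

entry = entries[1]
entry.value  # => "y"
entry.index  # => 1, read in O(1) with no search
```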