How to count identical string elements in a Ruby array

I have the following Array = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

How do I produce a count for each identical element?

Where:
"Jason" = 2, "Judah" = 3, "Allison" = 1, "Teresa" = 1, "Michelle" = 1?

Or produce a hash where:

hash = { "Jason" => 2, "Judah" => 3, "Allison" => 1, "Teresa" => 1, "Michelle" => 1 }

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
counts = Hash.new(0)
names.each { |name| counts[name] += 1 }
# => {"Jason" => 2, "Teresa" => 1, ....
names.inject(Hash.new(0)) { |total, e| total[e] += 1; total }

gives you

{"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

This works.

arr = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
result = {}
arr.uniq.each{|element| result[element] = arr.count(element)}

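There is actually a data structure that does this: the MultiSet.

Unfortunately, there is no MultiSet implementation in the Ruby core library or standard library, but there are a couple of implementations floating around the web.

This is a great example of how the choice of data structure can simplify an algorithm. In fact, in this particular example, the algorithm goes away completely. It is literally just: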

Multiset.new(*names)

And that's it. Example, using https://GitHub.Com/Josh/Multimap/:

require 'multiset'

names = %w[Jason Jason Teresa Judah Michelle Judah Judah Allison]

histogram = Multiset.new(*names)
# => #<Multiset: {"Jason", "Jason", "Teresa", "Judah", "Judah", "Judah", "Michelle", "Allison"}>

histogram.multiplicity('Judah')
# => 3

Example, using http://maraigue.hhiro.net/multiset/index-en.php:

require 'multiset'

names = %w[Jason Jason Teresa Judah Michelle Judah Judah Allison]

histogram = Multiset[*names]
# => #<Multiset:#2 'Jason', #1 'Teresa', #3 'Judah', #1 'Michelle', #1 'Allison'>
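Since neither the core nor the standard library ships a multiset, the idea can be sketched in a few lines on top of a Hash with a default value of 0. The class name TinyMultiset and its methods are hypothetical, made up for illustration; they are not the API of either gem mentioned above.

```ruby
# Hypothetical minimal multiset, for illustration only.
# It is NOT the API of either gem linked above.
class TinyMultiset
  include Enumerable

  def initialize(items = [])
    @counts = Hash.new(0)            # missing keys count as 0
    items.each { |item| add(item) }
  end

  # Add one occurrence of item.
  def add(item)
    @counts[item] += 1
    self
  end

  # How many times item occurs (0 if it was never added).
  def multiplicity(item)
    @counts[item]
  end

  # Yield every element once per occurrence, so Enumerable methods work.
  def each(&block)
    return enum_for(:each) unless block
    @counts.each { |item, n| n.times { block.call(item) } }
  end

  # Plain element => count mapping.
  def to_h
    @counts.dup
  end
end

names = %w[Jason Jason Teresa Judah Michelle Judah Judah Allison]
histogram = TinyMultiset.new(names)

histogram.multiplicity("Judah")   # => 3
histogram.multiplicity("Nobody")  # => 0
```

The counting loop has not really disappeared; it has moved into the data structure, which is the point the answer makes.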

This is more a comment than an answer, but a comment wouldn't do it justice. If you do Array = foo, you crash at least one implementation of IRB:

C:\Documents and Settings\a.grimm>irb
irb(main):001:0> Array = nil
(irb):1: warning: already initialized constant Array
=> nil
C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3177:in `rl_redisplay': undefined method `new' for nil:NilClass (NoMethodError)
from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3873:in `readline_internal_setup'
from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4704:in `readline_internal'
from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4727:in `readline'
from C:/Ruby19/lib/ruby/site_ruby/1.9.1/readline.rb:40:in `readline'
from C:/Ruby19/lib/ruby/1.9.1/irb/input-method.rb:115:in `gets'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:139:in `block (2 levels) in eval_input'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:271:in `signal_status'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:138:in `block in eval_input'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `call'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `buf_input'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:103:in `getc'
from C:/Ruby19/lib/ruby/1.9.1/irb/slex.rb:205:in `match_io'
from C:/Ruby19/lib/ruby/1.9.1/irb/slex.rb:75:in `match'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:287:in `token'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:263:in `lex'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:234:in `block (2 levels) in each_top_level_statement'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `loop'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `block in each_top_level_statement'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `catch'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `each_top_level_statement'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:153:in `eval_input'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:70:in `block in start'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:69:in `catch'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:69:in `start'
from C:/Ruby19/bin/irb:12:in `<main>'


C:\Documents and Settings\a.grimm>

That's because Array is a class.

Here is a slightly more functional programming style:
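A short sketch of the point above: Array is a constant that refers to the built-in class, so the data belongs in a lowercase local variable instead. The variable name names is just an illustrative choice.

```ruby
# Array is a constant that refers to the built-in class, not a free name.
Array.is_a?(Class)      # => true
Array.new(2, "Jason")   # => ["Jason", "Jason"]

# Keep the data in a lowercase local variable instead of reassigning Array.
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
names.class             # => Array
```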

array_with_lower_case_a = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
hash_grouped_by_name = array_with_lower_case_a.group_by {|name| name}
hash_grouped_by_name.map{|name, names| [name, names.length]}
=> [["Jason", 2], ["Teresa", 1], ["Judah", 3], ["Michelle", 1], ["Allison", 1]]

One advantage of group_by is that you can use it to group items that are equivalent but not exactly identical:

another_array_with_lower_case_a = ["Jason", "jason", "Teresa", "Judah", "Michelle", "Judah Ben-Hur", "JUDAH", "Allison"]
hash_grouped_by_first_name = another_array_with_lower_case_a.group_by {|name| name.split(" ").first.capitalize}
hash_grouped_by_first_name.map{|first_name, names| [first_name, names.length]}
=> [["Jason", 2], ["Teresa", 1], ["Judah", 3], ["Michelle", 1], ["Allison", 1]]
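The same normalization can feed a counting hash directly instead of an array of pairs. This is a sketch that assumes Ruby 2.4 or later for Hash#transform_values; the variable names are illustrative.

```ruby
mixed_names = ["Jason", "jason", "Teresa", "Judah", "Michelle", "Judah Ben-Hur", "JUDAH", "Allison"]

# Normalize each entry to its capitalized first name before grouping.
first_name = ->(name) { name.split(" ").first.capitalize }

# group_by builds { key => [matching items] }; transform_values then
# replaces each group with its size (Hash#transform_values needs Ruby 2.4+).
counts_by_first_name = mixed_names.group_by(&first_name).transform_values(&:size)
# => {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
```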
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
Hash[names.group_by{|i| i }.map{|k,v| [k,v.size]}]
# => {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
arr = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

arr.uniq.inject({}) { |a, e| a.merge({e => arr.count(e)}) }

Time elapsed 0.028 milliseconds

Interestingly, stupidgeek's implementation benchmarked at:

Time elapsed 0.041 milliseconds

and the winning answer at:

Time elapsed 0.011 milliseconds

:)
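These timings depend on the machine and the Ruby version, and the post does not show how they were measured. One way to run a comparable measurement is the standard library's Benchmark module. The labels and the iteration count below are arbitrary choices for this sketch, not the original poster's setup.

```ruby
require 'benchmark'

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

# Three of the approaches from this thread, wrapped as callables.
approaches = {
  "uniq + inject/merge" => -> { names.uniq.inject({}) { |a, e| a.merge({ e => names.count(e) }) } },
  "uniq + count"        => -> { result = {}; names.uniq.each { |e| result[e] = names.count(e) }; result },
  "Hash.new(0) + each"  => -> { counts = Hash.new(0); names.each { |n| counts[n] += 1 }; counts }
}

iterations = 10_000 # arbitrary; a single run is too short to time reliably

# Map each label to the wall-clock seconds spent on all iterations.
timings = approaches.map do |label, code|
  [label, Benchmark.realtime { iterations.times { code.call } }]
end.to_h

timings.each { |label, seconds| puts format("%-22s %.4f s", label, seconds) }
```

Repeating each approach many times smooths out the noise that dominates a single sub-millisecond run.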

a = [1, 2, 3, 2, 5, 6, 7, 5, 5]
a.each_with_object(Hash.new(0)) { |o, h| h[o] += 1 }
# => {1=>1, 2=>2, 3=>1, 5=>3, 6=>1, 7=>1}

Credit: Frank Wambutt

Enumerable#each_with_object saves you from having to return the final hash.

names.each_with_object(Hash.new(0)) { |name, hash| hash[name] += 1 }

Returns:

=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
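To make the difference concrete: inject uses the block's return value as the next accumulator, so the block has to end with the hash, whereas each_with_object passes the same object to every iteration and returns it at the end. A small sketch comparing the two:

```ruby
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

# inject: the block's return value becomes the next accumulator,
# so the hash must be returned explicitly (the trailing "; total").
with_inject = names.inject(Hash.new(0)) { |total, name| total[name] += 1; total }

# each_with_object: the block's return value is ignored and the same hash
# is handed to every iteration, then returned once the loop finishes.
# Note the swapped block arguments: element first, object second.
with_object = names.each_with_object(Hash.new(0)) { |name, total| total[name] += 1 }

with_inject == with_object # => true
```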

With Ruby 2.2.0 and later, you can take advantage of the itself method.

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
counts = {}
names.group_by(&:itself).each { |k,v| counts[k] = v.length }
# counts => {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
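It can help to look at the intermediate hash that group_by(&:itself) builds before the lengths are taken. A sketch, assuming Ruby 2.2 or later:

```ruby
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

# itself simply returns the receiver, so every name is its own grouping key.
"Judah".itself # => "Judah"

groups = names.group_by(&:itself)
# => {"Jason"=>["Jason", "Jason"], "Teresa"=>["Teresa"],
#     "Judah"=>["Judah", "Judah", "Judah"], "Michelle"=>["Michelle"],
#     "Allison"=>["Allison"]}

# The count for each name is just the length of its group.
counts = {}
groups.each { |name, occurrences| counts[name] = occurrences.length }
counts["Judah"] # => 3
```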

There are lots of great implementations here.

But as a beginner, I would consider this the easiest to read and implement:

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

name_frequency_hash = {}

names.each do |name|
  count = names.count(name)
  name_frequency_hash[name] = count
end

name_frequency_hash
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

The steps we took:

  • we created the hash
  • we looped over the names array
  • we counted how many times each name appeared in the names array
  • we created a key using the name and a value using the count

It may be slightly more verbose (and performance-wise you will be doing some unnecessary work by overwriting keys that are already set), but in my opinion it is easier to read and understand for what you want to achieve.
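The unnecessary work mentioned above can be made visible by counting how often the hash is written. Iterating over names.uniq keeps the same readable shape while touching each key only once. This is a sketch, and it still rescans the array once per distinct name.

```ruby
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

# Original shape: one full scan (count) and one write per array element.
writes = 0
name_frequency_hash = {}
names.each do |name|
  name_frequency_hash[name] = names.count(name)
  writes += 1
end
writes # => 8, although there are only 5 distinct names

# Same idea over the distinct names only: one scan and one write per key.
uniq_writes = 0
uniq_frequency_hash = {}
names.uniq.each do |name|
  uniq_frequency_hash[name] = names.count(name)
  uniq_writes += 1
end
uniq_writes # => 5

name_frequency_hash == uniq_frequency_hash # => true
```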

Ruby v2.7+ (latest)

As of Ruby v2.7.0 (released December 2019), the core language includes Enumerable#tally, a new method designed specifically for this problem:

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

names.tally
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

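Ruby v2.4+ (currently supported, but older)

The following code was not possible in standard Ruby when this question was first asked (February 2011), because it uses:

  • Object#itself, which was added in Ruby v2.2.0 (released December 2014)
  • Hash#transform_values, which was added in Ruby v2.4.0 (released December 2016)

These modern additions to Ruby enable the following implementation: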

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

names.group_by(&:itself).transform_values(&:count)
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

Ruby v2.2+ (deprecated)

If you are using an older Ruby version without access to the Hash#transform_values method mentioned above, you can use Array#to_h instead, which was added in Ruby v2.1.0 (released December 2013):

names.group_by(&:itself).map { |k,v| [k, v.length] }.to_h
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

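For even older Ruby versions (<= 2.1), there are several ways to solve this, but (in my opinion) there is no clear-cut "best" way. See the other answers to this post.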
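The version tiers above can be folded into a single helper that picks whichever approach the running Ruby supports. The method name count_occurrences is hypothetical, and the final fallback is the Hash.new(0) approach from the other answers.

```ruby
# Hypothetical helper: choose a counting strategy based on what the
# running Ruby provides. The name count_occurrences is made up here.
def count_occurrences(items)
  if items.respond_to?(:tally)                # Ruby 2.7+
    items.tally
  elsif {}.respond_to?(:transform_values)     # Ruby 2.4+
    items.group_by(&:itself).transform_values(&:count)
  elsif items.respond_to?(:itself)            # Ruby 2.2+
    items.group_by(&:itself).map { |k, v| [k, v.length] }.to_h
  else                                        # anything older
    items.each_with_object(Hash.new(0)) { |item, counts| counts[item] += 1 }
  end
end

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
count_occurrences(names)
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
```

Checking with respond_to? rather than comparing RUBY_VERSION strings keeps the helper working on any interpreter that has the method, whatever its version number.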

Ruby 2.7+

Ruby 2.7 introduced Enumerable#tally for this exact purpose.

In this use case:

array.tally
# => { "Jason" => 2, "Judah" => 3, "Allison" => 1, "Teresa" => 1, "Michelle" => 1 }

Docs on the feature being released are here.

With Ruby 2.6, you can do:

names.to_h{ |name| [name, names.count(name)] }

Gives you:

{"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}