The GNU site suggests this nice awk script, which prints both the words and their frequency.
Possible variations:
You can pipe the output through sort -nr (and swap word and freq[word] in the printf) to see the result in descending order.
If you want a specific column, you can omit the for loop and simply write freq[$3]++, replacing 3 with the column number.
Here it goes:
# wordfreq.awk --- print list of word frequencies
{
    $0 = tolower($0)    # remove case distinctions
    # remove punctuation
    gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
    for (i = 1; i <= NF; i++)
        freq[$i]++
}
END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}
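As a quick sanity check, here is one possible way to run the script with the sort -nr variation mentioned above (the paths /tmp/wordfreq.awk and /tmp/sample.txt are made up for the demo):

```shell
# Save the script above, build a tiny sample file, then count and sort.
# Filenames here are illustrative, not from the original answer.
cat > /tmp/wordfreq.awk <<'EOF'
{
    $0 = tolower($0)                         # remove case distinctions
    gsub(/[^[:alnum:]_[:blank:]]/, "", $0)   # remove punctuation
    for (i = 1; i <= NF; i++)
        freq[$i]++
}
END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}
EOF
printf 'The quick fox.\nThe lazy dog!\nThe fox?\n' > /tmp/sample.txt
# Sort on the count field (field 2), numerically, descending.
awk -f /tmp/wordfreq.awk /tmp/sample.txt | sort -k2,2nr
```

With this sample input, "the" (3 occurrences) sorts first, followed by "fox" (2).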
#!/usr/bin/env ruby
Dir["*"].each do |file|
  h = Hash.new(0)
  open(file).each do |row|
    row.chomp.split("\t").each do |w|
      h[w] += 1
    end
  end
  h.sort { |a, b| b[1] <=> a[1] }.each { |x, y| print "#{x}:#{y}\n" }
end
In the top-level while loop:
* Loop over each line of the combined input files
* Split the line into the @Fields array
* For every column, increment the result hash data structure
In the top-level for loop:
* Loop over the result array
* Print the column number
* Get the values used in that column
* Sort the values by the number of occurrences
* Secondary sort based on the value itself (e.g. b vs g vs m vs z)
* Iterate through the result hash, using the sorted list
* Print out the value and count of each occurrence
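The Perl itself is not shown here, but the per-column counting steps above can be sketched with standard tools (filenames and data below are made up): cut splits out the column, uniq -c does the counting, and the final sort orders by count with a secondary sort on the value.

```shell
# Two illustrative tab-separated input files.
mkdir -p /tmp/colfreq && cd /tmp/colfreq
printf 'a\td\tc\nz\td\tc\na\tr\te\n' > file1
printf 'z\td\ta\nt\tr\tc\n' > file2
# Count the values in column 2: most frequent first,
# ties broken alphabetically by value.
cut -f2 file1 file2 | sort | uniq -c | sort -k1,1nr -k2,2
```

Here column 2 contains d three times and r twice, so "3 d" prints before "2 r".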
Results based on the sample input file provided by @Dennis:
column 0:
a 3
z 3
t 1
v 1
w 1
column 1:
d 3
r 2
b 1
g 1
m 1
z 1
column 2:
c 4
a 3
e 2
.csv input
If the input files are .csv, change /\s+/ to /,/ .
Obfuscation
In an ugliness contest, Perl is particularly well equipped.
These few shell commands do the same:
$ FIELD=2
$ values="$(cut -f $FIELD *)"
$ mkdir /tmp/counts
$ cd /tmp/counts
$ echo | tee -a $values
$ wc -l * | sort -nr
9 total
3 d
2 r
1 z
1 m
1 g
1 b
$
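The trick in the session above is worth spelling out: the unquoted $values expands into one filename per occurrence, tee -a appends echo's single newline to each of them (a duplicated name gets one newline per mention), so every file ends up with as many lines as its value occurred, and wc -l reports the counts. A self-contained re-run of the same idea, with made-up paths and the column-1 sample values inlined instead of the cut:

```shell
mkdir -p /tmp/counts-demo && cd /tmp/counts-demo
values='d d r d r z b g m'          # stand-in for "$(cut -f $FIELD *)"
echo | tee -a $values > /dev/null   # one blank line appended per occurrence
wc -l * | sort -nr                  # filenames are values, line counts are tallies
```

The output matches the session above: "9 total" first, then "3 d", "2 r", and the singletons.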