Counting duplicates in a sorted sequence using command-line tools

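I have a command (cmd1) that filters a set of numbers out of a log file. The numbers are unsorted, so I use sort -gr to get a reverse-sorted list of numbers. There may be duplicates in this list. I need to find the count of each unique number in it.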

For example, if the output of cmd1 is:

100
100
100
99
99
26
25
24
24

I need another command that I can pipe the above output into, so that I get:

100     3
99      2
26      1
25      1
24      2

If order is not important:

# echo "100 100 100 99 99 26 25 24 24" | awk '{for(i=1;i<=NF;i++)a[$i]++}END{for(o in a) printf "%s %s ",o,a[o]}'
26 1 100 3 99 2 24 2 25 1
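Because the question's input is already sorted (it comes out of sort -gr), a single awk pass can count runs of equal adjacent lines instead of using an array, which keeps the sorted order. This is a minimal sketch, assuming one number per line; the function name count_runs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count_runs.
# Counts runs of identical adjacent lines in already-sorted input and
# prints "value<TAB>count" in the input's order. Assumes one number per line.
count_runs() {
  awk '
    NR > 1 && $1 != prev { print prev "\t" n; n = 0 }  # value changed: flush the previous run
    { prev = $1; n++ }                                  # extend the current run
    END { if (NR > 0) print prev "\t" n }               # flush the last run
  '
}

printf '%s\n' 100 100 100 99 99 26 25 24 24 | count_runs
```

Unlike the associative-array version, this only works on sorted input: an unsorted stream would report the same value more than once.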

How about:

$ echo "100 100 100 99 99 26 25 24 24" \
| tr " " "\n" \
| sort \
| uniq -c \
| sort -k2nr \
| awk '{printf("%s\t%s\n",$2,$1)}'

The result is:

100 3
99  2
26  1
25  1
24  2

uniq -c works for GNU uniq 8.23 at least, and does exactly what you want (assuming sorted input).
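A quick way to see what uniq -c emits on its own, and why the input has to be sorted first: it only collapses adjacent duplicates, and it prints the count before the value, padded with leading spaces (the exact padding width is implementation-specific, so the check below normalizes it with awk). This is a small illustration, not part of the original answer.

```shell
#!/bin/sh
# Sorted input: each line becomes "<spaces><count> <value>".
printf '%s\n' 100 100 100 99 99 26 25 24 24 | uniq -c

# Unsorted input: the two 100s are not adjacent, so they are counted separately.
printf '%s\n' 100 99 100 | uniq -c
```

This count-first layout is why the other answers swap the columns with awk afterwards.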

Numerically sort the numbers in reverse, then count the duplicates, then swap the left and the right words. Align into columns.

printf '%d\n' 100 99 26 25 100 24 100 24 99 \
| sort -nr | uniq -c | awk '{printf "%-8s%s\n", $2, $1}'
100     3
99      2
26      1
25      1
24      2

In Bash (version 4 or later), we can use an associative array to count instances of each input value. Assume the command is stored in the variable cmd1, e.g.:

#!/bin/bash

cmd1='printf %d\n 100 99 26 25 100 24 100 24 99'
declare -A a   # associative array: value -> count

Then we can count values in the array variable a by applying the ++ arithmetic operator to the relevant array entries:

while read -r i
do
((++a["$i"]))
done < <($cmd1)

We can print the resulting values:

for i in "${!a[@]}"
do
echo "$i ${a[$i]}"
done

If the order of output is important, we might need an external sort of the keys:

for i in $(printf '%s\n' "${!a[@]}" | sort -nr)
do
echo "$i ${a[$i]}"
done
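Putting the pieces of this answer together, a self-contained version might look like the following. It is a sketch, assuming Bash 4+ for associative arrays; the function name count_with_assoc is hypothetical, and it reads from stdin so it can sit at the end of a pipe such as cmd1 | count_with_assoc.

```bash
#!/bin/bash
# Hypothetical helper: count_with_assoc (requires Bash 4+ for associative arrays).
# Counts each distinct input line, then prints "value count" pairs in
# descending numeric order.
count_with_assoc() {
  local -A counts=()   # value -> number of occurrences
  local line key
  while IFS= read -r line; do
    [[ -n $line ]] || continue   # skip blank lines
    ((++counts["$line"]))
  done
  # Iteration order of an associative array is unspecified, so sort the keys.
  for key in $(printf '%s\n' "${!counts[@]}" | sort -nr); do
    printf '%s %s\n' "$key" "${counts[$key]}"
  done
}

printf '%d\n' 100 99 26 25 100 24 100 24 99 | count_with_assoc
```

Unlike the uniq -c pipelines, this does not need sorted input, because counting happens before sorting.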

If your input is stored in my_file, you can do:

sort -nr my_file | uniq -c | awk ' { t = $1; $1 = $2; $2 = t; print; } '

Otherwise, pipe the input into the same command without the file argument.

Explanation:

  • sort -nr sorts the input numerically (-n) in reverse order (-r)
  • uniq -c counts adjacent duplicates and prints each count next to its value
  • awk '{ t = $1; $1 = $2; $2 = t; print; }' swaps the two columns