How can I show lines in common (reverse diff)?


Best answer

On *nix, you can use comm. The answer to the question is:

comm -1 -2 file1.sorted file2.sorted
# where file1 and file2 are sorted and piped into *.sorted

Here is the full usage of comm:

comm [-1] [-2] [-3 ] file1 file2
-1 Suppress the output column of lines unique to file1.
-2 Suppress the output column of lines unique to file2.
-3 Suppress the output column of lines duplicated in file1 and file2.

Also note that it is important to sort the files before using comm, as mentioned in the man pages.
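To make the sorting requirement concrete, here is a minimal sketch. The file names and contents (fruits_a.txt, fruits_b.txt) are invented for illustration; the only point is that both inputs are sorted into *.sorted copies before comm reads them.

```shell
#!/usr/bin/env bash
# Hypothetical sample data, created in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' pear apple cherry > "$workdir/fruits_a.txt"
printf '%s\n' cherry banana apple > "$workdir/fruits_b.txt"

# comm expects sorted input, so sort each file into a *.sorted copy first.
sort "$workdir/fruits_a.txt" > "$workdir/fruits_a.sorted"
sort "$workdir/fruits_b.txt" > "$workdir/fruits_b.sorted"

# -1 and -2 suppress the "only in file1" and "only in file2" columns,
# leaving just the lines present in both files.
common=$(comm -1 -2 "$workdir/fruits_a.sorted" "$workdir/fruits_b.sorted")
printf '%s\n' "$common"

rm -rf "$workdir"
```

With this sample data the two shared lines are apple and cherry. Dropping -1 and -2 would print all three columns, which can help when checking what comm considers unique to each file.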


This was asked here before: Unix command to find lines common in two files

You could also try Perl (credit goes here):

perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/'  file1 file2


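I just learned about the comm command from the answers here, but I wanted to add something: if the files are not sorted and you don't want to touch the originals, you can feed comm the output of sort instead. This leaves the original files intact. It works in Bash, but I can't speak for other shells.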

comm -1 -2 <(sort file1) <(sort file2)

This can be extended to compare command output instead of files:

comm -1 -2 <(ls /dir1 | sort) <(ls /dir2 | sort)
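As a rough sketch, the process-substitution form can be wrapped in a small function. The name common_lines and the sample files are hypothetical, and the `<( ... )` syntax assumes Bash (or another shell that supports process substitution).

```shell
#!/usr/bin/env bash
# common_lines is a hypothetical helper name, not a standard command.
# It prints the lines found in both inputs without modifying either file.
common_lines() {
    comm -1 -2 <(sort "$1") <(sort "$2")
}

# Invented, deliberately unsorted sample files.
workdir=$(mktemp -d)
printf '%s\n' zebra apple mango > "$workdir/unsorted1.txt"
printf '%s\n' mango kiwi apple > "$workdir/unsorted2.txt"

result=$(common_lines "$workdir/unsorted1.txt" "$workdir/unsorted2.txt")
printf '%s\n' "$result"

rm -rf "$workdir"
```

The output comes back in sorted order (apple, then mango), not in the order the lines appear in either input, because comm only ever sees the sorted streams.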


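I found this answer on a question listed as a duplicate. I find grep more admin-friendly than comm, so if you just want the set of matching lines (useful for comparing CSV files, for example), simply use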

grep -F -x -f file1 file2

or the simplified fgrep version:

fgrep -xf file1 file2

Also, you can use file2* as a glob to look for common lines across multiple files, not just two.

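Some other handy variations include

-n to show the line number of each matching line
-c to only count the number of matching lines
-v to show only the lines in file2 that differ (or use diff).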
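The grep variations above can be tried on a small example. This is only a sketch: old.csv and new.csv are made-up files, and the variable names are chosen for illustration.

```shell
#!/usr/bin/env bash
# Invented CSV files that share a header row and one data row.
workdir=$(mktemp -d)
printf '%s\n' 'id,name' '1,alice' '2,bob' > "$workdir/old.csv"
printf '%s\n' 'id,name' '2,bob' '3,carol' > "$workdir/new.csv"

# -F: treat patterns as fixed strings, -x: match whole lines only,
# -f: read the patterns (one per line) from old.csv.
matching=$(grep -F -x -f "$workdir/old.csv" "$workdir/new.csv")

# -c prints the number of matching lines instead of the lines themselves.
match_count=$(grep -c -F -x -f "$workdir/old.csv" "$workdir/new.csv")

# -v inverts the match: lines of new.csv that do not appear in old.csv.
only_new=$(grep -v -F -x -f "$workdir/old.csv" "$workdir/new.csv")

printf 'common:\n%s\ncount: %s\nonly in new.csv:\n%s\n' \
    "$matching" "$match_count" "$only_new"

rm -rf "$workdir"
```

Unlike comm, the matching lines come out in the order they appear in the second file, and neither file needs to be sorted.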

Using comm is faster, but that speed comes at the cost of having to sort your files first. It isn't very useful as a "reverse diff".


Just for information, I made a small tool for Windows that does the same thing as "grep -F -x -f file1 file2" (since I haven't found anything equivalent to this command on Windows).

Here it is: http://www.nerdzcore.com/?page=commonlines

CommonLines inputFile1 inputFile2 outputFile

The source code is also available (GPL).


The easiest way to do it is:

awk 'NR==FNR{a[$1]++;next} a[$1] ' file1 file2

The files do not need to be sorted.
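One caveat worth checking: the one-liner keys on $1, the first whitespace-separated field, so lines containing spaces are compared only by their first word. A variant keyed on $0 compares whole lines. The sketch below uses invented sample files to show the difference; it is an illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
# Invented sample files whose lines contain spaces.
workdir=$(mktemp -d)
printf '%s\n' 'red apple' 'green pear' > "$workdir/list1.txt"
printf '%s\n' 'red cherry' 'green pear' > "$workdir/list2.txt"

# NR==FNR holds only while awk reads the first file: remember each whole line.
# For the second file, print a line only if it was recorded from the first.
whole_line=$(awk 'NR==FNR { seen[$0]++; next } seen[$0]' \
    "$workdir/list1.txt" "$workdir/list2.txt")

# Keying on $1 instead treats "red apple" and "red cherry" as the same entry.
first_field=$(awk 'NR==FNR { a[$1]++; next } a[$1]' \
    "$workdir/list1.txt" "$workdir/list2.txt")

printf 'keyed on $0:\n%s\nkeyed on $1:\n%s\n' "$whole_line" "$first_field"

rm -rf "$workdir"
```

For single-word lines the two forms behave the same, so the $1 version is fine when each line holds one token.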


On Windows, you can use a PowerShell script with Compare-Object:

compare-object -IncludeEqual -ExcludeDifferent -PassThru (get-content A.txt) (get-content B.txt)> MATCHING.txt | Out-Null #Find Matching Lines

Compare-Object:

-IncludeEqual without -ExcludeDifferent: everything
-ExcludeDifferent without -IncludeEqual: nothing


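I think the diff utility itself, with its unified (-U) option, can be used to achieve this. Because the first column of diff's output marks whether a line was added or removed, we can look for the lines that are unchanged.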

diff -U1000 file_1 file_2 | grep '^ '

The number 1000 is chosen arbitrarily: it just needs to be larger than any single hunk of diff output.

Here is the complete, straightforward set of commands:

f1="file_1"
f2="file_2"

# Read each file on stdin so wc prints only the count, with no file name.
lc1=$(wc -l < "$f1")
lc2=$(wc -l < "$f2")
lcmax=$(( lc1 > lc2 ? lc1 : lc2 ))

# Use the larger line count as context so each file fits in a single hunk.
diff -U"$lcmax" "$f1" "$f2" | grep '^ ' | less

# Alternatively, use this grep to ignore the lines starting
# with +, -, and @ signs.
#   grep -vE '^[+@-]'

If you want to include lines that were merely moved, you can sort the inputs before diffing, like so:

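f1="file_1"
f2="file_2"

# Read each file on stdin so wc prints only the count, with no file name.
lc1=$(wc -l < "$f1")
lc2=$(wc -l < "$f2")
lcmax=$(( lc1 > lc2 ? lc1 : lc2 ))

# Sorting both inputs first lets lines that only changed position line up.
diff -U"$lcmax" <(sort "$f1") <(sort "$f2") | grep '^ ' | less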
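To see what the diff-based approach produces, here is a small self-contained sketch. The files v1.txt and v2.txt are invented, and the final sed step, which strips the leading space that unified diff puts before unchanged lines, is an addition for readability and not part of the commands above.

```shell
#!/usr/bin/env bash
# Invented sample files: two lines are shared, two differ.
workdir=$(mktemp -d)
printf '%s\n' alpha beta gamma delta > "$workdir/v1.txt"
printf '%s\n' alpha BETA gamma epsilon > "$workdir/v2.txt"

# Context at least as large as the longer file keeps everything in one hunk.
lc1=$(wc -l < "$workdir/v1.txt")
lc2=$(wc -l < "$workdir/v2.txt")
lcmax=$(( lc1 > lc2 ? lc1 : lc2 ))

# Unchanged lines start with a single space in unified output;
# grep keeps only those and sed removes the marker column.
unchanged=$(diff -U"$lcmax" "$workdir/v1.txt" "$workdir/v2.txt" \
    | grep '^ ' | sed 's/^ //')
printf '%s\n' "$unchanged"

rm -rf "$workdir"
```

Here the lines alpha and gamma survive, in their original order. Note that this reports lines diff considers unchanged in place, which is why the sorted variant is suggested for lines that have only moved.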