How to grep for multiple strings in a file on different lines (i.e. a whole-file search, not a line-based one)?

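I want to grep for files containing the words Dansk, Svenska and Norsk, each on any line, with a usable return code (I really only need to know whether the strings are present; my one-liner goes a little further than that).

I have many files with lines like these: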

Disc Title: unknown
Title: 01, Length: 01:33:37.000 Chapters: 33, Cells: 31, Audio streams: 04, Subpictures: 20
Subtitle: 01, Language: ar - Arabic, Content: Undefined, Stream id: 0x20,
Subtitle: 02, Language: bg - Bulgarian, Content: Undefined, Stream id: 0x21,
Subtitle: 03, Language: cs - Czech, Content: Undefined, Stream id: 0x22,
Subtitle: 04, Language: da - Dansk, Content: Undefined, Stream id: 0x23,
Subtitle: 05, Language: de - Deutsch, Content: Undefined, Stream id: 0x24,
(...)

Here is the pseudocode for what I want:

for all files in directory;
    if file contains "Dansk" AND "Norsk" AND "Svenska" then
        echo the filename
end

What is the best way to do this? Can it be done in one line?


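How to grep for multiple strings in a file on different lines (use the pipe symbol):

for file in *; do
    # Count the lines that match any of the three words; quote "$file" so names with spaces work
    test "$(grep -E 'Dansk|Norsk|Svenska' "$file" | wc -l)" -ge 3 && echo "$file"
done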

Notes:

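  1. The unescaped pipe works because of -E (extended regular expressions). Without -E, GNU grep needs the pipe escaped as \| to search for Dansk, Norsk or Svenska; the choice of single or double quotes does not change this.

  2. Assumes that each line names only one language and that each language appears on only one line; otherwise the number of matching lines does not equal the number of distinct languages found.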

Walkthrough: http://www.cyberciti.biz/faq/howto-use-grep-command-in-linux-unix/
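One possible way to drop the assumption in note 2 is to count distinct matched words instead of matching lines. This is only a sketch: the function name has_all_langs is made up for illustration, and -o is available in GNU and BSD grep but is not required by POSIX.

```shell
# Hypothetical helper (not from the answer above): succeeds only if
# all three words occur somewhere in the file named by "$1".
has_all_langs() {
    # -o prints every match on its own line; sort -u collapses repeats,
    # so the count is the number of distinct words that were found.
    n=$(grep -oE 'Dansk|Norsk|Svenska' -- "$1" | sort -u | wc -l)
    [ "$n" -eq 3 ]
}
```

It could then replace the test in the loop, for example: for file in *; do has_all_langs "$file" && echo "$file"; done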

You can use:

grep -l Dansk * | xargs grep -l Norsk | xargs grep -l Svenska

If you want also to find in hidden files:

grep -l Dansk .* | xargs grep -l Norsk | xargs grep -l Svenska

Another option is awk:

awk '/Dansk/{a=1}/Norsk/{b=1}/Svenska/{c=1}END{ if (a && b && c) print "0" }' file

You can then capture the printed value in the shell.
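The awk command above signals success by printing 0. If a real exit status is more convenient, a small variation could end with exit instead of print. The wrapper name contains_all_three is hypothetical and only there to make the sketch easy to call.

```shell
# Hypothetical wrapper: exit status 0 if the file named by "$1"
# contains all three words, 1 otherwise.
contains_all_three() {
    awk '/Dansk/{a=1} /Norsk/{b=1} /Svenska/{c=1}
         END { exit !(a && b && c) }' "$1"
}
```

Usage might look like: if contains_all_three dvdfile.dat; then echo "all present"; fi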

If you have Ruby (1.9+):

ruby -0777 -ne 'print if /Dansk/ and /Norsk/ and /Svenska/' file

Yet another way using just bash and grep:

For a single file 'test.txt':

grep -q Dansk test.txt && grep -q Norsk test.txt && grep -l Svenska test.txt

This prints test.txt if and only if the file contains all three words, in any order. The first two greps print nothing (-q), and the last one prints the file name only if the other two have succeeded.

If you want to do it for every file in the directory:

for f in *; do grep -q Dansk "$f" && grep -q Norsk "$f" && grep -l Svenska "$f"; done

Expanding on @kurumi's awk answer, here's a bash function:

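all_word_search() {
    gawk '
        BEGIN {
            # All arguments but the last are search terms; shift the
            # file name down so that gawk reads only that file.
            for (i=ARGC-2; i>=1; i--) {
                search_terms[ARGV[i]] = 0;
                ARGV[i] = ARGV[i+1];
                delete ARGV[i+1];
            }
        }
        {
            # Mark each whitespace-separated field that is a search term.
            for (i=1; i<=NF; i++)
                if ($i in search_terms)
                    search_terms[$i] = 1
        }
        END {
            for (word in search_terms)
                if (search_terms[word] == 0)
                    exit 1
        }
    ' "$@"
    return $?
}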

Usage:

if all_word_search Dansk Norsk Svenska filename; then
echo "all words found"
else
echo "not all words found"
fi
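Note that the function compares whole whitespace-separated fields, so with the sample data in the question a field such as "Dansk," (with its trailing comma) is not equal to the term Dansk. A possible variant that matches fixed substrings instead is sketched below. The name all_substring_search is made up, and the sketch assumes an awk that honours delete on ARGV elements (gawk, mawk and the BSD awk do).

```shell
# Hypothetical variant of all_word_search: fixed-string substring matching.
# Usage: all_substring_search term1 term2 ... filename
all_substring_search() {
    awk '
    BEGIN {
        # Every argument except the last is a search term; the last is the file.
        for (i = 1; i < ARGC - 1; i++) {
            terms[ARGV[i]] = 0
            delete ARGV[i]
        }
    }
    {
        # index() does a plain substring test, with no regex interpretation.
        for (t in terms)
            if (index($0, t) > 0)
                terms[t] = 1
    }
    END {
        for (t in terms)
            if (terms[t] == 0)
                exit 1
    }
    ' "$@"
}
```

It is called the same way as the original function, for example: all_substring_search Dansk Norsk Svenska filename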

This is a blending of glenn jackman's and kurumi's answers which allows an arbitrary number of regexes instead of an arbitrary number of fixed words or a fixed set of regexes.

#!/usr/bin/awk -f
# by Dennis Williamson - 2011-01-25

BEGIN {
    for (i=ARGC-2; i>=1; i--) {
        patterns[ARGV[i]] = 0;
        delete ARGV[i];
    }
}

{
    for (p in patterns)
        if ($0 ~ p)
            matches[p] = 1
    # print    # the matching line could be printed
}

END {
    for (p in patterns) {
        if (matches[p] != 1)
            exit 1
    }
}

Run it like this:

./multigrep.awk Dansk Norsk Svenska 'Language: .. - A.*c' dvdfile.dat

Here's what worked well for me:

find . -path '*/.svn' -prune -o -type f -exec gawk '/Dansk/{a=1}/Norsk/{b=1}/Svenska/{c=1}END{ if (a && b && c) print FILENAME }' {} \;
./path/to/file1.sh
./another/path/to/file2.txt
./blah/foo.php

If I just wanted to find .sh files with these three, then I could have used:

find . -path '*/.svn' -prune -o -type f -name "*.sh" -exec gawk '/Dansk/{a=1}/Norsk/{b=1}/Svenska/{c=1}END{ if (a && b && c) print FILENAME }' {} \;
./path/to/file1.sh

I did it in two steps, without a script, with help from the comments on this page: first make a list of the CSV files in one file, then chain grep -l over that list so that each stage only searches the files that passed the previous one. Just type into the terminal:

$ find /csv/file/dir -name '*.csv' > csv_list.txt
$ grep -l Svenska `cat csv_list.txt` | xargs grep -l Norsk | xargs grep -l Dansk

It did exactly what I needed: it printed the names of the files containing all three words.

Also mind the quoting characters: ` ' and "

grep -irl word1 * | grep -il word2 `cat -` | grep -il word3 `cat -`
  • -i makes the search case-insensitive
  • -r searches recursively through folders
  • -l prints only the names of the files that contain a match
  • `cat -` reads the file list from standard input, so the next grep searches only the files passed to it

If you only need two search terms, arguably the most readable approach is to run each search and intersect the results:

 comm -12 <(grep -rl word1 . | sort) <(grep -rl word2 . | sort)
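For more than two terms, nesting comm becomes awkward. One possible alternative (a different technique from comm, sketched here under assumptions) is to build one file list per word and keep the names that appear in every list. The function name files_with_all is hypothetical; the sketch assumes the words are distinct, that grep supports -r, and that file names contain no newlines.

```shell
# Hypothetical helper: list files under directory "$1" that contain
# every one of the remaining arguments (treated as fixed strings).
files_with_all() {
    dir=$1; shift
    for word in "$@"; do
        # -l prints each matching file name once per word
        grep -rlF -e "$word" -- "$dir"
    done | sort | uniq -c |
        # keep names counted once per word, then strip the count column
        awk -v n="$#" '$1 == n { sub(/^ *[0-9]+ /, ""); print }'
}
```

Example call: files_with_all . word1 word2 word3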

You can do this really easily with ack:

ack -l 'cats' | ack -xl 'dogs'
  • -l: return a list of files
  • -x: take the files from STDIN (the previous search) and only search those files

And you can just keep piping until you get just the files you want.

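Simply (note that this prints the lines matching any one of the words, not the files containing all of them):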

grep 'word1\|word2\|word3' *

see this post for more info

I had this problem today, and all the one-liners here failed for me because the file names contained spaces.

This is what I came up with that worked:

grep -ril <WORD1> | sed 's/.*/"&"/' | xargs grep -il <WORD2>
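Wrapping names in double quotes with sed still breaks on names that themselves contain quotes. A possibly more robust sketch passes the names NUL-terminated between the stages. It assumes a grep that supports --null and an xargs that supports -0 (GNU and BSD versions do; neither option is in POSIX), and the function name files_with_three is made up.

```shell
# Hypothetical helper: list files under the current directory that contain
# all three words (case-insensitive). File names are passed between the
# stages NUL-terminated, so spaces and quotes in names survive.
files_with_three() {
    grep -ril --null -e "$1" . |
        xargs -0 grep -il --null -e "$2" |
        xargs -0 grep -il -e "$3"
}
```

Example call: files_with_three Dansk Norsk Svenska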

This searches for lines containing any of several words in multiple files (grep -E is the current spelling of egrep):

grep -E 'abc|xyz' file1 file2 ... fileN

If you have git installed:

git grep -l --all-match --no-index -e Dansk -e Norsk -e Svenska

The --no-index option searches files in the current directory that are not managed by Git, so this command works in any directory, whether or not it is a git repository.

A simple one-liner in bash for an arbitrary list LIST for file my_file.txt can be:

LIST="Dansk Norsk Svenska"
EVAL=$(echo "$LIST" | sed 's/[^ ]* */grep -q & my_file.txt \&\& /g'); eval "$EVAL echo yes || echo no"

Replacing eval with echo reveals that the following command is evaluated:

grep -q Dansk  my_file.txt && grep -q Norsk  my_file.txt && grep -q Svenska my_file.txt &&  echo yes || echo no
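The same chain of grep -q calls can also be written without eval, which avoids having the shell re-parse the word list. The following is a minimal sketch; the function name contains_all is hypothetical, and unlike the commands above it uses -F, so the words are treated as fixed strings rather than regular expressions.

```shell
# Hypothetical helper: succeeds if the file "$1" contains every following word.
contains_all() {
    file=$1; shift
    for word in "$@"; do
        # Stop at the first missing word, like a chain of && would.
        grep -qF -e "$word" -- "$file" || return 1
    done
}
```

Usage: contains_all my_file.txt Dansk Norsk Svenska && echo yes || echo no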