Grepping with the "|" alternation operator

Here is an excerpt from a large file named AT5G60410.gff:

Chr5    TAIR10  gene    24294890    24301147    .   +   .   ID=AT5G60410;Note=protein_coding_gene;Name=AT5G60410
Chr5    TAIR10  mRNA    24294890    24301147    .   +   .   ID=AT5G60410.1;Parent=AT5G60410;Name=AT5G60410.1;Index=1
Chr5    TAIR10  protein 24295226    24300671    .   +   .   ID=AT5G60410.1-Protein;Name=AT5G60410.1;Derives_from=AT5G60410.1
Chr5    TAIR10  exon    24294890    24295035    .   +   .   Parent=AT5G60410.1
Chr5    TAIR10  five_prime_UTR  24294890    24295035    .   +   .   Parent=AT5G60410.1
Chr5    TAIR10  exon    24295134    24295249    .   +   .   Parent=AT5G60410.1
Chr5    TAIR10  five_prime_UTR  24295134    24295225    .   +   .   Parent=AT5G60410.1
Chr5    TAIR10  CDS 24295226    24295249    .   +   0   Parent=AT5G60410.1,AT5G60410.1-Protein;
Chr5    TAIR10  exon    24295518    24295598    .   +   .   Parent=AT5G60410.1

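I'm having some trouble using grep to extract specific lines from it. I want to extract all lines whose type, given in the third column, is "gene" or "exon". I was surprised when this did not work: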

grep 'gene|exon' AT5G60410.gff

No results are returned. What am I doing wrong?

You need to escape the |. The following should do the job.

grep "gene\|exon" AT5G60410.gff

By default, grep uses basic regular expressions (BRE), in which | is an ordinary character. GNU grep treats it as the alternation operator only when it is escaped as \|. So you could use the following:

grep 'gene\|exon' AT5G60410.gff

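Alternatively, you can switch grep to extended regular expressions (ERE), in which an unescaped | is the alternation operator. Either of the following forms does what you expect (egrep is the older spelling; grep -E is the POSIX form):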

egrep 'gene|exon' AT5G60410.gff
grep -E 'gene|exon' AT5G60410.gff
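
The difference between the two modes can be checked on a small file. The following is a minimal sketch, assuming a three-line tab-separated sample written to a temporary file (the file name sample.gff and its shortened rows are made up for illustration, not taken from the real annotation):

```shell
# Build a tiny tab-separated sample in a temporary directory.
# sample.gff is a hypothetical stand-in for AT5G60410.gff.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.gff"
printf 'Chr5\tTAIR10\tgene\t24294890\t24301147\n'  > "$sample"
printf 'Chr5\tTAIR10\tmRNA\t24294890\t24301147\n' >> "$sample"
printf 'Chr5\tTAIR10\texon\t24294890\t24295035\n' >> "$sample"

# Basic regular expressions: "|" is a literal character, so no line matches.
# grep exits with status 1 when nothing matches, hence the "|| true".
bre_count=$(grep -c 'gene|exon' "$sample" || true)

# Extended regular expressions: "|" is alternation, so two lines match.
ere_count=$(grep -c -E 'gene|exon' "$sample")

echo "BRE matches: $bre_count"
echo "ERE matches: $ere_count"

rm -r "$tmpdir"
```

The unescaped pattern silently matches nothing in the default mode, which is exactly the behaviour described in the question.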

Here is another way to grep for a few alternatives:

grep -e gene -e exon AT5G60410.gff

The -e switch specifies a pattern. It can be repeated, and a line is printed if it matches any of the patterns.
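
Note that all of the grep patterns above match anywhere on the line, while the question asks about the third column. As a hedged sketch of one way to restrict the match, the example below swaps in awk (not grep) and compares the third tab-separated field exactly. The sample file and the Note attribute on the mRNA row are assumptions made up for the demonstration:

```shell
# Hypothetical sample: the mRNA row carries "gene" in its attribute column.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.gff"
printf 'Chr5\tTAIR10\tgene\t24294890\t24301147\tID=AT5G60410\n'              > "$sample"
printf 'Chr5\tTAIR10\tmRNA\t24294890\t24301147\tNote=protein_coding_gene\n' >> "$sample"
printf 'Chr5\tTAIR10\texon\t24294890\t24295035\tParent=AT5G60410.1\n'       >> "$sample"

# grep matches "gene" or "exon" anywhere on the line, so the mRNA row is included.
grep_count=$(grep -c -e gene -e exon "$sample")

# awk compares only field 3, so the mRNA row is excluded.
awk_count=$(awk -F'\t' '$3 == "gene" || $3 == "exon" { n++ } END { print n + 0 }' "$sample")

echo "grep -e matches: $grep_count"
echo "awk column-3 matches: $awk_count"

rm -r "$tmpdir"
```

To print the matching lines rather than count them, the awk program can be reduced to the bare condition `$3 == "gene" || $3 == "exon"`.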

This will work:

grep "gene\|exon" AT5G60410.gff

I found this question while googling for a particular problem I was having involving a piped command to a grep command that used the alternation operator in a regex, so I thought that I would contribute my more specialized answer.

The error I faced turned out to come from the preceding pipe operator (|), not from the alternation operator (\|, which uses the same character) in the grep regex. The fix for me was to quote or escape special shell characters such as & before assuming the problem lay in the regex.

For example, the command I executed on my local machine was:

get http://localhost/foobar-& | grep "fizz\|buzz"

This command resulted in the following error:

-bash: syntax error near unexpected token `|'

This error was corrected by changing my command to:

get "http://localhost/foobar-&" | grep "fizz\|buzz"

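Quoting the & character with double quotes resolved my issue. Left unquoted, & ends the command and runs it in the background, so the shell then finds a | with no command to its left. The problem had nothing to do with the alternation operator at all.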
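
The effect of the unquoted & can be reproduced without a web request. This is a rough sketch that assumes bash is installed and uses echo as a stand-in for the get command from the answer above. The unquoted form is passed to a child bash so that its parse failure does not abort the surrounding script:

```shell
# Unquoted "&": the shell ends the first command at "&", then sees "|" with
# nothing before it and rejects the whole line before running anything.
if bash -c 'echo http://localhost/foobar-& | grep "foobar"' >/dev/null 2>&1; then
  unquoted_status=ok
else
  unquoted_status=failed
fi

# Quoted "&": the URL is one ordinary argument and the pipeline works.
quoted_result=$(echo "http://localhost/foobar-&" | grep "foobar\|buzz")

echo "unquoted: $unquoted_status"
echo "quoted:   $quoted_result"
```

The \| alternation in the second command relies on the GNU grep behaviour discussed earlier in the thread; grep -E "foobar|buzz" would be the portable spelling.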