Is it possible to escape regex metacharacters reliably with sed

Best answer

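Note:

If you are looking for prepackaged functionality based on the techniques discussed in this answer:
- bash functions that enable robust escaping can be found at the bottom of this post (plus a perl solution that uses perl's built-in support for such escaping).
- @EdMorton's answer contains a tool (a bash script) that robustly performs single-line substitutions.
 - Ed's answer now has an improved version of the sed command used below, as corrected in calestyo's answer. You need that version if you want to escape string literals for later use with other regex-processing tools such as awk and perl. In short: for cross-tool use, \ must be escaped as \\ rather than as [\], which means that instead of the command used below,
 sed 's/[^^]/[&]/g; s/\^/\\^/g'
 you must use
 sed 's/[^^\]/[&]/g; s/[\^]/\\&/g;'
All snippets below assume bash as the shell (POSIX-compliant reformulations are possible):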

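SINGLE-line solutions

Escaping a string literal for use as a regex in `sed`:

To give credit where credit is due: I found the regex used below in this answer.

Assuming that the search string is a single-line string: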

search='abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3'  # sample input containing metachars.


searchEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search") # escape it.


sed -n "s/$searchEscaped/foo/p" <<<"$search" # Echoes 'foo'

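Every character except ^ is placed in its own character set [...] expression so that it is treated as a literal.
- Note that ^ is the one character you cannot represent as [^], because it has special meaning in that position (negation).
Then, ^ characters are escaped as \^.
- Note that you cannot simply escape every character by putting a \ in front of it, because that can turn a literal character into a metacharacter: for example, \< and \b are word boundaries in some tools, \n is a newline, \{ is the start of an RE interval such as \{1,3\}, and so on.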
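To make the last point concrete, here is a minimal sketch that compares naive backslash-prefixing with the bracket approach. The sample literal '.n' and the variable names are my own choices for illustration, not part of the technique itself.

```shell
#!/usr/bin/env bash
search='.n'   # literal to find: a dot followed by the letter n

# Naive escaping: put a backslash before every character, yielding \.\n
naive=$(sed 's/./\\&/g' <<<"$search")

# Bracket escaping as described above, yielding [.][n]
bracketed=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search")

# In the naive regex, \n now means "newline", so the literal no longer matches.
naiveOut=$(sed -n "/$naive/p" <<<"$search")
bracketOut=$(sed -n "/$bracketed/p" <<<"$search")

printf 'naive:     regex=%s matched=[%s]\n' "$naive" "$naiveOut"
printf 'bracketed: regex=%s matched=[%s]\n' "$bracketed" "$bracketOut"
```

Only the bracketed form should report a match; the naive form silently changes what the regex means.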

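The approach is robust, but not efficient.

The robustness comes from not trying to anticipate all special regex characters, which vary across regex dialects, but from relying on only two features shared by all regex dialects:

the ability to specify literal characters inside a character set.
the ability to escape a literal ^ as \^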
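As a rough check of the dialect-independence claim, the following sketch escapes a literal that is full of metacharacters and then uses the result in both a basic (BRE) and an extended (ERE, `sed -E`) regex. The sample string is an assumption of mine; any single-line literal should behave the same way.

```shell
#!/usr/bin/env bash
search='^[a-z]*$'   # every character here is special in some position

searchEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search")
printf 'escaped: %s\n' "$searchEscaped"

# The same escaped string is used unchanged in both dialects.
bre=$(sed -n "s/$searchEscaped/matched/p" <<<"$search")
ere=$(sed -E -n "s/$searchEscaped/matched/p" <<<"$search")

printf 'BRE: %s\nERE: %s\n' "$bre" "$ere"
```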

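Escaping a string literal for use as the replacement string in `sed`'s `s///` command:

The replacement string in a sed s/// command is not a regex, but it does recognize placeholders that refer either to the entire string matched by the regex (&) or to specific capture-group results by index (\1, \2, ...). These must therefore be escaped, along with the (customary) regex delimiter, /.

Assuming that the replacement string is a single-line string: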

replace='Laurel & Hardy; PS\2' # sample input containing metachars.


replaceEscaped=$(sed 's/[&/\]/\\&/g' <<<"$replace") # escape it


sed -n "s/.*/$replaceEscaped/p" <<<"foo" # Echoes $replace as-is
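A small before-and-after comparison may help show what goes wrong without escaping. The input 'dept' and the replacement 'R&D' are sample values I picked; the escaping command is the one shown above.

```shell
#!/usr/bin/env bash
replace='R&D'
input='dept'

# Unescaped: & is a placeholder for the matched text, so 'dept' is spliced in.
unescaped=$(sed "s/dept/$replace/" <<<"$input")

# Escaped: &, / and \ are backslash-prefixed first, so & stays literal.
replaceEscaped=$(sed 's/[&/\]/\\&/g' <<<"$replace")
escaped=$(sed "s/dept/$replaceEscaped/" <<<"$input")

printf 'unescaped: %s\n' "$unescaped"
printf 'escaped:   %s\n' "$escaped"
```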

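MULTI-line solutions

Escaping a MULTI-LINE string literal for use as a regex in `sed`:

Note: This only makes sense if multiple input lines (possibly ALL) have been read before attempting to match.
Since tools such as sed and awk operate on a single line at a time by default, extra steps are needed to make them read more than one line at a time.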

# Define sample multi-line literal.
search='/abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3
/def\n\t[A-Z]\+\([^ ]\)\{3,4\}\4'


# Escape it.
searchEscaped=$(sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$search" | tr -d '\n')           #'


# Use in a Sed command that reads ALL input lines up front.
# If ok, echoes 'foo'
sed -n -e ':a' -e '$!{N;ba' -e '}' -e "s/$searchEscaped/foo/p" <<<"$search"

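The newlines in multi-line input strings must be translated to '\n' strings, which is how newlines are encoded in a regex.
$!a\'$'\n''\\n' appends the string '\n' to every output line except the last (the last newline is ignored, because it was added by <<<).
tr -d '\n' then removes all actual newlines from the string (sed adds one whenever it prints its pattern space), effectively replacing all newlines in the input with '\n' strings.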
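To see the result of these steps on a small input, here is a sketch with a two-line literal of my own choosing. It assumes a sed that accepts the one-line `a\` form used above (GNU sed does).

```shell
#!/usr/bin/env bash
search=$'a.b\nc*d'   # two-line sample literal

# Same escaping pipeline as above: bracket each character, append a literal
# \n after every line but the last, then delete the real newlines.
searchEscaped=$(sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$search" | tr -d '\n')
printf 'escaped: %s\n' "$searchEscaped"

# Use it in a sed command that reads all input lines before matching.
result=$(sed -n -e ':a' -e '$!{N;ba' -e '}' -e "s/$searchEscaped/ok/p" <<<"$search")
printf 'result:  %s\n' "$result"
```

The escaped form is a single line in which the original line break appears as the two characters \ and n.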

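-e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines in a loop, leaving subsequent commands to operate on all input lines at once.
- If you are using GNU sed (only), you can use its -z option to simplify reading all input lines at once:
 sed -z "s/$searchEscaped/foo/" <<<"$search"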
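The effect of the read-everything loop can be checked in isolation. In this sketch the sample text and the replacement word 'joined' are assumptions of mine; the loop itself is the idiom described above.

```shell
#!/usr/bin/env bash
text=$'one\ntwo'

# Default behavior: sed sees one line at a time, so a regex containing \n
# never matches and nothing is printed.
perLine=$(sed -n 's/one\ntwo/joined/p' <<<"$text")

# With the loop, all lines are collected in the pattern space first.
slurped=$(sed -n -e ':a' -e '$!{N;ba' -e '}' -e 's/one\ntwo/joined/p' <<<"$text")

printf 'per line: [%s]\n' "$perLine"
printf 'slurped:  [%s]\n' "$slurped"
```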

Escaping a MULTI-LINE string literal for use as the replacement string in `sed`'s `s///` command:

# Define sample multi-line literal.
replace='Laurel & Hardy; PS\2
Masters\1 & Johnson\2'


# Escape it for use as a Sed replacement string.
IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$replace")
replaceEscaped=${REPLY%$'\n'}


# If ok, outputs $replace as is.
sed -n "s/\(.*\) \(.*\)/$replaceEscaped/p" <<<"foo bar"

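Newlines in the input string must be retained as actual newlines, but \-escaped.
-e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines in a loop.
's/[&/\]/\\&/g escapes all &, \ and / instances, as in the single-line solution.
s/\n/\\&/g' then \-prefixes all actual newlines.
IFS= read -d '' -r is used to read the sed command's output as is (to avoid the automatic removal of trailing newlines that a command substitution ($(...)) would perform).
${REPLY%$'\n'} then removes a single trailing newline, which <<< implicitly appended to the input.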
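The difference between the two ways of capturing output is easy to observe directly. This sketch uses a sample value ending in two newlines; the `|| true` is my addition, because read returns a non-zero status when it reaches end of input without finding a NUL delimiter.

```shell
#!/usr/bin/env bash
value=$'text\n\n'   # sample value that ends in two newlines

# Command substitution strips ALL trailing newlines.
viaSubst=$(printf '%s' "$value")

# read -d '' reads up to a NUL (or end of input) and keeps them.
IFS= read -r -d '' viaRead < <(printf '%s' "$value") || true

printf 'command substitution kept %d characters\n' "${#viaSubst}"
printf 'read -d "" kept %d characters\n' "${#viaRead}"
```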

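`bash` functions based on the above (for `sed`):

quoteRe() quotes (escapes) for use in a regex.
quoteSubst() quotes for use in the replacement string of a s/// call.
Both handle multi-line input correctly.
- Note that because sed reads a single line at a time by default, using quoteRe() with multi-line strings only makes sense in sed commands that explicitly read multiple (or all) lines at once.
- Also, calling the functions via command substitution ($(...)) will not work for strings that have trailing newlines; in that case, use something like IFS= read -d '' -r escapedValue < <(quoteSubst "$value")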

# SYNOPSIS
#   quoteRe <text>
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }

# SYNOPSIS
#  quoteSubst <text>
quoteSubst() {
IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
printf %s "${REPLY%$'\n'}"
}

For example:

from=$'Cost\(*):\n$3.' # sample input containing metachars.
to='You & I'$'\n''eating A\1 sauce.' # sample replacement string with metachars.


# Should print the unmodified value of $to
sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$(quoteRe "$from")/$(quoteSubst "$to")/" <<<"$from"

Note the use of -e ':a' -e '$!{N;ba' -e '}' to read all input at once, so that the multi-line substitution works.
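One way to gain some confidence in the two functions is a round-trip check: replacing a literal with itself should leave the input unchanged. The sketch below copies the function definitions from above; the list of sample strings and the `failures` counter are my own additions, and the test covers single-line strings only.

```shell
#!/usr/bin/env bash
# quoteRe() and quoteSubst() copied from the answer above.
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
quoteSubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
  printf %s "${REPLY%$'\n'}"
}

failures=0
# Sample literals containing regex and replacement metacharacters.
for s in 'a.b*c' '^start$' 'path/to/file & more' 'back\slash \1' '[x]{2}'; do
  # Replace the literal with itself; the output should equal the input.
  out=$(sed "s/$(quoteRe "$s")/$(quoteSubst "$s")/" <<<"$s")
  if [[ $out == "$s" ]]; then
    printf 'ok:   %s\n' "$s"
  else
    printf 'FAIL: %s -> %s\n' "$s" "$out"
    failures=$((failures + 1))
  fi
done
```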

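`perl` solution:

Perl has built-in support for escaping arbitrary strings for literal use in a regex: the quotemeta() function (http://perldoc.perl.org/function/quotemeta.html) or its equivalent \Q...\E quoting.
The approach is the same for both single-line and multi-line strings; for example: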

from=$'Cost\(*):\n$3.' # sample input containing metachars.
to='You owe me $1/$& for'$'\n''eating A\1 sauce.' # sample replacement string w/ metachars.


# Should print the unmodified value of $to.
# Note that the replacement value needs NO escaping.
perl -s -0777 -pe 's/\Q$from\E/$to/' -- -from="$from" -to="$to" <<<"$from"

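Note the use of -0777 to read all input at once, so that the multi-line substitution works.
The -s option allows placing -<var>=<val>-style Perl variable definitions after the script (following --), before any filename operands.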

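Answer

Building on @mklement0's answer in this thread, the following tool replaces any single-line string (as opposed to a regexp) with any other single-line string, using sed and bash: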

$ cat sedstr
#!/bin/bash
old="$1"
new="$2"
file="${3:--}"
escOld=$(sed 's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g' <<< "$old")
escNew=$(sed 's/[&/\]/\\&/g' <<< "$new")
sed "s/$escOld/$escNew/g" "$file"

To illustrate the need for this tool, consider trying to replace a.*/b{2,}\nc with d&e\1f by calling sed directly:

$ cat file
a.*/b{2,}\nc
axx/bb\nc


$ sed 's/a.*/b{2,}\nc/d&e\1f/' file
sed: -e expression #1, char 16: unknown option to `s'
$ sed 's/a.*\/b{2,}\nc/d&e\1f/' file
sed: -e expression #1, char 23: invalid reference \1 on `s' command's RHS
$ sed 's/a.*\/b{2,}\nc/d&e\\1f/' file
a.*/b{2,}\nc
axx/bb\nc
# .... and so on, peeling the onion ad nauseam until:
$ sed 's/a\.\*\/b{2,}\\nc/d\&e\\1f/' file
d&e\1f
axx/bb\nc

or use the above tool:

$ sedstr 'a.*/b{2,}\nc' 'd&e\1f' file
d&e\1f
axx/bb\nc

This approach is useful because it can easily be extended to use word delimiters for replacing whole words if necessary, e.g. in GNU sed syntax:

sed "s/\<$escOld\>/$escNew/g" "$file"

whereas tools that actually operate on strings (e.g. awk's index()) cannot use word delimiters.
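A short sketch of the word-delimiter variant follows. It assumes GNU sed, since \< and \> are GNU extensions, and it uses the corrected cross-tool escaping command quoted in the note at the top of this page; the sample strings are mine.

```shell
#!/usr/bin/env bash
old='a.b'
new='X&Y'

# Escape the search string (corrected cross-tool form) and the replacement.
escOld=$(sed 's/[^^\]/[&]/g; s/[\^]/\\&/g' <<<"$old")
escNew=$(sed 's/[&/\]/\\&/g' <<<"$new")

# Only the standalone word a.b is replaced; xa.b, a.bx and a-b are left alone.
result=$(sed "s/\<$escOld\>/$escNew/g" <<<'a.b xa.b a.bx a-b')
printf '%s\n' "$result"
```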

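NOTE: the reason for not wrapping \ in a bracket expression is this. If you used a tool that accepts [\]] as a literal ] inside a bracket expression (e.g. perl and most awk implementations) to do the actual final substitution (i.e. instead of sed "s/$escOld/$escNew/g"), then you could not use the approach of: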

sed 's/[^^]/[&]/g; s/\^/\\^/g'

to escape \ by enclosing it in [], because then \x would become [\][x], which means \ or ] or [ or x. Instead, you would need:

sed 's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g'

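So while [\] is probably fine for all current sed implementations, we know that \\ will work for all sed, awk, perl, etc. implementations, and therefore use that form of escaping.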

Answer

It should be noted that the regular expression used in some of the answers above:

's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g'

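seems to be wrong:

Doing s/\^/\\^/g first and s/\\/\\\\/g afterwards is an error, because any ^ that was first escaped to \^ will then have its \ escaped again.

A better way seems to be: 's/[^\^]/[&]/g; s/[\^]/\\&/g;'.

With sed (BRE/ERE), [^^\\] should be just [^\^] (or [^^\]). \ has no special meaning inside a bracket expression and does not need to be quoted.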
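The double-escaping problem can be reproduced with a literal that consists of a single ^. In this sketch the variable names are illustrative; the two sed commands are the ones quoted above.

```shell
#!/usr/bin/env bash
literal='^'

# Older command: ^ becomes \^, then the backslash is escaped again -> \\^
oldEsc=$(sed 's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g' <<<"$literal")

# Corrected command: \ and ^ are each escaped exactly once -> \^
newEsc=$(sed 's/[^\^]/[&]/g; s/[\^]/\\&/g;' <<<"$literal")

printf 'old: %s\nnew: %s\n' "$oldEsc" "$newEsc"

# The old result asks for a literal backslash, so it fails to match '^'.
oldOut=$(sed -n "/$oldEsc/p" <<<"$literal")
newOut=$(sed -n "/$newEsc/p" <<<"$literal")
printf 'old matched=[%s] new matched=[%s]\n' "$oldOut" "$newOut"
```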

Answer

Bash parameter expansion can be used to escape a string for use as a Sed replacement string:

# Define a sample multi-line literal. Includes a trailing newline to test corner case
replace='a&b;c\1
d/e
'


# Escape it for use as a Sed replacement string.
: "${replace//\\/\\\\}"
: "${_//&/\\\&}"
: "${_//\//\\\/}"
: "${_//$'\n'/\\$'\n'}"
replaceEscaped=$_


# Output should match "$replace"
sed -n "s/.*/$replaceEscaped/p" <<<''
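The snippet above chains expansions through the special parameter $_, which holds the last argument of the previous command after expansion. A minimal sketch of that mechanism on an unrelated sample value (stripping file extensions, chosen only because it is easy to follow):

```shell
#!/usr/bin/env bash
name='report.final.txt'

: "${name%.*}"   # ':' does nothing, but $_ is now 'report.final'
: "${_%.*}"      # expand $_ itself; $_ is now 'report'
base=$_          # keep the final result

printf '%s\n' "$base"
```

Because no command substitution is involved, a value built this way keeps any trailing newlines it contains.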

In bash 5.2+, this can be simplified further:

# Define a sample multi-line literal. Includes a trailing newline to test corner case
replace='a&b;c\1
d/e
'


# Escape it for use as a Sed replacement string.
shopt -s extglob
shopt -s patsub_replacement # An & in the replacement will expand to what matched. bash 5.2+
: "${replace//@(&|\\|\/|$'\n')/\\&}"
replaceEscaped=$_


# Output should match "$replace"
sed -n "s/.*/$replaceEscaped/p" <<<''

Wrapping it in a bash function:

##
# escape_replacement -v var replacement
#
# Escape special characters in _replacement_ so that it can be
# used as the replacement part in a sed substitute command.
# Store the result in _var_.
escape_replacement() {
if ! [[ $# = 3 && $1 = '-v' ]]; then
echo "escape_replacement: invalid usage" >&2
echo "escape_replacement: usage: escape_replacement -v var replacement" >&2
return 1
fi
local -n var=$2 # nameref (requires Bash 4.3+)
# We use the : command (true builtin) as a dummy command as we
# trigger a sequence of parameter expansions
# We exploit that the $_ variable (last argument to the previous command
# after expansion) contains the result of the previous parameter expansion
: "${3//\\/\\\\}" # Backslash-escape any existing backslashes
: "${_//&/\\\&}"  # Backslash-escape &
: "${_//\//\\\/}" # Backslash-escape the delimiter (we assume /)
: "${_//$'\n'/\\$'\n'}" # Backslash-escape newline
var=$_ # Assign to the nameref
# To support Bash older than 4.3, the following can be used instead of nameref
#eval "$2=\$_" # Use eval instead of nameref https://mywiki.wooledge.org/BashFAQ/006
}


# Test the function
# =================


# Define a sample multi-line literal. Include a trailing newline to test corner case
replace='a&b;c\1
d/e
'


escape_replacement -v replaceEscaped "$replace"


# Output should match "$replace"
sed -n "s/.*/$replaceEscaped/p" <<<''
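One plausible reason for the -v var interface (my reading, not stated above) is that returning a value through a nameref keeps trailing newlines, whereas capturing a function's output with $(...) would strip them. A sketch with a hypothetical helper, store_line, that is not part of the answer:

```shell
#!/usr/bin/env bash
# Hypothetical helper: stores text ending in a newline in the named variable.
store_line() {
  local -n target=$1   # nameref; requires Bash 4.3+
  target=$'line\n'
}

store_line viaRef              # result delivered through the nameref
viaSubst=$(printf 'line\n')    # result captured by command substitution

printf 'nameref kept %d characters, command substitution kept %d\n' \
  "${#viaRef}" "${#viaSubst}"
```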
