bash regex with quotes?

The following code

number=1
if [[ $number =~ [0-9] ]]
then
    echo matched
fi

works. If I try to use quotes in the regex, however, it stops:

number=1
if [[ $number =~ "[0-9]" ]]
then
    echo matched
fi

I tried "\[0-9\]", too. What am I missing?

Funnily enough, the Advanced Bash-Scripting Guide suggests this should work.

Bash version 3.2.39.


The behavior changed between bash 3.1 and 3.2. I guess the advanced guide needs an update.

This is a terse description of the new features added to bash-3.2 since the release of bash-3.1. As always, the manual page (doc/bash.1) is the place to look for complete descriptions.

  1. New Features in Bash

snip

f. Quoting the string argument to the [[ command's =~ operator now forces string matching, as with the other pattern-matching operators.
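
One consequence of this rule is that quoting can be applied to just part of a pattern: the quoted part is matched literally, while the unquoted rest keeps its regex meaning. A minimal sketch, assuming bash 3.2 or later with compat31 off (the `check` function and the sample strings are made up for illustration):

```shell
#!/usr/bin/env bash
# Assumes bash >= 3.2 with the compat31 option off (the default).
# The function name and sample strings are hypothetical.

check() {
    # $1 is the string to test. The "." is quoted, so it matches only a
    # literal dot; ^, [0-9]+ and $ are unquoted and stay regex syntax.
    if [[ $1 =~ ^[0-9]+"."[0-9]+$ ]]; then
        echo match
    else
        echo "no match"
    fi
}

check 3.14
check 3x14
```

With an unquoted dot, the second call would match as well, because `.` would then mean "any character".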

Sadly, this breaks existing scripts that quote the regex, unless you had the foresight to store patterns in variables and use those instead of writing the regexes inline. Example below.

$ bash --version
GNU bash, version 3.2.39(1)-release (i486-pc-linux-gnu)
Copyright (C) 2007 Free Software Foundation, Inc.
$ number=2
$ if [[ $number =~ "[0-9]" ]]; then echo match; fi
$ if [[ $number =~ [0-9] ]]; then echo match; fi
match
$ re="[0-9]"
$ if [[ $number =~ $re ]]; then echo MATCH; fi
MATCH


$ bash --version
GNU bash, version 3.00.0(1)-release (i586-suse-linux)
Copyright (C) 2004 Free Software Foundation, Inc.
$ number=2
$ if [[ $number =~ "[0-9]" ]]; then echo match; fi
match
$ if [[ "$number" =~ [0-9] ]]; then echo match; fi
match

Bash 3.2 introduced a compatibility option, compat31, which reverts bash's regular expression quoting behavior to that of 3.1.

Without compat31:

$ shopt -u compat31
$ shopt compat31
compat31        off
$ set -x
$ if [[ "9" =~ "[0-9]" ]]; then echo match; else echo no match; fi
+ [[ 9 =~ \[0-9] ]]
+ echo no match
no match

With compat31:

$ shopt -s compat31
+ shopt -s compat31
$ if [[ "9" =~ "[0-9]" ]]; then echo match; else echo no match; fi
+ [[ 9 =~ [0-9] ]]
+ echo match
match

Link to patch: http://ftp.gnu.org/gnu/bash/bash-3.2-patches/bash32-039

GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)

Some examples of string match and regex match

$ if [[ 234 =~ "[0-9]" ]]; then echo matches;  fi # string match
$


$ if [[ 234 =~ [0-9] ]]; then echo matches;  fi # regex match
matches




$ var="[0-9]"


$ if [[ 234 =~ $var ]]; then echo matches;  fi # regex match
matches




$ if [[ 234 =~ "$var" ]]; then echo matches;  fi # string match after substituting $var as [0-9]


$ if [[ 'rss$var919' =~ "$var" ]]; then echo matches;  fi   # string match after substituting $var as [0-9]


$ if [[ 'rss$var919' =~ $var ]]; then echo matches;  fi # regex match after substituting $var as [0-9]
matches




$ if [[ "rss\$var919" =~ "$var" ]]; then echo matches;  fi # string match won't work


$ if [[ "rss\\$var919" =~ "$var" ]]; then echo matches;  fi # string match won't work




$ if [[ "rss'$var'""919" =~ "$var" ]]; then echo matches;  fi # $var is substituted on LHS & RHS and then string match happens
matches


$ if [[ 'rss$var919' =~ "\$var" ]]; then echo matches;  fi # string match !
matches






$ if [[ 'rss$var919' =~ "$var" ]]; then echo matches;  fi # string match failed
$


$ if [[ 'rss$var919' =~ '$var' ]]; then echo matches;  fi # string match
matches






$ echo $var
[0-9]


$


$ if [[ abc123def =~ "[0-9]" ]]; then echo matches;  fi


$ if [[ abc123def =~ [0-9] ]]; then echo matches;  fi
matches


$ if [[ 'rss$var919' =~ '$var' ]]; then echo matches;  fi # string match due to single quotes on RHS $var matches $var
matches




$ if [[ 'rss$var919' =~ $var ]]; then echo matches;  fi # Regex match
matches
$ if [[ 'rss$var' =~ $var ]]; then echo matches;  fi # Above e.g. really is regex match and not string match
$




$ if [[ 'rss$var919[0-9]' =~ "$var" ]]; then echo matches;  fi # string match RHS substituted and then matched
matches


$ if [[ 'rss$var919' =~ "'$var'" ]]; then echo matches;  fi # trying to string match '$var' fails




$ if [[ '$var' =~ "'$var'" ]]; then echo matches;  fi # string match still fails as single quotes are omitted on RHS


$ if [[ \'$var\' =~ "'$var'" ]]; then echo matches;  fi # this string match works as single quotes are included now on RHS
matches

As mentioned in other answers, putting the regular expression in a variable is a general way to achieve compatibility across different versions. You may also use this workaround to achieve the same thing while keeping your regular expression within the conditional expression:

$ number=1
$ if [[ $number =~ $(echo "[0-9]") ]]; then echo matched; fi
matched
$

Using a local variable has slightly better performance than using command substitution.
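
Keeping the pattern in a variable also sidesteps shell-quoting trouble for patterns that contain spaces or a dollar sign, because the assignment can be single-quoted while the expansion stays unquoted. A small sketch; the names `re` and `is_price` and the sample strings are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical names and data, for illustration only.

# Single quotes protect the pattern during assignment only.
re='^\$[0-9]+ USD$'

is_price() {
    # Unquoted $re on the right-hand side is treated as a regex.
    # No word splitting happens inside [[ ]], so the space is safe.
    [[ $1 =~ $re ]]
}

for s in '$15 USD' '15 USD'; do
    if is_price "$s"; then
        echo "$s: matched"
    else
        echo "$s: no match"
    fi
done
```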

For larger scripts, or collections of scripts, it might make sense to use a utility to prevent unwanted local variables polluting the code, and to reduce verbosity. This seems to work well:

# Bash's built-in regular expression matching requires the regular expression
# to be unquoted (see https://stackoverflow.com/q/218156), which makes it harder
# to use some special characters, e.g., the dollar sign.
# This wrapper works around the issue by using a local variable, which means the
# quotes are not passed on to the regex engine.
regex_match() {
    local string regex
    string="${1?}"
    regex="${2?}"
    # shellcheck disable=SC2046 `regex` is deliberately unquoted, see above.
    [[ "${string}" =~ ${regex} ]]
}

Example usage:

if regex_match "${number}" '[0-9]'; then
    echo matched
fi
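
The wrapper also makes it easy to pass patterns containing characters that are awkward to write unquoted inside `[[ ]]`. A hedged sketch of that, with the function repeated so the snippet runs on its own; the sample inputs are invented:

```shell
#!/usr/bin/env bash
# regex_match is repeated from the answer above so this sketch is
# self-contained. The sample inputs below are hypothetical.
regex_match() {
    local string regex
    string="${1?}"
    regex="${2?}"
    [[ "${string}" =~ ${regex} ]]
}

# A literal dollar sign, escaped for the regex engine.
regex_match 'total: $42' '\$[0-9]+$' && echo "price found"

# A pattern containing a space.
regex_match 'hello world' '^hello +world$' && echo "spaces are fine"

# A non-matching input returns a non-zero status.
regex_match 'abc' '^[0-9]+$' || echo "no digits"
```

Because the caller single-quotes the pattern, the shell passes it through untouched, and the unquoted expansion inside the function lets bash treat it as a regex.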