Is there a RegExp.escape function in JavaScript?

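Best answer

The function linked in another answer is insufficient. It fails to escape ^ or $ (start and end of string), or -, which in a character group is used for ranges.

Use this function: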

function escapeRegex(string) {
  return string.replace(/[/\-\\^$*+?.()|[\]{}]/g, '\\$&');
}

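While it may seem unnecessary at first glance, escaping - (as well as ^) makes the function suitable for escaping characters to be inserted into a character class as well as the body of the regex.

Escaping / makes the function suitable for escaping characters to be used in a JavaScript regex literal for later evaluation.

As there is no downside to escaping either of them, it makes sense to escape them to cover wider use cases.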
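To make the two use cases concrete, here is a minimal sketch that uses the function both in the body of a pattern and inside a character class. The sample strings and variable names are made up for illustration.

```javascript
function escapeRegex(string) {
  return string.replace(/[/\-\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Use 1: in the body of a regex, the text is matched literally.
const literal = new RegExp('^' + escapeRegex('1.5 (approx)') + '$');
console.log(literal.test('1.5 (approx)')); // true
console.log(literal.test('1x5 (approx)')); // false, "." is no longer a wildcard

// Use 2: inside a character class, "-" must not be read as a range.
const anyOf = new RegExp('^[' + escapeRegex('a-c') + ']$');
console.log(anyOf.test('-')); // true, the hyphen itself is a member
console.log(anyOf.test('b')); // false, "a-c" is not treated as a range
```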

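And yes, it is a disappointing failing that this is not part of standard JavaScript.

In jQuery UI's autocomplete widget (version 1.9.1) they use a slightly different regular expression (line 6753). Here is that regular expression combined with bobince's approach.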

RegExp.escape = function( value ) {
  return value.replace(/[\-\[\]{}()*+?.,\\\^$|#\s]/g, "\\$&");
};

Here is a shorter version.

RegExp.escape = function(s) {
  return s.replace(/[$-\/?[-^{|}]/g, '\\$&');
};

This includes the non-meta characters %, &, ', and ,, but the JavaScript RegExp specification allows this.
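Because the short version relies on character ranges, it is not obvious at a glance what it covers. The following sketch simply enumerates printable ASCII and lists every character the class matches (the variable names are illustrative):

```javascript
// Same character class as the short version, without the g flag so test() is stateless.
const shortClass = /[$-\/?[-^{|}]/;

const covered = [];
for (let code = 0x20; code <= 0x7e; code++) {
  const ch = String.fromCharCode(code);
  if (shortClass.test(ch)) covered.push(ch);
}

console.log(covered.join(' '));
// → $ % & ' ( ) * + , - . / ? [ \ ] ^ { | }
```

The ranges $-/ and [-^ are what pull in the extra characters %, &, ' and ,.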

The Mozilla Developer Network's Regular Expressions guide provides this escaping function:

function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}

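The functions in the other answers are overkill for escaping entire regular expressions (they may be useful for escaping parts of regular expressions that will later be concatenated into bigger regexps).

If you escape an entire regexp and are done with it, quoting the metacharacters that are either standalone (., ?, +, *, ^, $, |, \) or start something ((, [, {) is all you need: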

String.prototype.regexEscape = function regexEscape() {
  return this.replace(/[.?+*^$|({[\\]/g, '\\$&');
};

And yes, it is disappointing that JavaScript does not have a built-in function like this.
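A quick way to sanity-check a minimal set like this is to escape a string, compile it, and see whether it still matches itself. The sketch below does that with the same character class written as a plain function (minimalEscape and roundTrips are made-up names). It suggests the set holds up for closing ] and } without the u flag, but a lone ) is rejected by the engine, so adding ) to the class is probably worthwhile if the input can contain it.

```javascript
// Hypothetical helper: the same character class as above, as a plain function.
function minimalEscape(s) {
  return s.replace(/[.?+*^$|({[\\]/g, '\\$&');
}

// True if the escaped text compiles and matches the original text exactly.
function roundTrips(text, flags) {
  try {
    return new RegExp('^' + minimalEscape(text) + '$', flags).test(text);
  } catch (e) {
    return false; // SyntaxError: the escaped pattern did not compile
  }
}

console.log(roundTrips('a.b*c'));     // true
console.log(roundTrips('x]y}'));      // true without the u flag
console.log(roundTrips('x]y}', 'u')); // false, unescaped ] and } are rejected with u
console.log(roundTrips('f(x)'));      // false, the unescaped ) is unmatched
```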

For anyone using Lodash, since v3.0.0 a _.escapeRegExp function is built in:

_.escapeRegExp('[lodash](https://lodash.com/)');
// → '\[lodash\]\(https:\/\/lodash\.com\/\)'

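And, in the event that you don't want to require the full Lodash library, you may require just that function!

Most of the expressions here solve single specific use cases.

That's okay, but I prefer an "always works" approach.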

function regExpEscape(literal_string) {
  return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}

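This will "fully escape" a literal string for any of the following uses in regular expressions:

Insertion in a regular expression.
Insertion in a character class.
Insertion in an integer count specifier.
Execution in non-JavaScript regular expression engines.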
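The first three uses in the list can be sketched roughly as follows. The sample strings and the variable names are made up for illustration.

```javascript
function regExpEscape(literal_string) {
  return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}

// 1. Insertion in a regular expression body.
const inBody = new RegExp('^' + regExpEscape('1+1=2?') + '$');
console.log(inBody.test('1+1=2?')); // true

// 2. Insertion in a character class: matches any run of the characters a, -, z, ].
const inClass = new RegExp('^[' + regExpEscape('a-z]') + ']+$');
console.log(inClass.test('z-a]')); // true
console.log(inClass.test('b'));    // false, "a-z" is not a range here

// 3. Insertion in an integer count specifier.
const inCount = new RegExp('^x{1,' + regExpEscape('3') + '}$');
console.log(inCount.test('xxx'));  // true
console.log(inCount.test('xxxx')); // false
```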

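Special characters covered:

-: Creates a character range in a character class.
[ / ]: Starts / ends a character class.
{ / }: Starts / ends a numeration specifier.
( / ): Starts / ends a group.
* / + / ?: Specifies repetition type.
.: Matches any character.
\: Escapes characters, and starts entities.
^: Specifies start of matching zone, and negates matching in a character class.
$: Specifies end of matching zone.
|: Specifies alternation.
#: Specifies comment in free spacing mode.
\s: Ignored in free spacing mode.
,: Separates values in a numeration specifier.
/: Starts or ends an expression.
:: Completes special group types, and is part of Perl-style character classes.
!: Negates a zero-width group.
< / =: Part of zero-width group specifications.

Notes:

/ is not strictly necessary in any flavor of regular expression. However, it protects you in case someone (shudder) does eval("/" + pattern + "/");.
, ensures that if the string is meant to be an integer in the numerical specifier, it will properly cause a RegExp compiling error instead of silently compiling wrong.
# and \s do not need to be escaped in JavaScript, but do in many other flavors. They are escaped here in case the regular expression will later be passed to another program.

If you also need to future-proof the regular expression against potential additions to the JavaScript regex engine capabilities, I recommend using the more paranoid: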

function regExpEscapeFuture(literal_string) {
  return literal_string.replace(/[^A-Za-z0-9_]/g, '\\$&');
}

This function escapes every character except those explicitly guaranteed not to be used for syntax in future regular expression flavors.
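One caveat worth checking before relying on the paranoid version: it produces escapes such as \% and "\ ", which compile without flags but are rejected when the pattern is compiled with the u flag. A small sketch with a made-up sample string:

```javascript
function regExpEscapeFuture(literal_string) {
  return literal_string.replace(/[^A-Za-z0-9_]/g, '\\$&');
}

const escapedText = regExpEscapeFuture('50% off!');
console.log(escapedText); // → 50\%\ off\!

// Without flags the pattern compiles and matches the original text.
console.log(new RegExp('^' + escapedText + '$').test('50% off!')); // true

// With the u flag, escapes of non-syntax characters are a SyntaxError.
let compilesWithU = true;
try {
  new RegExp(escapedText, 'u');
} catch (e) {
  compilesWithU = false;
}
console.log(compilesWithU); // false
```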

For the truly sanitation-keen, consider this edge case:

var s = '';
new RegExp('(choice1|choice2|' + regExpEscape(s) + ')');

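This should compile fine in JavaScript, but will not in some other flavors. If you intend to pass the pattern to another flavor, the empty case s === '' should be checked independently, like so: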

var s = '';
new RegExp('(choice1|choice2' + (s ? '|' + regExpEscape(s) : '') + ')');
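If this pattern comes up more than once, the check can be wrapped in a small helper. This is only a sketch: buildAlternation is a made-up name, and it assumes the fixed choices are already valid regex fragments while the user-supplied one is not.

```javascript
function regExpEscape(literal_string) {
  return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}

// Hypothetical helper: appends the escaped user choice only when it is non-empty,
// so the group never ends with a dangling "|".
function buildAlternation(fixedChoices, userChoice) {
  const parts = fixedChoices.slice();
  if (userChoice !== '') parts.push(regExpEscape(userChoice));
  return new RegExp('(' + parts.join('|') + ')');
}

console.log(buildAlternation(['choice1', 'choice2'], '').source);
// → (choice1|choice2)
console.log(buildAlternation(['choice1', 'choice2'], 'a|b').source);
// → (choice1|choice2|a\|b)
```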

There is an ES7 proposal for RegExp.escape at https://github.com/benjamingr/RegExp.escape/, with a polyfill available at https://github.com/ljharb/regexp.escape.

escapeRegExp = function(str) {
  if (str == null) return '';
  return String(str).replace(/([.*+?^=!:${}()|[\]\/\\])/g, '\\$1');
};

Nothing should prevent you from just escaping every non-alphanumeric character:

usersString.replace(/(?=\W)/g, '\\');

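You lose a certain degree of readability when doing re.toString(), but you gain a great deal of simplicity (and security).

According to ECMA-262, regular expression "syntax characters" are always non-alphanumeric, so the result is secure, and special escape sequences (\d, \w, \n) are always alphanumeric, so no false control escapes will be produced.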
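To see what the lookahead replacement produces, here is a short sketch. The wrapper name escapeNonWord is made up; the replacement itself is the one shown above.

```javascript
// Inserts a backslash at every position that is followed by a non-word character.
function escapeNonWord(usersString) {
  return usersString.replace(/(?=\W)/g, '\\');
}

console.log(escapeNonWord('a.b*c')); // → a\.b\*c
console.log(escapeNonWord('\\d+'));  // → \\d\+  (the input is the three characters \d+)

// The input "\d+" is now matched as literal text rather than as "one or more digits".
const re = new RegExp('^' + escapeNonWord('\\d+') + '$');
console.log(re.test('\\d+')); // true
console.log(re.test('123'));  // false
```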

XRegExp has an escape function:

XRegExp.escape('Escaped? <.>'); // → 'Escaped\?\ <\.>'

More info: http://xregexp.com/api/#escape

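Rather than only escaping characters that will cause issues in your regular expression (i.e. a blacklist), consider using a whitelist instead. This way each character is considered tainted unless it matches.

For this example, assume the following expression:

RegExp.escape('be || ! be');

This whitelists letters, numbers and spaces: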

RegExp.escape = function (string) {
  return string.replace(/([^\w\d\s])/gi, '\\$1');
};

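Returns:

"be \|\| \! be"

This may escape characters that do not need to be escaped, but that doesn't hinder your expression (there may be a minor time penalty, but it's worth it for safety).

Another (much safer) approach is to escape all the characters (and not just a few special ones that we currently know) using the unicode escape format \u{code}: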

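function escapeRegExp(text) {
  return Array.from(text)
    // codePointAt keeps characters outside the BMP intact (charCodeAt would return only half of a surrogate pair)
    .map(char => `\\u{${char.codePointAt(0).toString(16)}}`)
    .join('');
}

console.log(escapeRegExp('a.b')); // '\u{61}\u{2e}\u{62}'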

Please note that you need to pass the u flag for this method to work:

var expression = new RegExp(escapeRegExp(usersString), 'u');
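The sketch below shows why the flag matters and how a character outside the BMP behaves. It assumes a code-point-based variant of the function (codePointAt rather than charCodeAt); the name escapeToCodePoints is made up for this example.

```javascript
// Hypothetical variant that escapes each code point as \u{...}.
function escapeToCodePoints(text) {
  return Array.from(text)
    .map(char => `\\u{${char.codePointAt(0).toString(16)}}`)
    .join('');
}

const smiley = '\u{1F600}'; // a single character outside the BMP
console.log(escapeToCodePoints('a.b'));  // → \u{61}\u{2e}\u{62}
console.log(escapeToCodePoints(smiley)); // → \u{1f600}

// With the u flag, the escaped pattern matches the original text literally.
const withU = new RegExp('^' + escapeToCodePoints('a.b' + smiley) + '$', 'u');
console.log(withU.test('a.b' + smiley)); // true
console.log(withU.test('axb' + smiley)); // false

// Without the u flag, \u{61} means "the letter u repeated 61 times".
const withoutU = new RegExp('^' + escapeToCodePoints('a') + '$');
console.log(withoutU.test('a'));            // false
console.log(withoutU.test('u'.repeat(61))); // true
```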

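Only 12 metacharacters need to be escaped.

It doesn't matter what is done with the escaped string afterwards, whether it is inserted into a balanced regex wrapper or appended.

Do a string replace using this: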

var escaped_string = oldstring.replace(/[\\^$.|?*+()[{]/g, '\\$&');

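There is an ES7 proposal for RegExp.escape at https://github.com/benjamingr/RegExp.escape/, with a polyfill available at https://github.com/ljharb/regexp.escape.

Here is an example based on the rejected ES proposal. It includes a check for whether the property already exists, in case TC39 backtracks on their decision.

Code: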

if (!Object.prototype.hasOwnProperty.call(RegExp, 'escape')) {
  RegExp.escape = function(string) {
    // https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#Escaping
    // https://github.com/benjamingr/RegExp.escape/issues/37
    return string.replace(/[.*+\-?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
  };
}

Code minified:

Object.prototype.hasOwnProperty.call(RegExp,"escape")||(RegExp.escape=function(e){return e.replace(/[.*+\-?^${}()|[\]\\]/g,"\\$&")});

// ...
var assert = require('assert');
 

var str = 'hello. how are you?';
var regex = new RegExp(RegExp.escape(str), 'g');
assert.equal(String(regex), '/hello\\. how are you\\?/g');

There is also an npm module for this.

You can install it and use it like so:

npm install regexp.escape

Or

yarn add regexp.escape

var escape = require('regexp.escape');
var assert = require('assert');
 

var str = 'hello. how are you?';
var regex = new RegExp(escape(str), 'g');
assert.equal(String(regex), '/hello\\. how are you\\?/g');

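The GitHub and npm pages also describe how to use the shim/polyfill for this option. That logic is based on return RegExp.escape || implementation;, where the implementation contains the regexp used above.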
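The "native if present, otherwise fall back" logic can be sketched roughly like this. It is not the module's actual source; the function names are illustrative, and the fallback reuses the regexp from the snippet above. Since a native RegExp.escape may format its output differently from the fallback, the example checks behavior (the escaped text matches itself) rather than the exact escaped string.

```javascript
// Fallback implementation: the same character class as the snippet above.
function implementation(string) {
  return String(string).replace(/[.*+\-?^${}()|[\]\\]/g, '\\$&');
}

// Prefer a native RegExp.escape when the engine provides one.
function getPolyfill() {
  return typeof RegExp.escape === 'function' ? RegExp.escape : implementation;
}

const escapeForRegExp = getPolyfill();
const sample = 'hello. how are you?';
const sampleRegex = new RegExp('^' + escapeForRegExp(sample) + '$');

console.log(sampleRegex.test(sample));                 // true
console.log(sampleRegex.test('hellox how are you?'));  // false
```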

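The npm module is an extra dependency, but it also makes it easier for an external contributor to identify logical parts added to the code. ¯\_(ツ)_/¯

I borrowed bobince's answer above and created a tagged template function for creating a RegExp where part of the value is escaped and part isn't.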

regex-escaped.js

RegExp.escape = text => text.replace(/[\-\[\]{}()*+?.,\\\^$|#\s]/g, '\\$&');

RegExp.escaped = flags =>
  function (regexStrings, ...escaped) {
    const source = regexStrings
      .map((s, i) =>
        // escaped[i] will be undefined for the last value of s
        escaped[i] === undefined
          ? s
          : s + RegExp.escape(escaped[i].toString())
      )
      .join('');
    return new RegExp(source, flags);
  };

function capitalizeFirstUserInputCaseInsensitiveMatch(text, userInput) {
  const [, before, match, after ] =
    RegExp.escaped('i')`^((?:(?!${userInput}).)*)(${userInput})?(.*)$`.exec(text);

  return `${before}${match.toUpperCase()}${after}`;
}

const text = 'hello (world)';
const userInput = 'lo (wor';
console.log(capitalizeFirstUserInputCaseInsensitiveMatch(text, userInput));

For the TypeScript fans out there...

global.d.ts

interface RegExpConstructor {
  /** Escapes a string so that it can be used as a literal within a `RegExp`. */
  escape(text: string): string;

  /**
   * Returns a tagged template function that creates `RegExp` with its template values escaped.
   *
   * This can be useful when using a `RegExp` to search with user input.
   *
   * @param flags The flags to apply to the `RegExp`.
   *
   * @example
   *
   * function capitalizeFirstUserInputCaseInsensitiveMatch(text: string, userInput: string) {
   *   const [, before, match, after ] =
   *     RegExp.escaped('i')`^((?:(?!${userInput}).)*)(${userInput})?(.*)$`.exec(text);
   *
   *   return `${before}${match.toUpperCase()}${after}`;
   * }
   */
  escaped(flags?: string): (regexStrings: TemplateStringsArray, ...escapedVals: Array<string | number>) => RegExp;
}

This is the permanent solution.

function regExpEscapeFuture(literal_string) {
  return literal_string.replace(/[^A-Za-z0-9_]/g, '\\$&');
}

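I just published a regex escape gist based on the RegExp.escape shim, which was in turn based on the rejected RegExp.escape proposal. It looks roughly equivalent to the accepted answer except that it doesn't escape - characters, which actually seems to be fine according to my manual testing.

The gist's contents at the time of writing: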

const syntaxChars = /[\^$\\.*+?()[\]{}|]/g

/**
 * Escapes all special regex characters in a given string
 * so that it can be passed to `new RegExp(escaped, ...)` to match all given
 * characters literally.
 *
 * inspired by https://github.com/es-shims/regexp.escape/blob/master/implementation.js
 *
 * @param {string} s
 */
export function escape(s) {
  return s.replace(syntaxChars, '\\$&')
}
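Whether leaving - unescaped is fine depends on where the escaped text ends up. The sketch below uses the same character class in a plain, non-exported function (escapeSyntaxChars is a made-up name). It suggests the result is safe in the body of a pattern, but not when the text is interpolated into a character class, where - forms a range.

```javascript
const syntaxCharsOnly = /[\^$\\.*+?()[\]{}|]/g;

// Same logic as the gist, written as a plain function for this example.
function escapeSyntaxChars(s) {
  return s.replace(syntaxCharsOnly, '\\$&');
}

// In the body of a pattern, an unescaped "-" is just a literal hyphen.
const asBody = new RegExp('^' + escapeSyntaxChars('a-z') + '$');
console.log(asBody.test('a-z')); // true
console.log(asBody.test('m'));   // false

// Inside a character class, the unescaped "-" turns "a-z" into a range.
const asClass = new RegExp('^[' + escapeSyntaxChars('a-z') + ']$');
console.log(asClass.test('m')); // true, matched through the range
console.log(asClass.test('-')); // false, the hyphen itself is not a member
```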