Negative lookbehind equivalent in JavaScript


Use:

newString = string.replace(/([abcdefg])?m/, function($0,$1){ return $1?$0:'m';});


Mijoja's strategy works for your specific case, but not in general:

js>newString = "Fall ball bill balll llama".replace(/(ba)?ll/g,
function($0,$1){ return $1?$0:"[match]";});
Fa[match] ball bi[match] balll [match]ama

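Here is an example where the goal is to match a double l, but not if it is preceded by "ba". Note the word "balll": a true lookbehind should suppress the first two l's but match the second pair. However, by matching the first two l's and then discarding that match as a false positive, the regexp engine resumes from the end of that match and skips every character inside the false positive.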
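For comparison, on an engine that supports ES2018 lookbehind, the same replacement can be sketched with a real negative lookbehind. The variable names below are illustrative and not part of the original answer.

```javascript
// Assumes an engine with ES2018 lookbehind support (e.g. a recent Node.js or Chrome).
const input = "Fall ball bill balll llama";

// Match "ll" only when the two characters before it are not "ba".
const withLookbehind = input.replace(/(?<!ba)ll/g, "[match]");

// The optional-group workaround, for comparison.
const withOptionalGroup = input.replace(/(ba)?ll/g, ($0, $1) => ($1 ? $0 : "[match]"));

console.log(withLookbehind);    // Fa[match] ball bi[match] bal[match] [match]ama
console.log(withOptionalGroup); // Fa[match] ball bi[match] balll [match]ama
```

The only difference is in "balll": the lookbehind version still sees the second pair of l's, while the workaround has already consumed them.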


As of 2018, lookbehind assertions are part of the ECMAScript language specification.

// positive lookbehind
(?<=...)
// negative lookbehind
(?<!...)
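Applied to the pattern from the question (an m not preceded by any of a to g), a minimal sketch, assuming an engine with lookbehind support. The name notAfterAtoG is illustrative.

```javascript
// Assumes ES2018 lookbehind support.
const notAfterAtoG = /(?<![a-g])m/;

console.log(notAfterAtoG.test("jim")); // true  ("m" follows "i")
console.log(notAfterAtoG.test("jam")); // false ("m" follows "a")
console.log(notAfterAtoG.test("m"));   // true  (nothing precedes "m")

// The match contains only the "m", not the character before it.
console.log("jim".match(notAfterAtoG)[0]); // m
```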

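Answer from before 2018

Since JavaScript supports negative lookahead, one way to do it is:

reverse the input string
match with a reversed regex
reverse and reformat the matches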

const reverse = s => s.split('').reverse().join('');


const test = (stringToTests, reversedRegexp) => stringToTests
.map(reverse)
.forEach((s,i) => {
const match = reversedRegexp.test(s);
console.log(stringToTests[i], match, 'token:', match ? reverse(reversedRegexp.exec(s)[0]) : 'Ø');
});

Example 1:

Following @Andrew-ensley's question:

test(['jim', 'm', 'jam'], /m(?!([abcdefg]))/)

Outputs:

jim true token: m
m true token: m
jam false token: Ø

Example 2:

Following @neaummusic's comment (match max-height but not line-height, the token being height):

test(['max-height', 'line-height'], /thgieh(?!(-enil))/)

Outputs:

max-height true token: height
line-height false token: Ø


You can define a non-capturing group by negating your character set:

(?:[^a-g])m

...which matches every m that is NOT preceded by any of those letters.
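A short sketch of how this pattern behaves. It differs from a real lookbehind in two ways: the preceding character becomes part of the match, and an m at the very start of the string is not matched because there is no character for [^a-g] to consume. The variable names are illustrative.

```javascript
const negatedSet = /(?:[^a-g])m/;

console.log(negatedSet.test("jim")); // true
console.log(negatedSet.test("jam")); // false
console.log(negatedSet.test("m"));   // false: there is no preceding character to consume

// The matched text includes the character before "m".
const found = "jim".match(negatedSet);
console.log(found[0], found.index); // im 1
```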


/(?![abcdefg])[^abcdefg]m/gi Yes, this is a trick.


Let's suppose you want to find every int that is not preceded by unsigned:

With support for negative lookbehind:

(?<!unsigned )int

Without support for negative lookbehind:

((?!unsigned ).{9}|^.{0,8})int

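The basic idea is to grab the n preceding characters and exclude the match with a negative lookahead, while also matching the cases where fewer than n characters precede it (where n is the length of the lookbehind).

So the regex in question: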

(?<!([abcdefg]))m

would translate to:

((?!([abcdefg])).|^)m

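You may need to play with capturing groups to find the exact position of the string that interests you, or to replace a specific part with something else.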
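As a rough illustration of the capturing-group remark, the sketch below uses the unsigned/int pattern from above in a replacement and puts the consumed prefix back with $1. The helper name upcaseInt and the sample strings are made up for this example.

```javascript
// Emulates (?<!unsigned )int; here n = "unsigned ".length = 9.
const emulated = /((?!unsigned ).{9}|^.{0,8})int/g;

// Group 1 holds the consumed prefix, so it is put back with $1.
const upcaseInt = s => s.replace(emulated, "$1INT");

console.log(upcaseInt("unsigned int a; long long b; int c"));
// unsigned int a; long long b; INT c
console.log(upcaseInt("int x"));
// INT x

// Caveat: the prefix is consumed, so occurrences closer together than
// n characters cannot both be matched.
console.log(upcaseInt("int a; int b"));
// int a; INT b
```

The last call shows a limit of the emulation: the first int ends up inside the consumed prefix of the second, whereas a real lookbehind would match both.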


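Following Mijoja's idea, and drawing on the problems JasonS exposed, I came up with this. I checked it a bit but am not sure of myself, so a verification by someone more expert in js regex than me would be great :)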

var re = /(?=(..|^.?)(ll))/g
// matches empty string position
// whenever this position is followed by
// a string of length equal or inferior (in case of "^")
// to "lookbehind" value
// + actual value we would want to match


,   str = "Fall ball bill balll llama"


,   str_done = str
,   len_difference = 0
,   doer = function (where_in_str, to_replace)
{
str_done = str_done.slice(0, where_in_str + len_difference)
+   "[match]"
+   str_done.slice(where_in_str + len_difference + to_replace.length)


len_difference = str_done.length - str.length
/*  if str smaller:
len_difference will be positive
else will be negative
*/


}   /*  the actual function that would do whatever we want to do
with the matches;
this above is only an example from Jason's */






/*  function input of .replace(),
only there to test the value of $behind
and if negative, call doer() with interesting parameters */
,   checker = function ($match, $behind, $after, $where, $str)
{
if ($behind !== "ba")
doer
(
$where + $behind.length
,   $after
/*  one will choose the interesting arguments
to give to the doer, it's only an example */
)
return $match // empty string anyhow, but well
}
str.replace(re, checker)
console.log(str_done)

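My personal output:

Fa[match] ball bi[match] bal[match] [match]ama

The principle is to call checker at each point in the string between any two characters, whenever that position is the starting point of:

-- any substring of the size of what is not wanted (here 'ba', thus ..) (if that size is known; otherwise it is probably harder to do)

-- or something shorter than that if it is the beginning of the string: ^.?

and, following this,

-- what is actually sought (here 'll').

At each call of checker, there is a test to check whether the value before ll is not the one we don't want (!== 'ba'). If that is the case, we call another function (doer), and it is this one that makes the changes on str, if that is the purpose, or more generally that receives the data needed to manually process the results of scanning str.

Here we change the string, so we need to keep track of the difference in length in order to offset the positions given by replace, which are all calculated on str, which itself never changes.

Since primitive strings are immutable, we could have used the variable str to store the result of the whole operation, but I thought the example, already complicated by the replacements, would be clearer with another variable (str_done).

I guess that in terms of performance it must be pretty harsh: all those pointless replacements of '' with '', this str.length-1 times, plus here the manual replacement by doer, which means a lot of slicing... In this specific case that could probably be grouped, by cutting the string only once into pieces around where we want to insert [match] and .join()ing them with [match] itself.

The other thing is that I don't know how it would handle more complex cases, that is, complex values for the fake lookbehind... the length being perhaps the most problematic piece of data to get.

And, in checker, if there are multiple unwanted values for $behind, we will have to test it with yet another regex (best cached, i.e. created, outside checker, to avoid creating the same regex object at each call of checker) to know whether or not it is what we seek to avoid.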
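The two ideas above (cutting the string once and joining the pieces with [match], and keeping the regex that tests $behind outside the callback) could be sketched roughly as follows. This is one possible arrangement, not the original author's code; the names unwanted, pieces, and cursor are hypothetical, and the start >= cursor guard for overlapping hits is an added assumption.

```javascript
const str = "Fall ball bill balll llama";
const re = /(?=(..|^.?)(ll))/g;

// Created once, outside the callback, so it is not rebuilt on every call.
const unwanted = /^ba$/;

const pieces = []; // text between accepted matches
let cursor = 0;    // end of the last accepted match in str

str.replace(re, (match, behind, after, where) => {
  const start = where + behind.length; // position of the actual "ll"
  // Assumption: skip hits that overlap an already accepted match.
  if (!unwanted.test(behind) && start >= cursor) {
    pieces.push(str.slice(cursor, start));
    cursor = start + after.length;
  }
  return match; // empty string, nothing is changed by replace itself
});
pieces.push(str.slice(cursor));

const result = pieces.join("[match]");
console.log(result); // Fa[match] ball bi[match] bal[match] [match]ama
```

Because every slice is taken from the unchanged str, no length offset has to be tracked.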

I hope I have been clear enough; if not, don't hesitate to say so and I will try harder. :)


This effectively does it:

"jim".match(/[^a-g]m/)
> ["im"]
"jam".match(/[^a-g]m/)
> null

Search and replace example:

"jim jam".replace(/([^a-g])m/g, "$1M")
> "jiM jam"

Note that the negative lookbehind string must be exactly 1 character long for this to work.


Using your case, if you want to replace m with something, e.g. convert it to uppercase M, you can negate the set in a capturing group.

Match ([^a-g])m, replace with $1M:

"jim jam".replace(/([^a-g])m/g, "$1M")
// jiM jam

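([^a-g]) matches any character that is not (^) in the a-g range and stores it in the first capturing group, so you can access it with $1.

So we find im in jim and replace it with iM, which results in jiM.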
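One limitation of this form is that an m at the very start of the string has no preceding character for ([^a-g]) to match. A possible adjustment, offered as a sketch rather than as part of the original answer, is to let the group also match the start of the string. The names original and withStart are illustrative.

```javascript
const original = /([^a-g])m/g;
const withStart = /(^|[^a-g])m/g; // group 1 may be empty at the start of the string

console.log("m jim jam".replace(original, "$1M"));  // m jiM jam
console.log("m jim jam".replace(withStart, "$1M")); // M jiM jam
```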


Best answer

Lookbehind assertions were accepted into the ECMAScript specification in 2018.

Positive lookbehind usage:

console.log(
"$9.99  €8.47".match(/(?<=\$)\d+\.\d*/) // Matches "9.99"
);

Negative lookbehind usage:

console.log(
"$9.99  €8.47".match(/(?<!\$)\d+\.\d*/) // Matches "8.47"
);

Platform support:

V8 (https://developers.google.com/V8/)
- Google Chrome 62.0
- Microsoft Edge 79.0
- Node.js 6.0 behind a flag, 9.0 without a flag
- Deno (all versions)
SpiderMonkey
- Mozilla Firefox 78.0
JavaScriptCore: Apple is working on it (https://bugs.webkit.org/show_bug.cgi?id=174931)
- Apple Safari
- iOS WebView (all browsers on iOS + iPadOS)
Chakra: Microsoft was working on it, but Chakra is now abandoned in favor of V8
- Internet Explorer
- Edge versions prior to 79 (the ones based on EdgeHTML + Chakra)


This is how I implemented str.split(/(?<!^)@/) for Node.js 8 (which does not support lookbehind):

str.split('').reverse().join('').split(/@(?!$)/).map(s => s.split('').reverse().join('')).reverse()

Works? Yes (Unicode untested). Unpleasant? Yes.
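On the Unicode question: split('') works on UTF-16 code units, so reversing that way would break surrogate pairs such as emoji. A sketch that reverses by code point with Array.from instead (the helper names are hypothetical, and combining character sequences are still not handled):

```javascript
// Reverse by code point rather than by UTF-16 code unit.
const reverse = s => Array.from(s).reverse().join("");

// Emulates str.split(/(?<!^)@/) without lookbehind.
const splitAtNotAtStart = str =>
  reverse(str)
    .split(/@(?!$)/) // "@" not at the end of the reversed string
    .map(reverse)
    .reverse();

console.log(splitAtNotAtStart("@a@b"));  // [ '@a', 'b' ]
console.log(splitAtNotAtStart("@😀@x")); // [ '@😀', 'x' ]
```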


As mentioned before, JavaScript now allows lookbehinds. In older browsers you still need a workaround.
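One way to choose between the native syntax and a workaround at runtime is to try constructing a lookbehind regex and catch the SyntaxError that engines without support throw. This is a sketch under that assumption; supportsLookbehind and upperCaseM are hypothetical names, and the fallback consumes the preceding character, so it can differ from the native version when matches are adjacent.

```javascript
// Build the regex from a string so that engines without lookbehind
// throw at runtime instead of failing to parse the whole script.
function supportsLookbehind() {
  try {
    new RegExp("(?<!a)b");
    return true;
  } catch (e) {
    return false;
  }
}

// Upper-case every "m" that is not preceded by a letter from a to g.
function upperCaseM(s) {
  if (supportsLookbehind()) {
    return s.replace(new RegExp("(?<![a-g])m", "g"), "M");
  }
  // Fallback: capture the preceding character (or the start) and put it back.
  return s.replace(/(^|[^a-g])m/g, "$1M");
}

console.log(upperCaseM("jim jam m")); // jiM jam M
```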

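I bet you cannot find a regex without lookbehind that delivers exactly the same result without post-processing. All you can do is work with groups. Suppose you have a regex (?<!Before)Wanted, where Wanted is the regex you want to match and Before is the regex describing what must not precede the match. The best you can do is negate the regex Before and use the regex NotBefore(Wanted). The desired result is the first group, $1.

In your case Before=[abcdefg], which is easy to negate: NotBefore=[^abcdefg]. So the regex would be [^abcdefg](m). If you need the position of Wanted, you must group NotBefore too, so that the desired result is the second group.

If matches of the Before pattern have a fixed length n, that is, if the pattern contains no repetition tokens, you can avoid negating the Before pattern and use the regular expression (?!Before).{n}(Wanted), but you still have to use the first group, or use the regular expression (?!Before)(.{n})(Wanted) and use the second group. In this example, the pattern Before does have a fixed length, namely 1, so use the regex (?![abcdefg]).(m) or (?![abcdefg])(.)(m). If you are interested in all matches, add the g flag; see my code snippet: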

function TestSORegEx() {
var s = "Donald Trump doesn't like jam, but Homer Simpson does.";
var reg = /(?![abcdefg])(.{1})(m)/gm;
var out = "Matches and groups of the regex " +
"/(?![abcdefg])(.{1})(m)/gm in \ns = \"" + s + "\"";
var match = reg.exec(s);
while(match) {
var start = match.index + match[1].length;
out += "\nWhole match: " + match[0] + ", starts at: " + match.index
+  ". Desired match: " + match[2] + ", starts at: " + start + ".";
match = reg.exec(s);
}
out += "\nResulting string after statement s.replace(reg, \"$1*$2*\")\n"
+ s.replace(reg, "$1*$2*");
alert(out);
}