Remove zero-width space characters from a JavaScript string

I take user input (JS code) and execute (process) it in real time to display some output.

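Sometimes the code contains zero-width spaces, which is really strange; I don't know how the user typed them. Example: "(​$".length === 3 (there is an invisible character between ( and $).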

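I need to be able to remove this character from my JS code. How can I do that? Or is there another way to execute JS code so that the browser ignores zero-width space characters?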

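// filter() returns an array of single characters, so join them back into a string
var cleaned = [].filter.call( str, function( c ) {
return c.charCodeAt( 0 ) !== 8203;
} ).join( '' );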

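Filter out every character whose char code is 8203 (U+200B, the zero-width space), then join the remaining characters back into a string, because filter returns an array rather than a string.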
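The same filter idea extends to several invisible characters by checking each char code against a Set. This is a minimal sketch; the names `ZERO_WIDTH_CODES` and `removeZeroWidth` are hypothetical, and the Set is assumed to hold U+200B, U+200C, U+200D and U+FEFF written as decimal char codes.

```javascript
// Hypothetical generalization of the filter approach: drop every UTF-16
// code unit listed in a Set.
// 8203 = U+200B, 8204 = U+200C, 8205 = U+200D, 65279 = U+FEFF
const ZERO_WIDTH_CODES = new Set([8203, 8204, 8205, 65279]);

function removeZeroWidth(str) {
  // Strings are array-like, so Array.prototype.filter can walk them by index
  return Array.prototype.filter
    .call(str, (c) => !ZERO_WIDTH_CODES.has(c.charCodeAt(0)))
    .join('');
}

const input = '(\u200B$';
console.log(input.length); // 3
console.log(removeZeroWidth(input)); // ($
```

The filter works on UTF-16 code units, so characters outside the Basic Multilingual Plane (stored as surrogate pairs) pass through intact: neither half of a pair matches any code in the Set.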

Unicode includes, among others, the following zero-width characters:

  • U+200B zero-width space
  • U+200C zero-width non-joiner
  • U+200D zero-width joiner
  • U+FEFF zero-width no-break space

To remove them from a string in JavaScript, you can use a simple regular expression:

var userInput = 'a\u200Bb\u200Cc\u200Dd\uFEFFe';
console.log(userInput.length); // 9
var result = userInput.replace(/[\u200B-\u200D\uFEFF]/g, '');
console.log(result.length); // 5
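Applied to the question's scenario, the same regex can clean user code before it is run. This is a sketch under an assumption: the question does not say how the code is executed, so the `Function` constructor and the helper name `runUserCode` are hypothetical stand-ins. A raw U+200B outside a string literal is not valid JavaScript (it is neither whitespace nor an identifier character), so stripping it first avoids the syntax error.

```javascript
// Zero-width characters from the list above
const ZERO_WIDTH = /[\u200B-\u200D\uFEFF]/g;

// Hypothetical helper: strip zero-width characters, then evaluate the code.
// new Function(body) compiles the cleaned source as a function body.
function runUserCode(source) {
  const cleaned = source.replace(ZERO_WIDTH, '');
  return new Function(cleaned)();
}

// Without cleaning, this source would throw a SyntaxError
console.log(runUserCode('return 1 +\u200B 2;')); // 3
```

One caveat with stripping blindly: U+200D is also used inside emoji ZWJ sequences (family emoji, for example), so removing it from string literals in the user's code changes how those strings render.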

Note that many more characters can be invisible, for example some of ASCII's control characters.
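If control characters should go too, a similar character class can cover them. This is a sketch with an assumed policy: it removes the C0 controls, DEL and the C1 controls, but keeps tab, line feed and carriage return because JavaScript source normally relies on them. The names `INVISIBLE_CONTROLS` and `stripControlChars` are hypothetical.

```javascript
// Assumed policy: remove U+0000 to U+001F, U+007F and U+0080 to U+009F,
// except tab (U+0009), line feed (U+000A) and carriage return (U+000D).
const INVISIBLE_CONTROLS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F]/g;

function stripControlChars(text) {
  return text.replace(INVISIBLE_CONTROLS, '');
}

// U+0007 (BEL) is removed; the newline and tab survive
const sample = 'let x\u0007 = 1;\n\tx++;';
console.log(JSON.stringify(stripControlChars(sample))); // "let x = 1;\n\tx++;"
```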

I had a problem where some invisible characters were corrupting my JSON and causing an "Unexpected token ILLEGAL" exception, which was crashing my site.

Here is my solution, using a RegExp object:

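// U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR); the "g" flag removes every occurrence, not only the first
var re = new RegExp("\u2028|\u2029", "g");
var result = text.replace(re, '');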

You can find more about JavaScript and zero-width spaces here: Zero Width Spaces
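U+2028 and U+2029 are valid unescaped inside JSON strings, but before ES2019 JavaScript did not allow them unescaped in string literals, so JSON embedded in a script (or evaluated as code) could fail to parse. If the text in those characters matters, escaping them instead of deleting them is an alternative. This is a sketch of that swapped-in technique, not the approach from the answer above; the name `escapeLineSeparators` is hypothetical.

```javascript
// Replace raw U+2028/U+2029 with the six-character escapes \u2028 / \u2029,
// so the data survives but the output is safe inside a JavaScript string literal.
function escapeLineSeparators(jsonText) {
  return jsonText.replace(/[\u2028\u2029]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16)
  );
}

const json = JSON.stringify({ note: 'a\u2028b' });
const safe = escapeLineSeparators(json);
console.log(safe); // {"note":"a\u2028b"}
console.log(JSON.parse(safe).note === 'a\u2028b'); // true
```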

str.replace(/\u200B/g,'');

200B is the hexadecimal form of 8203, the code point of the zero-width space. Replacing it with an empty string removes it.
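The hex/decimal correspondence can be checked directly: 0x200B = 2 × 4096 + 11 = 8203. A few built-in conversions confirm that both answers target the same character.

```javascript
// 0x200B in hexadecimal equals 8203 in decimal
console.log(0x200B);                                  // 8203
console.log(parseInt('200B', 16));                    // 8203
console.log('\u200B'.charCodeAt(0));                  // 8203
console.log((8203).toString(16).toUpperCase());       // 200B
console.log(String.fromCharCode(8203) === '\u200B');  // true
```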

In JavaScript you can also use this regex, which additionally matches the left-to-right mark (U+200E) and the right-to-left mark (U+200F):

/([\u200B]+|[\u200C]+|[\u200D]+|[\u200E]+|[\u200F]+|[\uFEFF]+)/g

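// Look the elements up explicitly instead of relying on id-named globals
const submit = document.getElementById('submit');
const stringValue = document.getElementById('stringValue');
submit.onclick = evt => {
const stringToTrim = stringValue.value;
zeroWidthTrim(stringToTrim);
};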


/**
* Given a string, when it has zero-width spaces in it, then remove them
*
* @param {String} stringToTrim The string to strip of zero-width characters
*
* @return {String} The trimmed string
*
* The regex covers these zero-width Unicode characters:
*
* U+200B zero-width space.
* U+200C zero-width non-joiner.
* U+200D zero-width joiner.
* U+200E left-to-right mark.
* U+200F right-to-left mark.
* U+FEFF zero-width non-breaking space.
*/
function zeroWidthTrim(stringToTrim) {
const ZERO_WIDTH_SPACES_REGEX = /([\u200B]+|[\u200C]+|[\u200D]+|[\u200E]+|[\u200F]+|[\uFEFF]+)/g;
console.log('stringToTrim = ' + stringToTrim);
const trimmedString = stringToTrim.replace(ZERO_WIDTH_SPACES_REGEX, '');
console.log('trimmedString = ' + trimmedString);
return trimmedString;
}
<form runat="server">
<input name="stringValue" id="stringValue" type="text" placeholder="enter your string" value="[&#x200b;&#x200c;]" />
<input type="button" value="remove zero-width characters" id="submit" />
</form>

(After running the snippet above, paste the logged stringToTrim and trimmedString values into the regex101 test window: the zero-width characters match in stringToTrim but are gone from trimmedString.)
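The alternation of single-character classes in the regex above matches exactly the same characters as one class with a range, U+200B to U+200F plus U+FEFF; the `+` quantifiers only group runs, which makes no difference when every match is replaced with an empty string. A sketch comparing the two, where `COMPACT` is a hypothetical rewrite rather than part of the original answer:

```javascript
// Regex from the answer above
const ORIGINAL = /([\u200B]+|[\u200C]+|[\u200D]+|[\u200E]+|[\u200F]+|[\uFEFF]+)/g;
// Equivalent single character class
const COMPACT = /[\u200B-\u200F\uFEFF]/g;

// Same value as the snippet's input field, plus LRM, RLM and BOM
const input = '[\u200b\u200c]a\u200E\u200Fb\uFEFF';
console.log(input.replace(ORIGINAL, '') === input.replace(COMPACT, '')); // true
console.log(input.replace(COMPACT, '')); // []ab
```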