JS 正则表达式按行拆分

如何将一段长文本分割成单独的行? 为什么返回 第一行两次?

/^(.*?)$/mg.exec('line1\r\nline2\r\n');

[“ line 1”,“ line 1”]

我打开多行修饰符,使 ^$匹配行的开始和结束。我还打开了全局修饰符来捕获 所有行。

我希望使用正则表达式分割而不是 String.split,因为我将同时处理 Linux \n和 Windows \r\n行结束符。

113427 次浏览

First replace all \r\n with \n, then String.split.

Use

result = subject.split(/\r?\n/);

Your regex returns line1 twice because line1 is both the entire match and the contents of the first capturing group.

arrayOfLines = lineString.match(/[^\r\n]+/g);

As Tim said, it is both the entire match and capture. It appears regex.exec(string) returns on finding the first match regardless of global modifier, wheras string.match(regex) is honouring global.

I am assuming following constitute newlines

  1. \r followed by \n
  2. \n followed by \r
  3. \n present alone
  4. \r present alone

Please Use

var re=/\r\n|\n\r|\n|\r/g;


arrayofLines=lineString.replace(re,"\n").split("\n");

for an array of all Lines including the empty ones.

OR

Please Use

arrayOfLines = lineString.match(/[^\r\n]+/g);

For an array of non empty Lines

http://jsfiddle.net/uq55en5o/

var lines = text.match(/^.*((\r\n|\n|\r)|$)/gm);

I have done something like this. Above link is my fiddle.

Even simpler regex that handles all line ending combinations, even mixed in the same file, and removes empty lines as well:

var lines = text.split(/[\r\n]+/g);

With whitespace trimming:

var lines = text.trim().split(/\s*[\r\n]+\s*/g);

Unicode Compliant Line Splitting

Unicode® Technical Standard #18 defines what constitutes line boundaries. That same section also gives a regular expression to match all line boundaries. Using that regex, we can define the following JS function that splits a given string at any line boundary (preserving empty lines as well as leading and trailing whitespace):

const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/)

I don't understand why the negative look-ahead part ((?!\r\n)) is necessary, but that is what is suggested in the Unicode document 🤷‍♂️.

The above document recommends to define a regular expression meta-character for matching all line ending characters and sequences. Perl has \R for that. Unfortunately, JavaScript does not include such a meta-character. Alas, I could not even find a TC39 proposal for that.