How can I parse a CSV string with JavaScript when the data contains commas?

I have the following type of string:

var string = "'string, duppi, du', 23, lala"

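I want to split the string into an array on each comma, but only on the commas outside the single quotes.

I can't figure out the right regular expression for the split...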

string.split(/,/)

will give me

["'string", " duppi", " du'", " 23", " lala"]

but the result should be:

["string, duppi, du", "23", "lala"]

Is there a cross-browser solution?

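If your quote delimiter were double quotes, this would be a duplicate of "Example JavaScript code to parse CSV data".

You can either translate all single quotes to double quotes first: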

string = string.replace( /'/g, '"' );

... or you can edit the regex in that question to recognize single quotes instead of double quotes:

// Quoted fields.
"(?:'([^']*(?:''[^']*)*)'|" +

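However, this assumes certain markup that is not clear from your question. Please clarify what the various possibilities of markup could be, per my comment on your question.

According to this blog post, this function should do it: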

String.prototype.splitCSV = function(sep) {
  for (var foo = this.split(sep = sep || ","), x = foo.length - 1, tl; x >= 0; x--) {
    if (foo[x].replace(/'\s+$/, "'").charAt(foo[x].length - 1) == "'") {
      if ((tl = foo[x].replace(/^\s+'/, "'")).length > 1 && tl.charAt(0) == "'") {
        foo[x] = foo[x].replace(/^\s*'|'\s*$/g, '').replace(/''/g, "'");
      } else if (x) {
        foo.splice(x - 1, 2, [foo[x - 1], foo[x]].join(sep));
      } else foo = foo.shift().split(sep).concat(foo);
    } else foo[x] = foo[x].replace(/''/g, "'"); // replace() returns a new string, so assign it
  }
  return foo;
};

You can call it like this:

var string = "'string, duppi, du', 23, lala";
var parsed = string.splitCSV();
alert(parsed.join("|"));

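This kind of works in jsfiddle, but some elements seem to have a space in front of them.

My answer presumes that your input reflects code/content from web sources, where single and double quote characters are fully interchangeable provided they occur as a non-escaped matching set.

You cannot use regex for this. You actually have to write a micro parser to analyze the string you wish to split. For the sake of this answer I will call the quoted parts of your strings sub-strings. You need to walk across the string character by character. Consider the following case: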

var a = "some sample string with \"double quotes\" and 'single quotes' and some craziness like this: \\\" or \\'",
b = "sample of code from JavaScript with a regex containing a comma /\,/ that should probably be ignored.";

In this case you have absolutely no idea where a sub-string starts or ends by simply analyzing the input for a character pattern. Instead you have to write logic that decides whether a quote character is being used as a quote character, whether it is itself unquoted, and whether it follows an escape.

I am not going to write code at that level of complexity for you, but you can look at something I recently wrote that has the pattern you need. That code has nothing to do with commas, but it is otherwise a valid enough micro-parser for you to follow when writing your own. Look into the asifix function of the following application:

https://github.com/austincheney/pretty-diff/blob/master/fulljsmin.js
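
For illustration only, here is a minimal sketch of the kind of character-by-character walk described above. It is not the asifix function from the linked file. The name splitOutsideQuotes is made up, and the sketch assumes that ' and " are interchangeable quote characters and that a backslash escapes whatever character follows it.

```javascript
// Hypothetical helper: split on commas that are outside quoted sub-strings.
// Assumption: a backslash escapes the next character, whatever it is.
function splitOutsideQuotes(input) {
  const parts = [];
  let current = "";
  let quote = null; // the quote character that opened the current sub-string

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "\\" && i + 1 < input.length) {
      // Keep the escape and the escaped character; neither opens nor closes a sub-string.
      current += ch + input[i + 1];
      i++;
    } else if (quote !== null) {
      // Inside a sub-string: only the matching quote character closes it.
      if (ch === quote) quote = null;
      current += ch;
    } else if (ch === "'" || ch === '"') {
      quote = ch;
      current += ch;
    } else if (ch === ",") {
      parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current.trim());
  return parts;
}

console.log(splitOutsideQuotes("'string, duppi, du', 23, lala"));
// → ["'string, duppi, du'", "23", "lala"]
```

Note that this sketch keeps the surrounding quotes and the backslashes in the output; stripping them is a separate step that depends on which quoting rules your data actually follows.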

Disclaimer

2014-12-01 Update: The answer below works only for one very specific format of CSV. As correctly pointed out by DG in the comments, this solution does NOT fit the RFC 4180 definition of CSV and it also does NOT fit MS Excel format. This solution simply demonstrates how one can parse one (non-standard) CSV line of input which contains a mix of string types, where the strings may contain escaped quotes and commas.

A non-standard CSV solution

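As austincheney correctly points out, you really do need to parse the string from start to finish if you want to properly handle quoted strings that may contain escaped characters. Also, the OP does not clearly define what a "CSV string" really is. First we must define what constitutes a valid CSV string and its individual values.

Given: "CSV String" Definition

For the purpose of this discussion, a "CSV string" consists of zero or more values, where multiple values are separated by a comma. Each value may consist of:

  1. A double quoted string. (May contain unescaped single quotes.)
  2. A single quoted string. (May contain unescaped double quotes.)
  3. A non-quoted string. (May NOT contain quotes, commas, or backslashes.)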
  4. An empty value. (An all whitespace value is considered empty.)

Rules/Notes:

  • Quoted values may contain commas.
  • Quoted values may contain escaped-anything, e.g. 'that\'s cool'.
  • Values containing quotes, commas, or backslashes must be quoted.
  • Values containing leading or trailing whitespace must be quoted.
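  • The backslash is removed from all \' in single quoted values.
  • The backslash is removed from all \" in double quoted values.
  • Non-quoted strings are trimmed of any leading and trailing whitespace.
  • The comma separator may have adjacent whitespace (which is ignored).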

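Find:

A JavaScript function which converts a valid CSV string (as defined above) into an array of string values.

Solution:

The regular expressions used by this solution are complex. And (IMHO) all non-trivial regular expressions should be presented in free-spacing mode with lots of comments and indentation. Unfortunately, JavaScript does not allow free-spacing mode. Thus, the regular expressions implemented by this solution are first presented in native regex syntax (expressed using Python's handy r'''...''' raw multi-line string syntax).

First, here is a regular expression which validates that a CSV string meets the above requirements:

Regular expression to validate a "CSV string":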

re_valid = r"""
# Validate a CSV string having single, double or un-quoted values.
^                                   # Anchor to start of string.
\s*                                 # Allow whitespace before value.
(?:                                 # Group for value alternatives.
'[^'\\]*(?:\\[\S\s][^'\\]*)*'     # Either Single quoted string,
| "[^"\\]*(?:\\[\S\s][^"\\]*)*"     # or Double quoted string,
| [^,'"\s\\]*(?:\s+[^,'"\s\\]+)*    # or Non-comma, non-quote stuff.
)                                   # End group of value alternatives.
\s*                                 # Allow whitespace after value.
(?:                                 # Zero or more additional values
,                                 # Values separated by a comma.
\s*                               # Allow whitespace before value.
(?:                               # Group for value alternatives.
'[^'\\]*(?:\\[\S\s][^'\\]*)*'   # Either Single quoted string,
| "[^"\\]*(?:\\[\S\s][^"\\]*)*"   # or Double quoted string,
| [^,'"\s\\]*(?:\s+[^,'"\s\\]+)*  # or Non-comma, non-quote stuff.
)                                 # End group of value alternatives.
\s*                               # Allow whitespace after value.
)*                                  # Zero or more additional values
$                                   # Anchor to end of string.
"""

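If a string matches the above regex, then that string is a valid CSV string (according to the rules stated earlier) and can be parsed. The regex below matches one value from the CSV string. It is applied repeatedly until no more matches are found (and all values have been parsed).

Regular expression to parse one value from a valid CSV string: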

re_value = r"""
# Match one value in valid CSV string.
(?!\s*$)                            # Don't match empty last value.
\s*                                 # Strip whitespace before value.
(?:                                 # Group for value alternatives.
'([^'\\]*(?:\\[\S\s][^'\\]*)*)'   # Either $1: Single quoted string,
| "([^"\\]*(?:\\[\S\s][^"\\]*)*)"   # or $2: Double quoted string,
| ([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)  # or $3: Non-comma, non-quote stuff.
)                                   # End group of value alternatives.
\s*                                 # Strip whitespace after value.
(?:,|$)                             # Field ends on comma or EOS.
"""

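Note that there is one special-case value that this regex does not match: the very last value when that value is empty. This "empty last value" case is tested for and handled by the JavaScript function that follows.

JavaScript function to parse a CSV string: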

// Return array of string values, or NULL if CSV string not well formed.
function CSVtoArray(text) {
var re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;
var re_value = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;
// Return NULL if input string is not well formed CSV string.
if (!re_valid.test(text)) return null;
var a = [];                     // Initialize array to receive values.
text.replace(re_value, // "Walk" the string using replace with callback.
function(m0, m1, m2, m3) {
// Remove backslash from \' in single quoted values.
if      (m1 !== undefined) a.push(m1.replace(/\\'/g, "'"));
// Remove backslash from \" in double quoted values.
else if (m2 !== undefined) a.push(m2.replace(/\\"/g, '"'));
else if (m3 !== undefined) a.push(m3);
return ''; // Return empty string.
});
// Handle special case of empty last value.
if (/,\s*$/.test(text)) a.push('');
return a;
};

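Example input and output:

In the following examples, curly braces are used to delimit the {result strings}. (This helps to visualize leading/trailing spaces and zero-length strings.)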

// Test 1: Test string from original question.
var test = "'string, duppi, du', 23, lala";
var a = CSVtoArray(test);
/* Array has 3 elements:
a[0] = {string, duppi, du}
a[1] = {23}
a[2] = {lala} */
// Test 2: Empty CSV string.
var test = "";
var a = CSVtoArray(test);
/* Array has 0 elements: */
// Test 3: CSV string with two empty values.
var test = ",";
var a = CSVtoArray(test);
/* Array has 2 elements:
a[0] = {}
a[1] = {} */
// Test 4: Double quoted CSV string having single quoted values.
// (The backslash is doubled so that it survives the JavaScript string literal.)
var test = "'one','two with escaped \\' single quote', 'three, with, commas'";
var a = CSVtoArray(test);
/* Array has 3 elements:
a[0] = {one}
a[1] = {two with escaped ' single quote}
a[2] = {three, with, commas} */
// Test 5: Single quoted CSV string having double quoted values.
var test = '"one","two with escaped \\" double quote", "three, with, commas"';
var a = CSVtoArray(test);
/* Array has 3 elements:
a[0] = {one}
a[1] = {two with escaped " double quote}
a[2] = {three, with, commas} */
// Test 6: CSV string with whitespace in and around empty and non-empty values.
var test = "   one  ,  'two'  ,  , ' four' ,, 'six ', ' seven ' ,  ";
var a = CSVtoArray(test);
/* Array has 8 elements:
a[0] = {one}
a[1] = {two}
a[2] = {}
a[3] = { four}
a[4] = {}
a[5] = {six }
a[6] = { seven }
a[7] = {} */

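Additional notes:

This solution requires that the CSV string be "valid". For example, unquoted values may not contain backslashes or quotes, so the following CSV string is NOT valid: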

var invalid1 = "one, that's me!, escaped \\, comma";

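This is not really a limitation, because any sub-string can be represented as either a single or double quoted value. Note also that this solution represents only one possible definition of "comma-separated values".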
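
To make the validity rule concrete, here is a small sketch that reuses the validation pattern from CSVtoArray above on its own. The wrapper name isValidCsv is hypothetical; the regex is copied unchanged.

```javascript
// Validation pattern copied from CSVtoArray; isValidCsv is a hypothetical wrapper.
const reValid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;

function isValidCsv(text) {
  return reValid.test(text);
}

// Unquoted values containing a quote or a backslash are rejected...
console.log(isValidCsv("one, that's me!, escaped \\, comma")); // false
// ...but the same content is accepted once those values are quoted.
console.log(isValidCsv("one, \"that's me!\", 'escaped \\, comma'")); // true
```

Since CSVtoArray returns null for invalid input, callers should check for null before using the result as an array.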

Edit: 2014-05-19: Added disclaimer. Edit: 2014-12-01: Moved disclaimer to top.

People seem to be against regular expressions for this. Why?

(\s*'[^']+'|\s*[^,]+)(?=,|$)

Here is the code. I also made a fiddle.

String.prototype.splitCSV = function(sep) {
  var regex = /(\s*'[^']+'|\s*[^,]+)(?=,|$)/g;
  // Return the matches directly; assigning to an undeclared `matches` would leak a global.
  return this.match(regex);
}


var string = "'string, duppi, du', 23, 'string, duppi, du', lala";


console.log( string.splitCSV()  );

A PEG(.js) grammar that handles the RFC 4180 examples at http://en.wikipedia.org/wiki/Comma-separated_values:

start
= [\n\r]* first:line rest:([\n\r]+ data:line { return data; })* [\n\r]* { rest.unshift(first); return rest; }


line
= first:field rest:("," text:field { return text; })*
& { return !!first || rest.length; } // ignore blank lines
{ rest.unshift(first); return rest; }


field
= '"' text:char* '"' { return text.join(''); }
/ text:[^\n\r,]* { return text.join(''); }


char
= '"' '"' { return '"'; }
/ [^"]

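Test it at http://jsfiddle.net/knvzk/10 or https://pegjs.org/online.

Download the generated parser at https://gist.github.com/3362830.

In addition to the excellent and complete answer from ridgerunner, I thought of a very simple workaround for when your backend runs PHP.

Add this PHP file to your domain's backend (say: csv.php):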

<?php
session_start(); // Optional
// The body is JSON, so declare it as JSON (charset goes in the same header).
header("Content-Type: application/json; charset=UTF-8");
// Set the delimiter and the End of Line character of your CSV content:
echo json_encode(array_map('str_getcsv', str_getcsv($_POST["csv"], "\n")));
?>

Now add this function to your JavaScript toolkit (I believe it should be revised a bit to work cross-browser).

function csvToArray(csv) {
var oXhr = new XMLHttpRequest;
oXhr.addEventListener("readystatechange",
function () {
if (this.readyState == 4 && this.status == 200) {
console.log(this.responseText);
console.log(JSON.parse(this.responseText));
}
}
);
oXhr.open("POST","path/to/csv.php",true);
oXhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded; charset=utf-8");
oXhr.send("csv=" + encodeURIComponent(csv));
}

This will cost you one Ajax call, but at least you won't duplicate code or include any external library.

Ref: http://php.net/manual/en/function.str-getcsv.php

When I read the CSV file into a string, it contained null characters between the characters of each string, so try stripping \0 line by line. It works for me.

stringLine = stringLine.replace(/\0/g, "" );
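
A self-contained sketch of the same idea applied to a whole text follows. The name stripNulls is hypothetical. One plausible cause of such NUL characters is UTF-16 content being decoded as a single-byte encoding, but that is an assumption, not something established above; if that is the cause, decoding with the right encoding is the cleaner fix.

```javascript
// Hypothetical helper: remove NUL characters from every line of a CSV text.
function stripNulls(csvText) {
  return csvText
    .split(/\r?\n/)
    .map((line) => line.replace(/\0/g, ""))
    .join("\n");
}

// Sample input with a NUL after every visible character.
const raw = "a\u0000,\u0000b\u0000\n1\u0000,\u00002\u0000";
console.log(JSON.stringify(stripNulls(raw))); // "a,b\n1,2"
```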

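I had a very specific use case where I wanted to copy cells from Google Sheets into my web app. Cells could include double quotes and newline characters. With copy and paste, the cells are delimited by tab characters, and cells containing unusual characters are wrapped in double quotes. I tried this main solution, the linked article using regexp, Jquery-CSV, and CSVToArray. http://papaparse.com/ is the only one that worked out of the box. Copy and paste from Google Sheets is seamless with the default auto-detect options.

To complement this answer:

If you need to parse quotes that are escaped with another quote, for example: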

"some ""value"" that is on xlsx file",123

you can use:

function parse(text) {
const csvExp = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|"([^""]*(?:"[\S\s][^""]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;


const values = [];


text.replace(csvExp, (m0, m1, m2, m3, m4) => {
if (m1 !== undefined) {
values.push(m1.replace(/\\'/g, "'"));
}
else if (m2 !== undefined) {
values.push(m2.replace(/\\"/g, '"'));
}
else if (m3 !== undefined) {
values.push(m3.replace(/""/g, '"'));
}
else if (m4 !== undefined) {
values.push(m4);
}
return '';
});


if (/,\s*$/.test(text)) {
values.push('');
}


return values;
}

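I liked FakeRainBrigand's answer, but it has a few problems: it cannot handle whitespace between a quote and a comma, and it does not support two consecutive commas. I tried editing his answer, but my edit was rejected by reviewers who apparently did not understand my code. Here is my version of FakeRainBrigand's code. There is also a fiddle: http://jsfiddle.net/xTezm/46/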

String.prototype.splitCSV = function() {
var matches = this.match(/(\s*"[^"]+"\s*|\s*[^,]+|,)(?=,|$)/g);
for (var n = 0; n < matches.length; ++n) {
matches[n] = matches[n].trim();
if (matches[n] == ',') matches[n] = '';
}
if (this[0] == ',') matches.unshift("");
return matches;
}


var string = ',"string, duppi, du" , 23 ,,, "string, duppi, du",dup,"", , lala';
var parsed = string.splitCSV();
alert(parsed.join('|'));

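RFC 4180 solution

This does not solve the string in the question, since its format does not conform to RFC 4180; the acceptable encoding is escaping a double quote with a double quote. The solution below works correctly with CSV files downloaded from Google Spreadsheets.

UPDATE (3/2017)

Parsing a single line would be wrong. According to RFC 4180, fields may contain CRLF, which will cause any line reader to break the CSV file. Here is an updated version that parses a whole CSV string: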

'use strict';


function csvToArray(text) {
let p = '', row = [''], ret = [row], i = 0, r = 0, s = !0, l;
for (l of text) {
if ('"' === l) {
if (s && l === p) row[i] += l;
s = !s;
} else if (',' === l && s) l = row[++i] = '';
else if ('\n' === l && s) {
if ('\r' === p) row[i] = row[i].slice(0, -1);
row = ret[++r] = [l = '']; i = 0;
} else row[i] += l;
p = l;
}
return ret;
};


let test = '"one","two with escaped """" double quotes""","three, with, commas",four with no quotes,"five with CRLF\r\n"\r\n"2nd line one","two with escaped """" double quotes""","three, with, commas",four with no quotes,"five with CRLF\r\n"';
console.log(csvToArray(test));
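
To see why a line-based reader is not enough, the sketch below counts records in a two-record input whose first record has a quoted field containing CRLF. The countRecords helper is hypothetical and deliberately simplified: it only tracks whether the scanner is inside double quotes, and it assumes the text does not end with a trailing line break.

```javascript
const sample = '"five with CRLF\r\n",x\r\n"second",y';

// Naive approach: every line break starts a new record.
const naiveRows = sample.split(/\r?\n/);
console.log(naiveRows.length); // 3, one more than the real number of records

// Hypothetical helper: count records, ignoring line breaks inside double quotes.
function countRecords(text) {
  let inQuotes = false;
  let records = text.length > 0 ? 1 : 0;
  for (const ch of text) {
    // A doubled "" toggles twice, so the inside/outside state is preserved.
    if (ch === '"') inQuotes = !inQuotes;
    else if (ch === "\n" && !inQuotes) records++;
  }
  return records;
}

console.log(countRecords(sample)); // 2
```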

OLD ANSWER

(Single line solution)

function CSVtoArray(text) {
let ret = [''], i = 0, p = '', s = true;
for (let l in text) {
l = text[l];
if ('"' === l) {
s = !s;
if ('"' === p) {
ret[i] += '"';
l = '-';
} else if ('' === p)
l = '-';
} else if (s && ',' === l)
l = ret[++i] = '';
else
ret[i] += l;
p = l;
}
return ret;
}
let test = '"one","two with escaped """" double quotes""","three, with, commas",four with no quotes,five for fun';
console.log(CSVtoArray(test));

And for fun, here is how you create CSV from an array:

function arrayToCSV(row) {
for (let i in row) {
row[i] = row[i].replace(/"/g, '""');
}
return '"' + row.join('","') + '"';
}


let row = [
"one",
"two with escaped \" double quote",
"three, with, commas",
"four with no quotes (now has)",
"five for fun"
];
let text = arrayToCSV(row);
console.log(text);
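
Note that arrayToCSV above rewrites the array it is given and assumes every element is a string. A hedged variant that leaves the input untouched and quotes only the fields that need it might look like this (toCsvRow is a hypothetical name; embedded double quotes are doubled, following the RFC 4180 convention):

```javascript
// Hypothetical variant of arrayToCSV: does not modify `row` and accepts non-strings.
function toCsvRow(row) {
  return row
    .map((value) => {
      const text = String(value);
      // Quote only when the field contains a comma, a double quote, or a line break.
      return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
    })
    .join(",");
}

console.log(toCsvRow(["one", 'two with "quote"', "three, with, commas", 4]));
// → one,"two with ""quote""","three, with, commas",4
```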

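I faced the same kind of problem when I had to parse a CSV file.

The file contains an address column whose values contain a ','.

After parsing that CSV file to JSON, I got mismatched key mappings when converting it into a JSON file.

I used Node.js to parse the file, along with libraries such as baby parse and csvtojson.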

Example of file -

address,pincode
foo,baar , 123456

When I parsed it straight to JSON without using baby parse, I got:

[{
address: 'foo',
pincode: 'baar',
'field3': '123456'
}]

So I wrote code that replaces the comma (,) inside every field with some other delimiter:

/*
csvString(input) = "address, pincode\\nfoo, bar, 123456\\n"
output = "address, pincode\\nfoo {YOUR DELIMITER} bar, 123456\\n"
*/
const removeComma = function(csvString){
let delimiter = '|'
let Baby = require('babyparse')
let arrRow = Baby.parse(csvString).data;
/*
arrRow = [
[ 'address', 'pincode' ],
[ 'foo, bar', '123456']
]
*/
return arrRow.map((singleRow, index) => {
//the data will include
/*
singleRow = [ 'address', 'pincode' ]
*/
return singleRow.map(singleField => {
// replace the commas inside the field
return singleField.split(',').join(delimiter)
})
}).reduce((acc, value, key) => {
acc = acc +(Array.isArray(value) ?
value.reduce((acc1, val)=> {
acc1 = acc1+ val + ','
return acc1
}, '') : '') + '\n';
return acc;
},'')
}

The string returned by this function can be passed into the csvtojson library, and the result can then be used.

const csv = require('csvtojson')


let csvString = "address, pincode\\nfoo, bar, 123456\\n"
let jsonArray = []
modifiedCsvString = removeComma(csvString)
csv()
.fromString(modifiedCsvString)
.on('json', json => jsonArray.push(json))
.on('end', () => {
/* do any thing with the json Array */
})

Now you get output like this:

[{
address: 'foo, bar',
pincode: 123456
}]

You can use papaparse.js, as in the example below:

<!DOCTYPE html>
<html lang="en">


<head>
<title>CSV</title>
</head>


<body>
<input type="file" id="files" multiple="">
<button onclick="csvGetter()">CSV Getter</button>
<h3>The Result will be in the Console.</h3>


<script src="papaparse.min.js"></script>


<script>
function csvGetter() {


var file = document.getElementById('files').files[0];
Papa.parse(file, {
complete: function(results) {
console.log(results.data);
}
});
}
</script>
</body>


</html>

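Don't forget to put papaparse.js in the same folder.

One more to add to the list, because I find all of the above not quite "KISS" enough.

This one uses a regex to find either commas or newlines while skipping over quoted items. Hopefully this is something beginners can read through on their own. The splitFinder regexp does three things (separated by a |):

  1. , - finds commas
  2. \r?\n - finds new lines (potentially with a carriage return, if the exporter was nice)
  3. "(\\"|[^"])*?" - skips anything surrounded by quotes, because commas and newlines don't matter in there. If there is an escaped quote \\" in the quoted item, it will get captured before an end quote can be found.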

const splitFinder = /,|\r?\n|"(\\"|[^"])*?"/g;


function csvTo2dArray(parseMe) {
let currentRow = [];
const rowsOut = [currentRow];
let lastIndex = splitFinder.lastIndex = 0;
  

// add text from lastIndex to before a found newline or comma
const pushCell = (endIndex) => {
endIndex = endIndex || parseMe.length;
const addMe = parseMe.substring(lastIndex, endIndex);
// remove quotes around the item
currentRow.push(addMe.replace(/^"|"$/g, ""));
lastIndex = splitFinder.lastIndex;
}




let regexResp;
// for each regexp match (either comma, newline, or quoted item)
while (regexResp = splitFinder.exec(parseMe)) {
const split = regexResp[0];


// if it's not a quote capture, add an item to the current row
// (quote captures will be pushed by the newline or comma following)
if (split.startsWith(`"`) === false) {
const splitStartIndex = splitFinder.lastIndex - split.length;
pushCell(splitStartIndex);


// then start a new row if newline
const isNewLine = /^\r?\n$/.test(split);
if (isNewLine) { rowsOut.push(currentRow = []); }
}
}
// make sure to add the trailing text (no commas or newlines after)
pushCell();
return rowsOut;
}


const rawCsv = `a,b,c\n"test\r\n","comma, test","\r\n",",",\nsecond,row,ends,with,empty\n"quote\\"test"`
const rows = csvTo2dArray(rawCsv);
console.log(rows);
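
To see which of the three alternatives fires on a given input, one can simply list the matches. The sketch below uses the same pattern as splitFinder under a different, hypothetical name (splitPattern), together with a made-up describeMatches helper.

```javascript
// Same pattern as splitFinder, under a different (hypothetical) name.
const splitPattern = /,|\r?\n|"(\\"|[^"])*?"/g;

// Label each match by the alternative that produced it.
function describeMatches(text) {
  return [...text.matchAll(splitPattern)].map(([match]) => {
    if (match === ",") return "comma";
    if (match.startsWith('"')) return "quoted: " + match;
    return "newline";
  });
}

console.log(describeMatches('a,"b,c"\nd'));
// → ["comma", 'quoted: "b,c"', "newline"]
```

The comma inside "b,c" never shows up as a separate match, because the quoted alternative consumes it first.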

No regexp, readable, and according to https://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules:

function csv2arr(str: string) {
let line = ["",];
const ret = [line,];
let quote = false;


for (let i = 0; i < str.length; i++) {
const cur = str[i];
const next = str[i + 1];


if (!quote) {
const cellIsEmpty = line[line.length - 1].length === 0;
if (cur === '"' && cellIsEmpty) quote = true;
else if (cur === ",") line.push("");
else if (cur === "\r" && next === "\n") { line = ["",]; ret.push(line); i++; }
else if (cur === "\n" || cur === "\r") { line = ["",]; ret.push(line); }
else line[line.length - 1] += cur;
} else {
if (cur === '"' && next === '"') { line[line.length - 1] += cur; i++; }
else if (cur === '"') quote = false;
else line[line.length - 1] += cur;
}
}
return ret;
}

Regular expressions to the rescue! These few lines of code properly handle quoted fields with embedded commas, quotes, and newlines based on the RFC 4180 standard.

function parseCsv(data, fieldSep, newLine) {
fieldSep = fieldSep || ',';
newLine = newLine || '\n';
var nSep = '\x1D';
var qSep = '\x1E';
var cSep = '\x1F';
var nSepRe = new RegExp(nSep, 'g');
var qSepRe = new RegExp(qSep, 'g');
var cSepRe = new RegExp(cSep, 'g');
var fieldRe = new RegExp('(?<=(^|[' + fieldSep + '\\n]))"(|[\\s\\S]+?(?<![^"]"))"(?=($|[' + fieldSep + '\\n]))', 'g');
var grid = [];
data.replace(/\r/g, '').replace(/\n+$/, '').replace(fieldRe, function(match, p1, p2) {
return p2.replace(/\n/g, nSep).replace(/""/g, qSep).replace(/,/g, cSep);
}).split(/\n/).forEach(function(line) {
var row = line.split(fieldSep).map(function(cell) {
return cell.replace(nSepRe, newLine).replace(qSepRe, '"').replace(cSepRe, ',');
});
grid.push(row);
});
return grid;
}


const csv = 'A1,B1,C1\n"A ""2""","B, 2","C\n2"';
const separator = ',';      // field separator, default: ','
const newline = ' <br /> '; // newline representation in case a field contains newlines, default: '\n'
var grid = parseCsv(csv, separator, newline);
// expected: [ [ 'A1', 'B1', 'C1' ], [ 'A "2"', 'B, 2', 'C <br /> 2' ] ]

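Unless stated otherwise, you do not need a finite state machine. The regular expression handles RFC 4180 properly thanks to positive lookbehind, negative lookbehind, and positive lookahead.

Clone/download the code at https://github.com/peterthoeny/parse-csv-js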
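
As a smaller illustration of how lookbehind and lookahead pin a quoted field to its cell boundaries, here is a hedged sketch. It is not the fieldRe pattern above: the regex and the quotedFields helper are simplified, hypothetical stand-ins that only handle a single line with comma separators. Lookbehind assertions were added to JavaScript in ES2018, so older engines may reject the pattern.

```javascript
// Match a double-quoted field only when it spans a whole cell:
// preceded by start-of-text or a comma, followed by a comma or end-of-text.
const quotedField = /(?<=^|,)"((?:[^"]|"")*)"(?=,|$)/g;

function quotedFields(line) {
  // Capture group 1 holds the field body; doubled quotes are unescaped afterwards.
  return [...line.matchAll(quotedField)].map((m) => m[1].replace(/""/g, '"'));
}

console.log(quotedFields('A1,"B, 2","C ""3""",D"4"'));
// → ["B, 2", 'C "3"']
```

The last cell, D"4", is skipped because its quotes are not at the cell boundaries, which is exactly what the lookarounds check.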

I have used regex a number of times, but I always have to relearn it each time, which is frustrating :-)

So here's a non-regex solution:

function csvRowToArray(row, delimiter = ',', quoteChar = '"') {
  let nStart = 0, nEnd = 0, a = [], nRowLen = row.length, bQuotedValue;
  while (nStart <= nRowLen) {
    bQuotedValue = (row.charAt(nStart) === quoteChar);
    if (bQuotedValue) {
      nStart++;
      nEnd = row.indexOf(quoteChar + delimiter, nStart);
    } else {
      nEnd = row.indexOf(delimiter, nStart);
    }
    if (nEnd < 0) {
      nEnd = nRowLen;
      // A quoted last value has no delimiter after its closing quote; drop that quote.
      if (bQuotedValue && nEnd > nStart && row.charAt(nEnd - 1) === quoteChar) nEnd--;
    }
    a.push(row.substring(nStart, nEnd));
    nStart = nEnd + delimiter.length + (bQuotedValue ? 1 : 0);
  }
  return a;
}

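How it works:

  1. Pass in the CSV string in row.
  2. While the start position of the next value is within the row, do the following:
    • If this value is quoted, set nEnd to the closing quote.
    • Otherwise, if the value is NOT quoted, set nEnd to the next delimiter.
    • Add the value to the array.
    • Set nStart to nEnd plus the length of the delimiter.

Sometimes it's better to write your own small function than to use a library. Your own code will perform well and use only a small footprint. In addition, you can easily tweak it to suit your own needs.

Use the npm library csv-string to parse the strings instead of splitting them: https://www.npmjs.com/package/csv-string

This will handle commas inside quotes and empty entries.

This one is based on niry's answer, but for semicolons: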

'use strict';


function csvToArray(text) {
let p = '', row = [''], ret = [row], i = 0, r = 0, s = !0, l;
for (l of text) {
if ('"' === l) {
if (s && l === p) row[i] += l;
s = !s;
} else if (';' === l && s) l = row[++i] = '';
else if ('\n' === l && s) {
if ('\r' === p) row[i] = row[i].slice(0, -1);
row = ret[++r] = [l = '']; i = 0;
} else row[i] += l;
p = l;
}
return ret;
};


let test = '"one";"two with escaped """" double quotes""";"three; with; commas";four with no quotes;"five with CRLF\r\n"\r\n"2nd line one";"two with escaped """" double quotes""";"three, with; commas and semicolons";four with no quotes;"five with CRLF\r\n"';


console.log(csvToArray(test));


Try this.

function parseCSV(csv) {
let quotes = [];
let token = /(?:(['"`])([\s\S]*?)\1)|([^\t,\r\n]+)\3?|([\r\n])/gm;
let text = csv.replace(/\\?(['"`])\1?/gm, s => s.length != 2 ? s : `_r#${quotes.push(s) - 1}`);
return [...text.matchAll(token)]
.map(t => (t[2] || t[3] || t[4])
.replace(/^_r#\d+$/, "")
.replace(/_r#\d+/g, q => quotes[q.replace(/\D+/, '')][1]))
.reduce((a, b) => /^[\r\n]$/g.test(b)
? a.push([]) && a
: a[a.length - 1].push(b) && a, [[]])
.filter(d => d.length);
}