How to convert CSV file to multiline JSON?

Here is my code, really simple stuff...

import csv
import json


csvfile = open('file.csv', 'r')
jsonfile = open('file.json', 'w')


fieldnames = ("FirstName","LastName","IDNumber","Message")
reader = csv.DictReader( csvfile, fieldnames)
out = json.dumps( [ row for row in reader ] )
jsonfile.write(out)

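I declare some field names, the reader uses csv to read the file, and the field names are used to dump the file to JSON. Here's the problem...

Each record in the CSV file is on its own row. I want the JSON output to be the same way. Instead, everything is dumped onto one giant, long line.

I tried using something like for line in csvfile: and then running the code above with reader = csv.DictReader( line, fieldnames) so that it loops over each line, but it writes the entire file on one line, then the entire file again on another line... and so on until it runs out of lines.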

Any suggestions for correcting this?

EDIT: To clarify, currently I have (every record on line 1):

[{"FirstName":"John","LastName":"Doe","IDNumber":"123","Message":"None"},{"FirstName":"George","LastName":"Washington","IDNumber":"001","Message":"Something"}]

What I'm looking for (2 records on 2 lines):

{"FirstName":"John","LastName":"Doe","IDNumber":"123","Message":"None"}
{"FirstName":"George","LastName":"Washington","IDNumber":"001","Message":"Something"}

Not each individual field indented on a separate line, but each record on its own line.

Some sample input:

"John","Doe","001","Message1"
"George","Washington","002","Message2"

Add the indent parameter to json.dumps:

data = {'this': ['has', 'some', 'things'],
        'in': {'it': 'with', 'some': 'more'}}
print(json.dumps(data, indent=4))

Also note that you can simply use json.dump with the open jsonfile:

json.dump(data, jsonfile)

import csv
import json

# DictReader needs an open file object, not a file name;
# with no fieldnames given, the first CSV row is used as the header
with open('filename.csv', 'r') as f:
    csvfile = csv.DictReader(f)
    output = []
    for each in csvfile:
        row = {}
        row['FirstName'] = each['FirstName']
        row['LastName'] = each['LastName']
        row['IDNumber'] = each['IDNumber']
        row['Message'] = each['Message']
        output.append(row)

with open('filename.json', 'w') as out:
    json.dump(output, out, indent=4, sort_keys=False)

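The problem with your desired output is that it is not a valid JSON document; it's a stream of JSON documents.

That's okay if it's what you need, but it means that for each document you want in your output, you have to call json.dumps.

Since the newline you want between documents is not part of those documents, you have to supply it yourself. So we just need to pull the loop out of the call to json.dump and write a newline after each document.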

import csv
import json

fieldnames = ("FirstName","LastName","IDNumber","Message")

with open('file.csv', 'r') as csvfile, open('file.json', 'w') as jsonfile:
    reader = csv.DictReader(csvfile, fieldnames)
    for row in reader:
        # one JSON document per record, each followed by a newline
        json.dump(row, jsonfile)
        jsonfile.write('\n')
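To illustrate why one document per line is convenient, here is a minimal sketch that writes such a stream and then parses it back line by line. The helper names write_json_lines and read_json_lines, the in-memory sample, and the temporary file path are all hypothetical choices for the example, not part of the question.

```python
import csv
import io
import json
import os
import tempfile

FIELDNAMES = ("FirstName", "LastName", "IDNumber", "Message")

def write_json_lines(csv_text, path):
    """Write one JSON document per CSV record, each on its own line."""
    reader = csv.DictReader(io.StringIO(csv_text), FIELDNAMES)
    with open(path, "w") as jsonfile:
        for row in reader:
            json.dump(row, jsonfile)
            jsonfile.write("\n")

def read_json_lines(path):
    """Parse the stream back: every non-empty line is its own document."""
    with open(path) as jsonfile:
        return [json.loads(line) for line in jsonfile if line.strip()]

# Hypothetical sample matching the input shown in the question
sample = '"John","Doe","001","Message1"\n"George","Washington","002","Message2"\n'
path = os.path.join(tempfile.mkdtemp(), "file.json")
write_json_lines(sample, path)
records = read_json_lines(path)
```

Note that json.loads on the whole file would fail here, because the file as a whole is not a single JSON document; it has to be consumed one line at a time.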

As a slight improvement on @MONTYHS's answer, iterating through a tuple of field names:

import csv
import json

csvfilename = 'filename.csv'
jsonfilename = csvfilename.split('.')[0] + '.json'
csvfile = open(csvfilename, 'r')
jsonfile = open(jsonfilename, 'w')
# no fieldnames passed, so the first CSV row is treated as the header
reader = csv.DictReader(csvfile)

fieldnames = ('FirstName', 'LastName', 'IDNumber', 'Message')

output = []

for each in reader:
    row = {}
    for field in fieldnames:
        row[field] = each[field]
    output.append(row)

json.dump(output, jsonfile, indent=2, sort_keys=True)

You can try this:

import csvmapper

# how does the object look
mapper = csvmapper.DictMapper([
    [
        { 'name' : 'FirstName'},
        { 'name' : 'LastName' },
        { 'name' : 'IDNumber', 'type':'int' },
        { 'name' : 'Messages' }
    ]
])

# parser instance
parser = csvmapper.CSVParser('sample.csv', mapper)
# conversion service
converter = csvmapper.JSONConverter(parser)

print(converter.doConvert(pretty=True))

Edit:

A simpler approach:

import csvmapper

fields = ('FirstName', 'LastName', 'IDNumber', 'Messages')
parser = csvmapper.CSVParser('sample.csv', csvmapper.FieldMapper(fields))

converter = csvmapper.JSONConverter(parser)

print(converter.doConvert(pretty=True))

I took @SingleNegationElimination's answer and simplified it into a three-liner that can be used in a pipeline:

import csv
import json
import sys

for row in csv.DictReader(sys.stdin):
    json.dump(row, sys.stdout)
    sys.stdout.write('\n')

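How about using Pandas to read the csv file into a DataFrame (pd.read_csv), then manipulating the columns as needed (dropping them or updating values), and finally converting the DataFrame back to JSON (pd.DataFrame.to_json)?

Note: I haven't checked how efficient this is, but it is definitely one of the easiest ways to manipulate a large csv and convert it to json.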
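The steps described above could look roughly like the following sketch. It assumes a pandas version whose to_json accepts lines=True together with orient="records"; the in-memory sample, the column list, and the choice to drop the Message column are hypothetical and only there to make the example self-contained.

```python
import io
import json

import pandas as pd

# Hypothetical sample matching the input shown in the question
sample = '"John","Doe","001","Message1"\n"George","Washington","002","Message2"\n'
columns = ["FirstName", "LastName", "IDNumber", "Message"]

# dtype=str keeps values such as "001" from being parsed as the integer 1
df = pd.read_csv(io.StringIO(sample), header=None, names=columns, dtype=str)

# Example manipulation: drop a column that is not wanted in the output
df = df.drop(columns=["Message"])

# orient="records" with lines=True emits one JSON object per line
json_lines = df.to_json(orient="records", lines=True)

# Parse the result back to confirm each line is a standalone document
parsed = [json.loads(line) for line in json_lines.splitlines() if line]
```

Passing a file path as the first argument of to_json writes the same text to disk instead of returning it as a string.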

I see this is old, but I needed the code from SingleNegationElimination and ran into an issue with data containing non-UTF-8 characters. These appeared in fields I was not overly concerned with, so I chose to ignore them, though that took some effort. I am new to Python, so I got it working through trial and error. The code is a copy of SingleNegationElimination's with extra handling of UTF-8. I tried to do it with https://docs.python.org/2.7/library/csv.html but in the end gave up. The code below worked.

import csv, json

csvfile = open('file.csv', 'r')
jsonfile = open('file.json', 'w')

fieldnames = ("Scope","Comment","OOS Code","In RMF","Code","Status","Name","Sub Code","CAT","LOB","Description","Owner","Manager","Platform Owner")
reader = csv.DictReader(csvfile , fieldnames)

code = ''
for row in reader:
    try:
        print('+' + row['Code'])
        for key in row:
            # Python 2: values are byte strings; drop bytes that are not valid UTF-8
            row[key] = row[key].decode('utf-8', 'ignore').encode('utf-8')
        json.dump(row, jsonfile)
        jsonfile.write('\n')
    except:
        print('-' + row['Code'])
        raise
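The decode/encode round trip above relies on Python 2 byte strings. Under Python 3, a similar effect can probably be had by letting open() discard the undecodable bytes. The sketch below assumes Python 3; the function name csv_to_json_lines, the two-column field list, and the sample bytes are hypothetical.

```python
import csv
import json
import os
import tempfile

def csv_to_json_lines(csv_path, json_path, fieldnames):
    """Convert CSV to one JSON document per line, ignoring invalid UTF-8."""
    # errors="ignore" silently drops bytes that are not valid UTF-8
    with open(csv_path, "r", encoding="utf-8", errors="ignore", newline="") as csvfile, \
         open(json_path, "w", encoding="utf-8") as jsonfile:
        for row in csv.DictReader(csvfile, fieldnames):
            json.dump(row, jsonfile)
            jsonfile.write("\n")

# Build a small sample file containing a byte (\xff) that is not valid UTF-8
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "file.csv")
json_path = os.path.join(workdir, "file.json")
with open(csv_path, "wb") as f:
    f.write(b'"A1","caf\xff"\n"B2","ok"\n')

csv_to_json_lines(csv_path, json_path, ("Code", "Name"))

with open(json_path, encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
```

Dropping bytes loses data without any warning, so errors="replace" may be a better fit when the affected fields matter.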

You can use a Pandas DataFrame to achieve this, for example:

import pandas as pd
csv_file = pd.DataFrame(pd.read_csv("path/to/file.csv", sep = ",", header = 0, index_col = False))
csv_file.to_json("/path/to/new/file.json", orient = "records", date_format = "epoch", double_precision = 10, force_ascii = True, date_unit = "ms", default_handler = None)

import csv
import json

file = 'csv_file_name.csv'
json_file = 'output_file_name.json'

# Read CSV file
def read_CSV(file, json_file):
    csv_rows = []
    with open(file) as csvfile:
        reader = csv.DictReader(csvfile)
        field = reader.fieldnames
        for row in reader:
            csv_rows.extend([{field[i]:row[field[i]] for i in range(len(field))}])
    convert_write_json(csv_rows, json_file)

# Convert csv data into json
def convert_write_json(data, json_file):
    with open(json_file, "w") as f:
        # pretty-printed output
        f.write(json.dumps(data, sort_keys=False, indent=4, separators=(',', ': ')))
        # for compact output use this instead (writing both would produce invalid JSON):
        # f.write(json.dumps(data))

read_CSV(file,json_file)

See the documentation for json.dump().

import csv
import json
from itertools import islice

def read():
    noOfElem = 200  # no of data you want to import
    csv_file_name = "hashtag_donaldtrump.csv"  # csv file name
    json_file_name = "hashtag_donaldtrump.json"  # json file name

    with open(csv_file_name, mode='r') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        with open(json_file_name, 'w') as json_file:
            json_file.write("[")
            # islice stops after noOfElem rows, or earlier if the file is shorter
            for i, row in enumerate(islice(csv_reader, noOfElem)):
                if i > 0:
                    json_file.write(",")
                json_file.write(json.dumps(row))
            # always close the array, even when fewer rows were available
            json_file.write("]")

Change the three parameters above and you are done.

Using pandas and the json library:

import pandas as pd
import json
filepath = "inputfile.csv"
output_path = "outputfile.json"

df = pd.read_csv(filepath)

# Create a multiline json
json_list = json.loads(df.to_json(orient = "records"))

with open(output_path, 'w') as f:
    for item in json_list:
        # json.dumps writes valid JSON; "%s" % item would write a Python dict repr
        f.write(json.dumps(item) + "\n")