Python: import a CSV file into a list

I have a CSV file with about 2000 records.

Each record has a string and a category:

This is the first line,Line1
This is the second line,Line2
This is the third line,Line3

I need to read this file into a list that looks like this:

data = [('This is the first line', 'Line1'),
('This is the second line', 'Line2'),
('This is the third line', 'Line3')]

How can I use Python to import this CSV into the list I need?


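If you are sure there are no commas in your input other than the one separating the category, you can read the file line by line, split each line on ',', and append the result to a list.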

That said, it looks like you are dealing with a CSV file, so you might consider using the csv module for it.

result = []
with open('file.csv') as f:  # 'file.csv' is a placeholder name
    text = f.read()
for line in text.splitlines():
    result.append(tuple(line.split(",")))
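If the text itself might contain commas but the category never does (an assumption about your data, not something stated above), one option is to split only on the last comma with str.rsplit(',', 1). A minimal sketch with a made-up sample string:

```python
# Hypothetical sample: the second text contains commas of its own.
text = "This is the first line,Line1\nThis is, oddly, the second line,Line2\n"

result = []
for line in text.splitlines():
    # rsplit(',', 1) splits at most once, starting from the right,
    # so commas inside the text part are kept.
    result.append(tuple(line.rsplit(",", 1)))

print(result)
```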

A simple loop is enough:

lines = []
with open('test.txt', 'r') as f:
    for line in f:
        l, name = line.strip().split(',')
        lines.append((l, name))


print(lines)
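One caveat with this loop: the unpacking l, name = ...split(',') raises ValueError on any line that does not split into exactly two parts, such as a stray blank line. A minimal sketch that skips blank lines; read_pairs is a hypothetical helper name, and io.StringIO stands in for the real file:

```python
import io


def read_pairs(f):
    """Hypothetical helper: read 'text,category' lines, skipping blank ones."""
    pairs = []
    for line in f:
        line = line.strip()
        if not line:
            continue  # a blank line would make the unpacking below fail
        text, category = line.split(',')
        pairs.append((text, category))
    return pairs


# io.StringIO replaces open('test.txt') so the sketch runs on its own.
sample = io.StringIO("This is the first line,Line1\n\nThis is the second line,Line2\n")
pairs = read_pairs(sample)
print(pairs)
```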

Using the csv module:

import csv


with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    data = list(reader)


print(data)

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]
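One reason to prefer the csv module over a plain split(','): it understands quoted fields, so a comma inside quotes does not start a new column. A small comparison on made-up data, with io.StringIO standing in for the file:

```python
import csv
import io

# Made-up sample: the first text is quoted because it contains a comma.
sample = io.StringIO('"This is, the first line",Line1\nThis is the second line,Line2\n')

data = list(csv.reader(sample))
print(data)

# A naive split breaks the quoted field apart and keeps the quote characters.
naive = [line.split(',') for line in sample.getvalue().splitlines()]
print(naive[0])
```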

If you need tuples:

import csv


with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    data = [tuple(row) for row in reader]


print(data)

Output:

[('This is the first line', 'Line1'), ('This is the second line', 'Line2'), ('This is the third line', 'Line3')]

Old Python 2 answer, also using the csv module:

import csv
with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    your_list = list(reader)


print your_list
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]

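Expanding on your requirements a bit: assuming you don't care about the order of the lines and want to group them by category, the following solution may work for you: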

>>> fname = "lines.txt"
>>> from collections import defaultdict
>>> dct = defaultdict(list)
>>> with open(fname) as f:
...     for line in f:
...         text, cat = line.rstrip("\n").split(",", 1)
...         dct[cat].append(text)
...
>>> dct
defaultdict(<type 'list'>, {' CatA': ['This is the first line', 'This is the another line'], ' CatC': ['This is the third line'], ' CatB': ['This is the second line', 'This is the last line']})

This way you get all related lines in a dictionary, under the key of their category.
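The keys in the session above carry a leading space (' CatA'), presumably because that sample file has a space after each comma. A Python 3 sketch of the same grouping that reads with csv.reader and skipinitialspace=True, so the keys come out clean; the sample data and the name groups are made up:

```python
import csv
import io
from collections import defaultdict

# Made-up sample with repeated categories and a space after each comma.
sample = io.StringIO(
    "This is the first line, CatA\n"
    "This is the second line, CatB\n"
    "This is another line, CatA\n"
)

groups = defaultdict(list)
# skipinitialspace=True drops the space after each delimiter,
# so the keys are 'CatA' and 'CatB' rather than ' CatA' and ' CatB'.
for text, cat in csv.reader(sample, skipinitialspace=True):
    groups[cat].append(text)

print(dict(groups))
```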

Python 3 update:

import csv


with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    your_list = list(reader)


print(your_list)

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

Pandas is very good at working with data. Here is an example of how to use it:

import pandas as pd


# Read the CSV into a pandas data frame (df)
#   With a df you can do many things
#   most important: visualize data with Seaborn
# (pass header=None if the file has no header row, as in the question)
df = pd.read_csv('filename.csv', delimiter=',')


# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]


# or export it as a list of dicts (one dict per row)
dicts = df.to_dict('records')

A big advantage is that pandas handles the header row automatically.

If you haven't heard of Seaborn, I recommend having a look at it.

See also: How do I read and write CSV files with Python?

Pandas #2

import pandas as pd


# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()


# Convert
dicts = df.to_dict('records')

The content of df is:

     country   population population_time    EUR
0    Germany   82521653.0      2016-12-01   True
1     France   66991000.0      2017-01-01   True
2  Indonesia  255461700.0      2017-01-01  False
3    Ireland    4761865.0             NaT   True
4      Spain   46549045.0      2017-06-01   True
5    Vatican          NaN             NaT   True

The content of dicts is:

[{'country': 'Germany', 'population': 82521653.0, 'population_time': Timestamp('2016-12-01 00:00:00'), 'EUR': True},
{'country': 'France', 'population': 66991000.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': True},
{'country': 'Indonesia', 'population': 255461700.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': False},
{'country': 'Ireland', 'population': 4761865.0, 'population_time': NaT, 'EUR': True},
{'country': 'Spain', 'population': 46549045.0, 'population_time': Timestamp('2017-06-01 00:00:00'), 'EUR': True},
{'country': 'Vatican', 'population': nan, 'population_time': NaT, 'EUR': True}]

Pandas #3

import pandas as pd


# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()


# Convert
lists = [[row[col] for col in df.columns] for row in df.to_dict('records')]

The content of lists is:

[['Germany', 82521653.0, Timestamp('2016-12-01 00:00:00'), True],
['France', 66991000.0, Timestamp('2017-01-01 00:00:00'), True],
['Indonesia', 255461700.0, Timestamp('2017-01-01 00:00:00'), False],
['Ireland', 4761865.0, NaT, True],
['Spain', 46549045.0, Timestamp('2017-06-01 00:00:00'), True],
['Vatican', nan, NaT, True]]

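The next piece of code uses the csv module, but extracts the contents of file.csv into a list of dicts, using the first line (the header of the CSV table) as the keys: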

import csv


def csv2dicts(filename):
    with open(filename, newline='') as f:
        reader = csv.reader(f)
        lines = list(reader)
    if len(lines) < 2:
        return None  # need a header row plus at least one data row
    names = lines[0]
    if len(names) < 1:
        return None
    dicts = []
    for values in lines[1:]:
        if len(values) != len(names):
            return None  # malformed row: column count differs from the header
        dicts.append(dict(zip(names, values)))
    return dicts


if __name__ == '__main__':
    your_list = csv2dicts('file.csv')
    print(your_list)
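For comparison, the standard library's csv.DictReader already does this header-to-keys mapping. A minimal sketch; the header row and data are made up, and io.StringIO stands in for open('file.csv', newline=''):

```python
import csv
import io

# Made-up sample with a header row.
sample = io.StringIO(
    "text,category\n"
    "This is the first line,Line1\n"
    "This is the second line,Line2\n"
)

# DictReader takes the field names from the first row
# and yields one dict per remaining row.
dicts = list(csv.DictReader(sample))
print(dicts)
```

Unlike csv2dicts, DictReader does not reject rows with the wrong number of fields: extra values are collected under restkey (None by default), and missing ones are filled with restval (also None by default).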

Python 3 update:

import csv
from pprint import pprint


with open('text.csv', newline='') as file:
    reader = csv.reader(file)
    res = list(map(tuple, reader))


pprint(res)

Output:

[('This is the first line', ' Line1'),
('This is the second line', ' Line2'),
('This is the third line', ' Line3')]

From the csv module documentation: if csvfile is a file object, it should be opened with newline=''.
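To illustrate why newline='' matters: the csv module does its own line-ending handling, and a quoted field may legitimately contain a line break. A small sketch that round-trips such a field through a temporary file (the file and its contents are made up):

```python
import csv
import os
import tempfile

# Create a temporary file name for the made-up data.
fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)

# Write one row whose first field contains an embedded \r\n; the writer quotes it.
with open(path, 'w', newline='') as f:
    csv.writer(f).writerow(['a\r\nb', 'x'])

# Reading back with newline='' leaves line endings untranslated,
# so the reader reconstructs the quoted field exactly as written.
with open(path, newline='') as f:
    rows = list(csv.reader(f))

os.remove(path)
print(rows)
```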

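As already mentioned in the comments, you can use the csv library in Python. CSV stands for comma-separated values, which seems to be exactly your case: a label and a value separated by a comma.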

Since the data consists of category/value pairs, I would rather use a dictionary than a list of tuples.

Either way, the code below shows both approaches: d is a dictionary and l is a list of tuples.

import csv


file_name = "test.txt"
d = dict()
l = list()
try:
    with open(file_name, newline='') as csvfile:
        csvReader = csv.reader(csvfile, delimiter=",")
        for row in csvReader:
            d[row[1]] = row[0]  # category -> text
            l.append((row[0], row[1]))
except FileNotFoundError:
    print("File not found")
print(d)
print(l)
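One thing to keep in mind when choosing the dictionary: keys are unique, so if a category appears more than once, d[row[1]] = row[0] keeps only the last text for it, while the list of tuples keeps every row. A small sketch with made-up data in which 'Line1' repeats:

```python
import csv
import io

# Made-up sample in which the category 'Line1' appears twice.
sample = io.StringIO(
    "This is the first line,Line1\n"
    "This is the second line,Line2\n"
    "Yet another line,Line1\n"
)
rows = list(csv.reader(sample))

l = [tuple(row) for row in rows]                 # list of tuples: keeps every row
d = {category: text for text, category in rows}  # dict: last text per category wins

print(l)
print(d)
```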

This is the simplest way in Python 3.x to import a CSV into a multidimensional array: just 4 lines of code, and nothing imported!

# Pull a CSV into a multidimensional array in 4 lines!


L = []                          # Create an empty list for the main array
for line in open('log.txt'):    # Open the file and iterate over its lines
    x = line.rstrip()           # Strip the trailing \n (and other trailing whitespace)
    L.append(x.split(','))      # Split the line into a list and add it to the array
print(L)
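The loop above never closes the file explicitly. A sketch of the same idea with a with statement and a list comprehension, still without the csv module (so it still splits on every comma, quoted or not). The temporary file and its contents are made up so the sketch runs on its own; in practice you would open 'log.txt' directly:

```python
import os
import tempfile

# Made-up sample file standing in for 'log.txt'.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write("This is the first line,Line1\nThis is the second line,Line2\n")

# The with statement closes the file once the comprehension has read it.
with open(path) as f:
    L = [line.rstrip('\n').split(',') for line in f]

os.remove(path)
print(L)
```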

Unfortunately, I don't find any of the existing answers particularly satisfying.

Here is a simple and complete Python 3 solution using the csv module.

import csv


with open('../resources/temp_in.csv', newline='') as f:
reader = csv.reader(f, skipinitialspace=True)
rows = list(reader)


print(rows)

Note the skipinitialspace=True argument. It is needed because, unfortunately, the OP's CSV contains a space after each comma.
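The effect of skipinitialspace is easy to see side by side. A small sketch on made-up data with a space after each comma:

```python
import csv
import io

# Made-up sample with a space after each comma.
text = "This is the first line, Line1\nThis is the second line, Line2\n"

without_skip = list(csv.reader(io.StringIO(text)))
with_skip = list(csv.reader(io.StringIO(text), skipinitialspace=True))

print(without_skip)  # the categories keep the leading space: ' Line1'
print(with_skip)     # the leading space is dropped: 'Line1'
```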

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

You can use the list() function to convert a csv reader object into a list:

import csv


with open('input.csv', newline='') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    rows = list(reader)
print(rows)
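If the file happens to start with a header row (the OP's sample does not), the reader can be advanced once with next() before calling list(). A minimal sketch with a made-up header, using io.StringIO in place of the real file:

```python
import csv
import io

# Made-up sample with a header row.
sample = io.StringIO(
    "text,category\n"
    "This is the first line,Line1\n"
    "This is the second line,Line2\n"
)

reader = csv.reader(sample)
header = next(reader, None)  # consume the first row; None if the file is empty
rows = list(reader)          # only the remaining data rows
print(header)
print(rows)
```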