How to import a csv file using Python with headers intact, where the first column is non-numerical

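This is an elaboration on a previous question, but the deeper I dig into Python, the more confused I get about how Python handles csv files.

I have a csv file, and it must stay that way (e.g., it cannot be converted to a text file). It is the equivalent of a 5-row by 11-column array or matrix or vector.

I have been trying to read the csv using various methods I found here and elsewhere (e.g., python.org) so that the relationship between columns and rows is preserved, where the first row and the first column are non-numerical. The rest are floats, a mixture of positive and negative values.

What I want to do is import the csv and compile it in Python so that if I reference a column header, it returns the associated values stored in the rows. For example: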

>>> workers, constant, age
>>> workers
w0
w1
w2
w3
constant
7.334
5.235
3.225
0
age
-1.406
-4.936
-1.478
0

And so on...

I am looking for techniques for handling this kind of data structure.


For Python 3

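Remove the rb argument and either use r or pass no mode at all (read mode is the default). The csv documentation also recommends opening the file with newline=''.

import csv

with open("<path-to-file>", "r", newline="") as theFile:
    reader = csv.DictReader(theFile)
    for line in reader:
        # line is {'workers': 'w0', 'constant': '7.334', 'age': '-1.406', ...}
        # (every value is a string; convert numeric ones with float())
        # e.g. print(line['workers']) yields 'w0'
        print(line)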
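Since DictReader hands back every field as a string, the numeric columns still have to be converted. Here is a minimal sketch, assuming the first column holds labels and every other column holds floats. The helper name read_rows is hypothetical, and the sample is the workers/constant/age table used further down in this thread, read from an in-memory io.StringIO so the snippet runs without a file.

```python
import csv
import io

# Sample data standing in for the real file (assumed to have the same layout).
SAMPLE = """workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0
"""


def read_rows(f):
    """Yield one dict per row; every column except the first is converted to float."""
    reader = csv.DictReader(f)
    label_field = reader.fieldnames[0]  # first header, here 'workers'
    for line in reader:
        yield {
            key: (value if key == label_field else float(value))
            for key, value in line.items()
        }


rows = list(read_rows(io.StringIO(SAMPLE)))
print(rows[0])  # {'workers': 'w0', 'constant': 7.334, 'age': -1.406}
```

For a real file, pass open("<path-to-file>", newline="") instead of the StringIO object. This assumes every row is complete: DictReader fills missing fields with None, and float(None) raises a TypeError.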

For Python 2

import csv
with open("<path-to-file>", "rb") as theFile:
    reader = csv.DictReader(theFile)
    for line in reader:
        # line is {'workers': 'w0', 'constant': '7.334', 'age': '-1.406', ...}
        # e.g. print line['workers'] yields 'w0'
        print line

Python has a powerful built-in CSV handler. In fact, most things are already built into the standard library.

Python's csv module handles data row-wise, which is the usual way of looking at such data. You seem to want a column-wise approach. Here's one way of doing it.

Assuming your file is named myclone.csv and contains

workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0

this code should give you an idea or two:

>>> import csv
>>> f = open('myclone.csv', newline='')  # Python 3; on Python 2 use open('myclone.csv', 'rb')
>>> reader = csv.reader(f)
>>> headers = next(reader, None)
>>> headers
['workers', 'constant', 'age']
>>> column = {}
>>> for h in headers:
...    column[h] = []
...
>>> column
{'workers': [], 'constant': [], 'age': []}
>>> for row in reader:
...   for h, v in zip(headers, row):
...     column[h].append(v)
...
>>> column
{'workers': ['w0', 'w1', 'w2', 'w3'], 'constant': ['7.334', '5.235', '3.2225', '0'], 'age': ['-1.406', '-4.936', '-1.478', '0']}
>>> column['workers']
['w0', 'w1', 'w2', 'w3']
>>> column['constant']
['7.334', '5.235', '3.2225', '0']
>>> column['age']
['-1.406', '-4.936', '-1.478', '0']
>>>

To get your numeric values as floats, add this

converters = [str.strip] + [float] * (len(headers) - 1)

right after reading the headers, and use this

for h, v, conv in zip(headers, row, converters):
    column[h].append(conv(v))

inside the loop over each row, instead of the similar two lines above.
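Putting those pieces together, a self-contained Python 3 version might look like the sketch below. The function name read_columns is hypothetical, and the data is the myclone.csv contents shown above, held in an io.StringIO so it runs without a file on disk.

```python
import csv
import io

# Contents of myclone.csv from the example above.
SAMPLE = """workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0
"""


def read_columns(f):
    """Return {header: [values...]}: first column kept as text, the rest as floats."""
    reader = csv.reader(f)
    headers = next(reader)
    # One converter per column: strip the label column, parse the others as floats.
    converters = [str.strip] + [float] * (len(headers) - 1)
    column = {h: [] for h in headers}
    for row in reader:
        for h, v, conv in zip(headers, row, converters):
            column[h].append(conv(v))
    return column


column = read_columns(io.StringIO(SAMPLE))
print(column["workers"])   # ['w0', 'w1', 'w2', 'w3']
print(column["constant"])  # [7.334, 5.235, 3.2225, 0.0]
```

With the real file, call it as: with open('myclone.csv', newline='') as f: column = read_columns(f)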

You can use the pandas library and reference the rows and columns like this:

import pandas as pd

df = pd.read_csv("path_to_file")

# for accessing the ith row (i is an integer position)
df.iloc[i]

# for accessing the column named X
df.X   # or df["X"], which also works for names that are not valid identifiers

# for accessing the ith row and the column named X
df.iloc[i].X

I recently had to do this for quite a large data file, and I found that a list comprehension worked quite well:

import csv

with open("file.csv", "r", newline="") as f:
    reader = csv.reader(f)
    headers = next(reader)
    data = [{h: x for (h, x) in zip(headers, row)} for row in reader]
# data now contains a list of the rows, each row being a dictionary of the
# shape {header: value}. If a row terminates early (e.g. there are 12 columns
# but the row has only 11 values), its dictionary has no key for the missing header.
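To get the column-wise view the question asks for, that list of row dictionaries can be pivoted with one more dict comprehension. A minimal sketch: using row.get(h) is an assumption about how you want gaps handled; it puts None where a short row has no value for a header, so every column keeps the same length. The sample data is hypothetical, with the last row deliberately one value short.

```python
import csv
import io

# Hypothetical sample; the last row is missing its 'age' value.
SAMPLE = """workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225
"""

reader = csv.reader(io.StringIO(SAMPLE))
headers = next(reader)
# Same list comprehension as above: one {header: value} dict per row.
data = [{h: x for (h, x) in zip(headers, row)} for row in reader]

# Pivot to {header: [value, value, ...]}; missing values become None.
columns = {h: [row.get(h) for row in data] for h in headers}
print(columns["age"])  # ['-1.406', '-4.936', None]
```

The values are still strings at this point; apply float() to the numeric columns if you need numbers.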