Import CSV file as a pandas DataFrame

Best answer

import pandas as pd
df = pd.read_csv("data.csv")
print(df)

This outputs a pandas DataFrame:

        Date    price  factor_1  factor_2
0  2012-06-11  1600.20     1.255     1.548
1  2012-06-12  1610.02     1.258     1.554
2  2012-06-13  1618.07     1.249     1.552
3  2012-06-14  1624.40     1.253     1.556
4  2012-06-15  1626.15     1.258     1.552
5  2012-06-16  1626.15     1.263     1.558
6  2012-06-17  1626.15     1.264     1.572
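
To try this without a data.csv on disk, here is a minimal sketch that passes the same kind of rows to pd.read_csv through io.StringIO (read_csv accepts file-like objects as well as paths). The inline text is an assumption standing in for the file's contents, trimmed to the first three rows.

```python
import io
import pandas as pd

# Assumed contents of data.csv (first three rows of the output above).
csv_text = """Date,price,factor_1,factor_2
2012-06-11,1600.20,1.255,1.548
2012-06-12,1610.02,1.258,1.554
2012-06-13,1618.07,1.249,1.552
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # numeric columns become floats; Date stays text unless you ask for date parsing
```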

You can use the csv module from the Python standard library to work with CSV files.

For example:

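import csv

# Python 3: open in text mode with newline='' as the csv docs recommend
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)   # each row is a list of strings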

Note:

import csv

with open("value.txt", "r", newline="") as f:
    csv_reader = csv.reader(f)
    # The header row gets a blank label; data rows are numbered from 1.
    for num, row in enumerate(csv_reader):
        label = num if num else '  '
        print(label, '\t'.join(row))

It's not as compact, but it does the job:

   Date price   factor_1    factor_2
1 2012-06-11    1600.20 1.255   1.548
2 2012-06-12    1610.02 1.258   1.554
3 2012-06-13    1618.07 1.249   1.552
4 2012-06-14    1624.40 1.253   1.556
5 2012-06-15    1626.15 1.258   1.552
6 2012-06-16    1626.15 1.263   1.558
7 2012-06-17    1626.15 1.264   1.572

Here's an alternative to the pandas library using Python's built-in csv module.

import csv
from pprint import pprint

with open('foo.csv', newline='') as f:
    reader = csv.reader(f)
    headers = next(reader)               # first row holds the column names
    column = {h: [] for h in headers}    # one list per column
    for row in reader:
        for h, v in zip(headers, row):
            column[h].append(v)

pprint(column)    # Pretty printer

This prints:

{'Date': ['2012-06-11',
          '2012-06-12',
          '2012-06-13',
          '2012-06-14',
          '2012-06-15',
          '2012-06-16',
          '2012-06-17'],
 'factor_1': ['1.255', '1.258', '1.249', '1.253', '1.258', '1.263', '1.264'],
 'factor_2': ['1.548', '1.554', '1.552', '1.556', '1.552', '1.558', '1.572'],
 'price': ['1600.20',
           '1610.02',
           '1618.07',
           '1624.40',
           '1626.15',
           '1626.15',
           '1626.15']}
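
Note that the csv module returns every field as a string, as the quoted values above show. Here is a small sketch of one way to convert the numeric columns afterwards. The `column` dict below is a shortened, hard-coded stand-in for the result above, and treating every column except Date as numeric is an assumption about this particular file.

```python
# Shortened stand-in for the `column` dict built by the csv code above.
column = {
    'Date': ['2012-06-11', '2012-06-12'],
    'price': ['1600.20', '1610.02'],
    'factor_1': ['1.255', '1.258'],
}

# Assumption: every column other than 'Date' holds numbers.
numeric = {
    name: [float(v) for v in values]
    for name, values in column.items()
    if name != 'Date'
}
print(numeric['price'])
```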

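To read a CSV file as a pandas DataFrame, you'll need to use pd.read_csv.

But that's not the end of the story; data exists in many different formats and is stored in different ways, so you will often need to pass additional parameters to read_csv to make sure your data is read in properly.

Here's a table listing common scenarios encountered with CSV files, along with the appropriate argument to use. You will usually need all or some combination of the arguments below to read in your data.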

┌───────────────────────────────────────────────────────┬───────────────────────┬────────────────────────────────────────────────────┐
│ pandas Implementation                                 │ Argument              │ Description                                        │
├───────────────────────────────────────────────────────┼───────────────────────┼────────────────────────────────────────────────────┤
│ pd.read_csv(..., sep=';')                             │ sep/delimiter         │ Read CSV with different separator¹                 │
│ pd.read_csv(..., delim_whitespace=True)               │ delim_whitespace      │ Read CSV with tab/whitespace separator             │
│ pd.read_csv(..., encoding='latin-1')                  │ encoding              │ Fix UnicodeDecodeError while reading²              │
│ pd.read_csv(..., header=None, names=['x', 'y', 'z'])  │ header and names      │ Read CSV without headers³                          │
│ pd.read_csv(..., index_col=[0])                       │ index_col             │ Specify which column to set as the index⁴          │
│ pd.read_csv(..., usecols=['x', 'y'])                  │ usecols               │ Read subset of columns                             │
│ pd.read_csv(..., thousands='.', decimal=',')          │ thousands and decimal │ Numeric data is in European format (eg., 1.234,56) │
└───────────────────────────────────────────────────────┴───────────────────────┴────────────────────────────────────────────────────┘

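Footnotes
¹ By default, read_csv uses a C parser engine for performance. The C parser can only handle single-character separators. If your CSV has a multi-character separator, you will need to modify your code to use the 'python' engine. You can also pass regular expressions:
 df = pd.read_csv(..., sep=r'\s*\|\s*', engine='python')
² UnicodeDecodeError occurs when the data was stored in one encoding but read in a different, incompatible one. Common encoding schemes are 'utf-8' and 'latin-1'; your data is likely to fit one of these.
³ header=None specifies that the first row in the CSV is a data row rather than a header row (passing header=False raises an error), and names=[...] lets you specify the list of column names to assign to the DataFrame when it is created.
⁴ "Unnamed: 0" occurs when a DataFrame with an unnamed index is saved to CSV and then read back in. Instead of fixing the issue while reading, you can also fix it when writing by using
 df.to_csv(..., index=False)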
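
The "Unnamed: 0" round trip is easy to reproduce in memory. This is a minimal sketch with a made-up two-column DataFrame (the column names x and y are placeholders); it shows the problem and both fixes, at write time with index=False and at read time with index_col.

```python
import io
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})

# Writing with the default index=True stores the unnamed index as a first column.
with_index = df.to_csv()
problem_cols = pd.read_csv(io.StringIO(with_index)).columns.tolist()
print(problem_cols)

# Fix when writing: leave the index out of the file.
without_index = df.to_csv(index=False)
fixed_cols = pd.read_csv(io.StringIO(without_index)).columns.tolist()
print(fixed_cols)

# Fix when reading: tell read_csv that the first column is the index.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(restored.columns.tolist())
```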

There are other arguments I've not mentioned here, but these are the ones you'll encounter most frequently.
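
As an illustration of combining several of these arguments, here is a sketch that reads a small headerless, semicolon-separated text with European number formatting. The sample text and the column names x, y, z are invented for the example.

```python
import io
import pandas as pd

# Hypothetical headerless data: ';' separated, European number format.
raw = "a;1.234,56;10\nb;2.000,00;20\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep=';',
    header=None,             # the first line is data, not a header
    names=['x', 'y', 'z'],   # column names to assign
    usecols=['x', 'y'],      # keep only these columns
    thousands='.',           # '.' groups thousands
    decimal=',',             # ',' marks the decimal point
)
print(df)
```

The y column comes back as floats because thousands and decimal tell the parser how to interpret the punctuation; without them the values would be read as text.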

import pandas as pd
df = pd.read_csv('/PathToFile.txt', sep = ',')

This will import your .txt or .csv file into a DataFrame.