Construct a pandas DataFrame from a list of (row, col, value) tuples

I have a list of tuples like

data = [
('r1', 'c1', avg11, stdev11),
('r1', 'c2', avg12, stdev12),
('r2', 'c1', avg21, stdev21),
('r2', 'c2', avg22, stdev22)
]

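I would like to put them into a pandas DataFrame, with rows named by the first column and columns named by the second. It seems the way to handle the row names is something like pandas.DataFrame([x[1:] for x in data], index = [x[0] for x in data]), but how do I handle the columns to get a 2x2 matrix (the command above produces 4 rows by 3 columns)? Is there a smarter way to handle the row labels as well, instead of explicitly omitting them?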

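It seems I will need 2 DataFrames, one for the averages and one for the standard deviations. Is that correct? Or can I store a list of values in each "cell"?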

You can pivot your DataFrame after creating it (the default column labels are the integers 0 to 3):

>>> df = pd.DataFrame(data)
>>> df.pivot(index=0, columns=1, values=2)
# avg DataFrame
1      c1     c2
0
r1  avg11  avg12
r2  avg21  avg22
>>> df.pivot(index=0, columns=1, values=3)
# stdev DataFrame
1        c1       c2
0
r1  stdev11  stdev12
r2  stdev21  stdev22
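
On the question of whether two DataFrames are needed: one possible sketch is to name the columns and pass a list to `values`, which gives a single frame with two-level columns that you can slice per statistic. The numbers below are made-up stand-ins for avg11, stdev11 and so on, and the column names 'row', 'col', 'avg' and 'stdev' are my own choice, not from the question.

```python
import pandas as pd

# Hypothetical numbers standing in for avg11, stdev11, ... from the question.
data = [
    ('r1', 'c1', 1.0, 0.1),
    ('r1', 'c2', 2.0, 0.2),
    ('r2', 'c1', 3.0, 0.3),
    ('r2', 'c2', 4.0, 0.4),
]

# Column names are illustrative assumptions.
df = pd.DataFrame(data, columns=['row', 'col', 'avg', 'stdev'])

# Pivot both value columns at once. The result has two-level columns:
# the outer level is 'avg' / 'stdev', the inner level is 'c1' / 'c2'.
both = df.pivot(index='row', columns='col', values=['avg', 'stdev'])

avg = both['avg']      # 2x2 frame of averages
stdev = both['stdev']  # 2x2 frame of standard deviations
print(avg)
print(stdev)
```

This keeps everything in one object while still letting you pull out a plain 2x2 frame when you need one.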

I submit that it is better to leave your data stacked as it is:

import pandas

df = pandas.DataFrame(data, columns=['R_Number', 'C_Number', 'Avg', 'Std'])

# Possibly also this if these can always be the indexes:
# df = df.set_index(['R_Number', 'C_Number'])

Then it's a bit more intuitive to say

df.set_index(['R_Number', 'C_Number']).Avg.unstack(level=1)

This way the code itself states that you're reshaping the averages (or the standard deviations). With pivot on unnamed columns, by contrast, which semantic entity you are reshaping is determined purely by column position.
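
For anyone who wants to run this, here is a small self-contained sketch of the same idea. The numeric values are placeholders I made up in place of avg11, stdev11 and so on; the column names are the ones used above.

```python
import pandas as pd

# Placeholder numbers in place of avg11, stdev11, ... from the question.
data = [
    ('r1', 'c1', 1.0, 0.1),
    ('r1', 'c2', 2.0, 0.2),
    ('r2', 'c1', 3.0, 0.3),
    ('r2', 'c2', 4.0, 0.4),
]

df = pd.DataFrame(data, columns=['R_Number', 'C_Number', 'Avg', 'Std'])
stacked = df.set_index(['R_Number', 'C_Number'])

# Reshape one named statistic at a time into a 2x2 frame.
avg_wide = stacked['Avg'].unstack(level='C_Number')
std_wide = stacked['Std'].unstack(level='C_Number')

# Unstacking the whole frame keeps both statistics, with
# (statistic, C_Number) pairs as two-level column labels.
wide = stacked.unstack(level='C_Number')

print(avg_wide)
print(wide)
```

Referring to the level by name ('C_Number') rather than by position (level=1) is a matter of taste; both select the same level here.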

This is what I expected to see when I came to this question:

#!/usr/bin/env python

import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4),
                   (5, 6, 7, 8),
                   (9, 0, 1, 2),
                   (3, 4, 5, 6)],
                  columns=list('abcd'),
                  index=['India', 'France', 'England', 'Germany'])
print(df)

gives

         a  b  c  d
India    1  2  3  4
France   5  6  7  8
England  9  0  1  2
Germany  3  4  5  6
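
If the row label is already the first element of each tuple, as in the original question, one way to avoid slicing it off by hand is to load it as an ordinary column and then promote it with set_index. This is only a sketch; the column name 'country' is an assumption I added for illustration.

```python
import pandas as pd

# Same numbers as above, but with the row label stored inside each tuple.
rows = [('India', 1, 2, 3, 4),
        ('France', 5, 6, 7, 8),
        ('England', 9, 0, 1, 2),
        ('Germany', 3, 4, 5, 6)]

# 'country' is a hypothetical name for the label column.
df = pd.DataFrame(rows, columns=['country', 'a', 'b', 'c', 'd'])

# Promote the label column to the index instead of omitting it manually.
df = df.set_index('country')
print(df)
```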