How to iterate over a pandas MultiIndex DataFrame using the index

I have a DataFrame df that looks like this. date and Time are the two levels of a MultiIndex.

                           observation1   observation2
date          Time
2012-11-02    9:15:00      79.373668      224
              9:16:00      130.841316     477
2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621
2012-11-04    9:15:00      115.449437     122
              9:16:00      123.776946     555
              9:17:00      153.76646      344
              9:18:00      463.276946     212

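I want to run a complex process on each day's chunk of data.

The pseudocode looks like this:

for count in df(level 0 index):
    new_df = get only chunk for count
    complex_process(new_df)

So, first of all, I could not find a way to access only the chunk for a single date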

2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621

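and then send it off for processing. I am doing this in a for loop because I am not sure whether there is a way to do it without naming the exact values of the level 0 index. I did some basic searching and found df.index.get_level_values(0), but it returns every value, which makes the loop run several times for a given day. I want to create one DataFrame per day and send it off for processing.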


One easy way is to group by the first level of the index. Iterating over the groupby object yields each group key together with a sub-frame containing that group's rows.

In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df)
     ...:
                    observation1  observation2
date       Time
2012-11-02 9:15:00     79.373668           224
           9:16:00    130.841316           477


                    observation1  observation2
date       Time
2012-11-03 9:15:00     45.312814           835
           9:16:00    123.776946           623
           9:17:00    153.766460           624
           9:18:00    463.276946           626
           9:19:00    663.176934           622
           9:20:00    763.773330           621


                    observation1  observation2
date       Time
2012-11-04 9:15:00    115.449437           122
           9:16:00    123.776946           555
           9:17:00    153.766460           344
           9:18:00    463.276946           212
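To try the groupby approach without the original data, here is a minimal sketch that rebuilds a few rows of the frame from the question and passes each day's chunk to a function. The `complex_process` below is only a hypothetical stand-in (it sums one column); the real per-day computation is not shown in the question.

```python
import pandas as pd

# Rebuild a few rows of the frame from the question.
index = pd.MultiIndex.from_tuples(
    [
        ("2012-11-02", "9:15:00"),
        ("2012-11-02", "9:16:00"),
        ("2012-11-03", "9:15:00"),
        ("2012-11-03", "9:16:00"),
    ],
    names=["date", "Time"],
)
df = pd.DataFrame(
    {
        "observation1": [79.373668, 130.841316, 45.312814, 123.776946],
        "observation2": [224, 477, 835, 623],
    },
    index=index,
)


def complex_process(chunk):
    # Hypothetical placeholder for the real per-day computation.
    return chunk["observation2"].sum()


# One iteration per distinct date; new_df holds only that day's rows.
results = {date: complex_process(new_df) for date, new_df in df.groupby(level=0)}
print(results)
```

Collecting the results in a dict keyed by date is just one option; if the per-day function returns a scalar or a Series, `df.groupby(level=0).apply(complex_process)` may be a more compact alternative.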

You can also use droplevel to remove the first index level (the date level, which is redundant within a single day's chunk):

In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df.droplevel(0))
     ...:
         observation1  observation2
Time
9:15:00     79.373668           224
9:16:00    130.841316           477
...
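As a small illustration of what droplevel leaves behind, the sketch below stores each day's chunk in a dict and checks that only the Time level remains. The sample rows are copied from the question; the `chunks` dict is an assumption made for the example, not part of the original answer.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("2012-11-02", "9:15:00"),
        ("2012-11-02", "9:16:00"),
        ("2012-11-03", "9:15:00"),
    ],
    names=["date", "Time"],
)
df = pd.DataFrame({"observation1": [79.373668, 130.841316, 45.312814]}, index=index)

# Key each chunk by its date and drop the date level from the chunk itself.
chunks = {date: new_df.droplevel(0) for date, new_df in df.groupby(level=0)}

day = chunks["2012-11-02"]
print(list(day.index.names))  # only the Time level is left
print(list(day.index))
```

The date is not lost, since it is still available as the group key in the loop.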

What about this?

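# .ix was removed in pandas 1.0, so use .loc; note that get_level_values
# repeats each date once per row, so each day is processed more than once.
for idate in df.index.get_level_values('date'):
    complex_process(df.loc[idate], idate)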

Building on @psorenson's answer, we can get the unique values of an index level and the matching DataFrame slices without NumPy as follows:

for date in df.index.get_level_values('date').unique():
    print(df.loc[date])

Late to the party, I found that the following works, too:

for date in df.index.unique("date"):
    print(df.loc[date])

It uses the optional level parameter of the Index.unique method, introduced in pandas 0.23.0.

You can specify either the level number or label.
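A short sketch of both spellings, using a few rows copied from the question. The `sizes` dict is only there to make the loop's result easy to inspect and is not part of the original answer.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("2012-11-02", "9:15:00"),
        ("2012-11-02", "9:16:00"),
        ("2012-11-03", "9:15:00"),
    ],
    names=["date", "Time"],
)
df = pd.DataFrame({"observation2": [224, 477, 835]}, index=index)

by_label = df.index.unique("date")  # level given by name
by_number = df.index.unique(0)      # same level given by position

# Each date appears once, so each day's rows are selected exactly once.
sizes = {date: len(df.loc[date]) for date in by_label}
print(list(by_label), sizes)
```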

Another alternative:

for date in df.index.levels[0]:
    print(df.loc[date])

Unlike the df.index.unique("date") call proposed by @sanzoghenzo, this selects the index level by its position rather than by its name.
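One more difference is worth checking before relying on levels: after filtering rows, a MultiIndex can keep level values that no longer occur in the index, in which case df.loc[date] would fail for those dates. The sketch below, with rows copied from the question and an arbitrary filter chosen for illustration, compares the two approaches and shows remove_unused_levels as one way to reconcile them.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("2012-11-02", "9:15:00"),
        ("2012-11-02", "9:16:00"),
        ("2012-11-03", "9:15:00"),
        ("2012-11-03", "9:16:00"),
    ],
    names=["date", "Time"],
)
df = pd.DataFrame({"observation2": [224, 477, 835, 623]}, index=index)

# Arbitrary filter for illustration: it removes every row of 2012-11-02.
filtered = df[df["observation2"] > 500]

from_levels = list(filtered.index.levels[0])       # still lists the removed date
from_unique = list(filtered.index.unique("date"))  # only dates actually present

# remove_unused_levels returns a new MultiIndex without the stale values.
cleaned = list(filtered.index.remove_unused_levels().levels[0])
print(from_levels, from_unique, cleaned)
```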