Selecting only one index of a MultiIndex DataFrame

I am trying to create a new DataFrame that uses only one index level of a MultiIndex DataFrame.

                     A         B         C
first second
bar   one     0.895717  0.410835 -1.413681
      two     0.805244  0.813850  1.607920
baz   one    -1.206412  0.132003  1.024180
      two     2.565646 -0.827317  0.569605
foo   one     1.431256 -0.076467  0.875906
      two     1.340309 -1.187678 -2.211372
qux   one    -1.170299  1.130127  0.974466
      two    -0.226169 -1.436737 -2.006747

Ideally, I would like something like this:

In: df.ix[level="first"]

and get:

Out:

                  A         B         C
first
bar        0.895717  0.410835 -1.413681
           0.805244  0.813850  1.607920
baz       -1.206412  0.132003  1.024180
           2.565646 -0.827317  0.569605
foo        1.431256 -0.076467  0.875906
           1.340309 -1.187678 -2.211372
qux       -1.170299  1.130127  0.974466
          -0.226169 -1.436737 -2.006747

In effect, I want to drop every level of the MultiIndex except first. Is there a simple way to do this?


One way is to simply rebind df.index to the desired level of the MultiIndex. You can do this by specifying the name of the level you want to keep:

df.index = df.index.get_level_values('first')

or by using the level's integer position:

df.index = df.index.get_level_values(0)

All other levels of the MultiIndex would disappear here.
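
As a rough illustration of the rebinding above, here is a minimal sketch on a made-up frame with the same index layout as the question (the values and the names flat, dropped and reset are placeholders, not from the original post). It also shows two non-mutating spellings, df.droplevel and df.reset_index(..., drop=True), which should give the same result in reasonably recent pandas versions. Note that the flattened index contains repeated labels.

```python
import numpy as np
import pandas as pd

# Illustrative frame with the same index layout as the question (values are arbitrary).
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(24).reshape(8, 3), index=index, columns=list("ABC"))

# Rebinding the index on a copy leaves the original frame untouched.
flat = df.copy()
flat.index = flat.index.get_level_values("first")

# Two non-mutating spellings that should produce the same frame.
dropped = df.droplevel("second")
reset = df.reset_index(level="second", drop=True)

print(flat.index.nlevels)    # number of index levels left
print(flat.index.is_unique)  # labels repeat: each 'first' value had two rows
print(flat.loc["bar"])       # selects both 'bar' rows
```

Because the labels are no longer unique, flat.loc["bar"] returns a two-row frame rather than a single row.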

A more recent option is df.xs, which selects a cross-section at a given level. Note that the level names in this example are capitalized (see the setup below):

In [88]: df.xs('bar', level='First')
Out[88]:
Second  Third
one     A       -2.315312
        B        0.497769
        C        0.108523
two     A       -0.778303
        B       -1.555389
        C       -2.625022
dtype: float64

You can also select on several levels at once:

In [89]: df.xs(('bar', 'A'), level=('First', 'Third'))
Out[89]:
Second
one   -2.315312
two   -0.778303
dtype: float64

The setup for the examples is below:

import pandas as pd
import numpy as np

arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
# unstack() moves the column levels into the index, giving a Series with three index levels
df = df.unstack()
# name the levels only after unstacking, once all three of them exist
df.index.names = ['First', 'Second', 'Third']
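
To make the two xs calls reproducible, here is a small sketch that uses fixed values instead of random ones and only two first-level labels. The names wide, s, bar and bar_a are placeholders chosen for this illustration, and the level names are assigned after unstacking so that all three levels exist.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
wide = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=["A", "B", "C"], columns=columns
)

# unstack() on a frame with a flat index returns a Series indexed by
# (first, second, row label).
s = wide.unstack()
s.index.names = ["First", "Second", "Third"]

# Cross-section on one level: 'First' is dropped, the other two remain.
bar = s.xs("bar", level="First")

# Cross-section on two levels at once: only 'Second' remains.
bar_a = s.xs(("bar", "A"), level=("First", "Third"))

print(bar)
print(bar_a)
```

By default xs drops the levels it selects on; passing drop_level=False keeps them in the result.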

I used get_level_values(0) to pull the first index level out of a multi-index groupby result, in order to build a DataFrame that holds each aggregate value together with the description that the encoded value maps to in a dictionary. Here the first level holds the "airline_enc" values of the groupby:

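import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder

def getAirlineByGrouped(grouped, dictGeneric):
    # look up the description for every first-level index value
    mylist = []
    for key in grouped.index.get_level_values(0):
        item = dictGeneric.get(key)
        mylist.append(item)
    return mylist


encoder = LabelEncoder()
df['airline_enc'] = encoder.fit_transform(df['airline'])

# to_dict() returns {'airline': {code: name, ...}}
dictAirline = df[['airline_enc', 'airline']].set_index('airline_enc').to_dict()
# `results` is not defined in this snippet; presumably it is a DataFrame
# with the same airline_enc column plus rating and recommended
grouped = results.groupby(['airline_enc', 'rating'])['recommended'].count()

#print(grouped)
airlines = getAirlineByGrouped(grouped, dictAirline['airline'])

result_df = pd.DataFrame({'index': grouped.index.get_level_values(0), 'value': grouped.values, 'airline': airlines})
result_df.plot(x='airline', y='value')
plt.xticks(rotation=90)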
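
The lookup loop can probably be replaced by Index.map on the extracted level. The sketch below is only an illustration under stated assumptions: the data is invented, pd.factorize stands in for LabelEncoder so that only pandas is needed (it numbers labels in order of appearance rather than sorted order), and code_to_name is a hypothetical helper dictionary. Plotting is left out.

```python
import pandas as pd

# Hypothetical stand-in data; the column names follow the answer above.
results = pd.DataFrame({
    "airline": ["Alpha Air", "Alpha Air", "Beta Jet", "Beta Jet", "Beta Jet"],
    "rating": [1, 2, 1, 1, 2],
    "recommended": [0, 1, 1, 0, 1],
})

# factorize() stands in for LabelEncoder here (assumption, not the original method).
results["airline_enc"], names = pd.factorize(results["airline"])

grouped = results.groupby(["airline_enc", "rating"])["recommended"].count()

# Map the first index level back to airline names without an explicit loop.
codes = grouped.index.get_level_values(0)
code_to_name = dict(enumerate(names))
result_df = pd.DataFrame({
    "index": codes,
    "value": grouped.values,
    "airline": codes.map(code_to_name),
})
print(result_df)
```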