Pandas groupby and sum only one column

So I have a DataFrame, df1, that looks like this:

       A      B      C
1     foo    12    California
2     foo    22    California
3     bar    8     Rhode Island
4     bar    32    Rhode Island
5     baz    15    Ohio
6     baz    26    Ohio

I want to group by column A, sum column B, and keep the values in column C. Something like this:

      A       B      C
1    foo     34    California
2    bar     40    Rhode Island
3    baz     41    Ohio

The problem is that when I do

df.groupby('A').sum()

column C is dropped, and I get

      B
A
bar  40
baz  41
foo  34

How can I avoid this and keep column C when grouping and summing?


One way to do this is to include C in your groupby keys (groupby accepts a list of columns).

Give this a try:

df.groupby(['A','C'])['B'].sum()
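A minimal runnable sketch of the line above, assuming the question's data is loaded into a DataFrame named df (rebuilt here by hand, with index labels 1 to 6 as shown). Because both A and C are keys, the result is a Series of B sums indexed by (A, C) pairs:

```python
import pandas as pd

# Rebuild the example frame from the question (assumed to be named df).
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    },
    index=range(1, 7),
)

# Group on both A and C, then sum only column B.
result = df.groupby(["A", "C"])["B"].sum()
print(result)
```

This works cleanly here because each value of A maps to a single value of C; if one A value appeared with several C values, each (A, C) pair would become its own group with its own sum.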

One other thing to note: if you need to keep working with the result as a DataFrame after the aggregation, you can pass as_index=False, which keeps the group keys as ordinary columns instead of moving them into the index. This one tripped me up when I was first working with pandas. Example:

df.groupby(['A','C'], as_index=False)['B'].sum()
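A sketch of what as_index=False changes, again assuming the question's data in a DataFrame named df: the result is a DataFrame with A and C as regular columns and a fresh 0-based index, rather than a Series with a MultiIndex.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed to be named df).
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    },
    index=range(1, 7),
)

# as_index=False keeps the group keys A and C as columns,
# so selecting B and summing returns a DataFrame, not a Series.
result = df.groupby(["A", "C"], as_index=False)["B"].sum()
print(type(result).__name__)
print(result)
```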

If you don't care what's in column C and just want the nth value from each group (n being the row position within the group), you could do this:

df.groupby('A').agg({'B': 'sum',
                     'C': lambda x: x.iloc[n]})
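A runnable sketch of the agg approach, assuming the question's data in a DataFrame named df. The answer leaves n open, so the value n = 0 below is an illustrative choice, not part of the original answer (n = -1 would take the last row of each group instead):

```python
import pandas as pd

# Rebuild the example frame from the question (assumed to be named df).
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    },
    index=range(1, 7),
)

n = 0  # illustrative: position of the row to keep within each group

# Sum B per group; for C, pick the value at position n within the group.
result = df.groupby("A").agg({"B": "sum", "C": lambda x: x.iloc[n]})
print(result)
```

If any group has fewer than n + 1 rows, x.iloc[n] raises an IndexError, so small values such as 0 or -1 are the safe choices.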

Another option is to use groupby.agg with 'first' on column C, which keeps the first value of C in each group while B is summed.

out = df.groupby('A', as_index=False, sort=False).agg({'B':'sum', 'C':'first'})

Output:

     A   B             C
0  foo  34    California
1  bar  40  Rhode Island
2  baz  41          Ohio
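A self-contained version of this answer, assuming the question's data in a DataFrame named df, that also shows what sort=False does: groups come out in order of first appearance (foo, bar, baz) instead of sorted by key, which is the default.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed to be named df).
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    },
    index=range(1, 7),
)

# sort=False: keep groups in the order they first appear in df.
out = df.groupby("A", as_index=False, sort=False).agg({"B": "sum", "C": "first"})

# Default sort=True: groups are ordered by the key A, for comparison.
out_sorted = df.groupby("A", as_index=False).agg({"B": "sum", "C": "first"})

print(out)
print(out_sorted)
```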