How to group a DataFrame in pandas and keep the other columns

Given a DataFrame that logs uses of some books like this:

Name   Type   ID
Book1  ebook  1
Book2  paper  2
Book3  paper  3
Book1  ebook  1
Book2  paper  2

I need to count how many times each book appears while keeping the other columns, to get this:

Name   Type   ID    Count
Book1  ebook  1     2
Book2  paper  2     2
Book3  paper  3     1

How can this be done?

Thanks!


You want the following:

In [20]:
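df.groupby(['Name','Type','ID']).size().reset_index(name='Count')


Out[20]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1

In your case 'Name', 'Type' and 'ID' match in values, so we can groupby on these, call size, and then reset_index with name='Count'. Note that count would produce no value column here, because every column of the frame is a grouping key and count only reports the remaining columns.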
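A self-contained sketch of this grouping on the question's data, for anyone who wants to run it. The frame is rebuilt by hand from the table in the question, and the variable names (`keys`, `counts`, `only_keys`) are illustrative. It uses size() and also shows what count() returns when every column is a grouping key.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

keys = ["Name", "Type", "ID"]

# size() counts the rows in each group; reset_index(name=...) turns the
# group keys back into columns and names the new count column.
counts = df.groupby(keys).size().reset_index(name="Count")
print(counts)

# count() reports non-null values per non-key column. Here no such column
# exists, so the result has the three groups as its index but zero columns.
only_keys = df.groupby(keys).count()
print(only_keys.shape)
```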

Another way is to use transform to add a 'Count' column and then call drop_duplicates:

In [25]:
df['Count'] = df.groupby(['Name'])['ID'].transform('count')
df.drop_duplicates()


Out[25]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
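The transform route can be sketched end to end as follows. The frame is rebuilt from the question, and the sketch works on a copy (`out`, an illustrative name) so the original frame is left untouched. Grouping by 'Name' alone is enough here only because each name always carries the same 'Type' and 'ID'; that is an assumption about the data, not something pandas checks.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

out = df.copy()

# transform('count') returns one value per original row, so the result
# aligns with the frame and can be assigned as a new column.
out["Count"] = out.groupby("Name")["ID"].transform("count")

# Repeated rows are now identical in every column, including Count,
# so drop_duplicates keeps the first occurrence of each book.
out = out.drop_duplicates()
print(out)
```

Unlike the groupby/size approach, this keeps the original row labels of the first occurrences rather than building a fresh index.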

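I think as_index=False should do the trick, as long as you call size() rather than count(): count() only reports columns that are not grouping keys, and this frame has none. In pandas 1.1 and later, size() with as_index=False returns a DataFrame whose count column is named 'size'.

df.groupby(['Name','Type','ID'], as_index=False).size()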
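A sketch of the as_index=False variant on the question's data. It assumes pandas 1.1 or later and uses size() in place of count(), since count() reports only non-key columns and this frame has none. The resulting column is called 'size', so the sketch renames it to 'Count' to match the desired output.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# as_index=False keeps the group keys as ordinary columns, so no
# reset_index call is needed afterwards (assumes pandas >= 1.1).
result = (
    df.groupby(["Name", "Type", "ID"], as_index=False)
      .size()
      .rename(columns={"size": "Count"})
)
print(result)
```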

If you have many columns in a df, it makes sense to use df.groupby(['foo']).agg(...). The .agg() function lets you choose what to do with the columns you don't want to apply operations on. If you just want to keep them, use .agg({'col1': 'first', 'col2': 'first', ...}). Instead of 'first', you can also apply 'sum', 'mean' and others.
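Applied to the question's data, the .agg() idea might look like the sketch below. It uses pandas named aggregation (keyword arguments of the form new_column=(source_column, function), available since pandas 0.25) instead of the dict form, because that allows 'ID' to be both kept with 'first' and counted in one call. The variable name `summary` is illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# Group by Name only; carry Type and ID along with 'first' and
# count the rows of each group via the ID column.
summary = df.groupby("Name", as_index=False).agg(
    Type=("Type", "first"),
    ID=("ID", "first"),
    Count=("ID", "count"),
)
print(summary)
```

Keeping a column with 'first' silently picks one value per group, so it is only safe when that column is constant within each group, as it is here.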