How do I loop over a grouped pandas DataFrame?


Best answer

df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)) already returns a DataFrame, so you can no longer loop over the groups.

In general:

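df.groupby(...) returns a GroupBy object (a DataFrameGroupBy or SeriesGroupBy), and with this you can iterate over the groups (as explained in the pandas documentation). You can do something like: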
```
grouped = df.groupby('A')

for name, group in grouped:
    ...
```
When you apply a function to the groupby, as in your example df.groupby(...).agg(...) (it could equally be transform, apply, mean, ...), the results of applying the function to the individual groups are combined into a single object (the apply and combine steps of groupby's 'split-apply-combine' paradigm). The result is therefore again a DataFrame (or a Series, depending on the applied function), not a collection of groups that you can loop over.
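
To make the distinction concrete, here is a minimal sketch on a small made-up frame. The column names `A` and `B` and their values are assumptions for illustration only, not data from the question. It shows that the object returned by `groupby` can be iterated, whereas the result of `agg` is an ordinary DataFrame indexed by the group keys.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({'A': ['x', 'x', 'y'], 'B': ['a', 'b', 'c']})

grouped = df.groupby('A')

# Split step: iterating a GroupBy yields (key, sub-DataFrame) pairs.
group_sizes = {name: len(group) for name, group in grouped}

# Apply and combine steps: agg returns a single DataFrame,
# with one row per group key in its index.
joined = grouped.agg(lambda x: ','.join(x))

print(group_sizes)
print(joined)
```

If you need both the per-group data and the aggregated result, keeping a reference to `grouped` before calling `agg` avoids having to group twice.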


If the DataFrame has already been created, you can iterate over its index values.

df = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))
for name in df.index:
    print(name)
    print(df.loc[name])
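
As a self-contained sketch of the same idea: only the column name `l_customer_id_i` comes from the question, while the `item` column and all values are made up for illustration. After aggregation the group keys form the index, so looping over the index (or using `iterrows()`) visits one row per group.

```python
import pandas as pd

# Hypothetical sample data; only 'l_customer_id_i' is taken from the question.
df = pd.DataFrame({
    'l_customer_id_i': ['c1', 'c1', 'c2'],
    'item': ['apple', 'pear', 'plum'],
})

agg_df = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))

# The index of the aggregated frame holds one entry per group key.
rows = {name: agg_df.loc[name, 'item'] for name in agg_df.index}

# iterrows() yields (index label, row) pairs, which saves the .loc lookup.
for name, row in agg_df.iterrows():
    print(name, row['item'])
```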


Here is an example of iterating over a pd.DataFrame grouped by the column atable. In this example, the "create" statements for an SQL database are generated inside the for loop:

import pandas as pd

df1 = pd.DataFrame({
    'atable':     ['Users', 'Users', 'Domains', 'Domains', 'Locks'],
    'column':     ['col_1', 'col_2', 'col_a', 'col_b', 'col'],
    'column_type':['varchar', 'varchar', 'int', 'varchar', 'varchar'],
    'is_null':    ['No', 'No', 'Yes', 'No', 'Yes'],
})

df1_grouped = df1.groupby('atable')

# iterate over each group
for group_name, df_group in df1_grouped:
    print('\nCREATE TABLE {}('.format(group_name))

    # iterate over the rows of this group, one column definition per row
    for row_index, row in df_group.iterrows():
        col = row['column']
        column_type = row['column_type']
        is_null = 'NOT NULL' if row['is_null'] == 'No' else ''
        print('\t{} {} {},'.format(col, column_type, is_null))

    print(");")
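
Note that the loop above prints a comma after the last column of each table, which standard SQL does not allow. One possible variant, sketched here as an assumption rather than as part of the original answer, collects the column definitions per group and joins them with `str.join`, so no trailing comma is produced. The helper name `build_create_statements` is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({
    'atable':      ['Users', 'Users', 'Domains', 'Domains', 'Locks'],
    'column':      ['col_1', 'col_2', 'col_a', 'col_b', 'col'],
    'column_type': ['varchar', 'varchar', 'int', 'varchar', 'varchar'],
    'is_null':     ['No', 'No', 'Yes', 'No', 'Yes'],
})


def build_create_statements(df):
    """Return a dict mapping each table name to a CREATE TABLE string.

    Hypothetical helper for illustration; it assumes the columns
    'atable', 'column', 'column_type' and 'is_null' used above.
    """
    statements = {}
    for table_name, df_group in df.groupby('atable'):
        column_defs = []
        for _, row in df_group.iterrows():
            null_clause = ' NOT NULL' if row['is_null'] == 'No' else ''
            column_defs.append(
                '    {} {}{}'.format(row['column'], row['column_type'], null_clause)
            )
        # Joining with ',\n' puts commas only between definitions.
        statements[table_name] = 'CREATE TABLE {} (\n{}\n);'.format(
            table_name, ',\n'.join(column_defs)
        )
    return statements


statements = build_create_statements(df1)
for statement in statements.values():
    print(statement)
```

Returning the statements instead of printing them inside the loop also makes it easier to pass them on to a database cursor or write them to a file.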