Compute row average in pandas

       Y1961      Y1962      Y1963      Y1964      Y1965  Region
0  82.567307  83.104757  83.183700  83.030338  82.831958  US
1   2.699372   2.610110   2.587919   2.696451   2.846247  US
2  14.131355  13.690028  13.599516  13.649176  13.649046  US
3   0.048589   0.046982   0.046583   0.046225   0.051750  US
4   0.553377   0.548123   0.582282   0.577811   0.620999  US

In the above dataframe, I would like to get the average of each row. Currently, I am doing this:

df.mean(axis=0)

However, this does away with the Region column as well. How can I compute the mean and also retain the Region column?

小开

Best answer

You can specify a new column. You also need to compute the mean along the rows, so use axis=1.

>>> df['mean'] = df.mean(axis=1)
>>> df
       Y1961      Y1962      Y1963      Y1964      Y1965 Region       mean
0  82.567307  83.104757  83.183700  83.030338  82.831958     US  82.943612
1   2.699372   2.610110   2.587919   2.696451   2.846247     US   2.688020
2  14.131355  13.690028  13.599516  13.649176  13.649046     US  13.743824
3   0.048589   0.046982   0.046583   0.046225   0.051750     US   0.048026
4   0.553377   0.548123   0.582282   0.577811   0.620999     US   0.576518
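
A note for newer pandas versions: as far as I know, from pandas 2.0 onward mean no longer skips non-numeric columns by default, so the call above may raise a TypeError because of the Region strings. A minimal sketch of a version-independent variant, assuming the frame from the question is rebuilt by hand as below, passes numeric_only=True explicitly:

```python
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
df = pd.DataFrame({
    "Y1961": [82.567307, 2.699372, 14.131355, 0.048589, 0.553377],
    "Y1962": [83.104757, 2.610110, 13.690028, 0.046982, 0.548123],
    "Y1963": [83.183700, 2.587919, 13.599516, 0.046583, 0.582282],
    "Y1964": [83.030338, 2.696451, 13.649176, 0.046225, 0.577811],
    "Y1965": [82.831958, 2.846247, 13.649046, 0.051750, 0.620999],
    "Region": ["US"] * 5,
})

# axis=1 averages across the columns of each row;
# numeric_only=True leaves the string column out of the calculation.
df["mean"] = df.mean(axis=1, numeric_only=True)

print(df[["Region", "mean"]])
```

The Region column is untouched; it is only excluded from the arithmetic.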

小开

If you are looking to average column-wise, try this:

df.drop('Region', axis=1).apply(lambda x: x.mean())

# Note: with inplace=True, this removes the Region column from df itself
df.drop('Region', axis=1, inplace=True)
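
If the Region column should survive, the drop does not have to be in place. A small sketch, assuming the same example values as in the question (col_means is just an illustrative name), computes the per-column means on a temporary copy:

```python
import pandas as pd

df = pd.DataFrame({
    "Y1961": [82.567307, 2.699372, 14.131355, 0.048589, 0.553377],
    "Y1962": [83.104757, 2.610110, 13.690028, 0.046982, 0.548123],
    "Y1963": [83.183700, 2.587919, 13.599516, 0.046583, 0.582282],
    "Y1964": [83.030338, 2.696451, 13.649176, 0.046225, 0.577811],
    "Y1965": [82.831958, 2.846247, 13.649046, 0.051750, 0.620999],
    "Region": ["US"] * 5,
})

# drop() returns a new frame by default, so df keeps its Region column.
# mean() with the default axis=0 yields one value per remaining column.
col_means = df.drop(columns="Region").mean()

print(col_means)
```

The result is a Series indexed by the year column names, and df itself is unchanged.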

小开

I think this is what you are looking for:

df.drop('Region', axis=1).apply(lambda x: x.mean(), axis=1)

小开

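We can compute the mean of each row over a range of columns selected by position with iloc. In your case, that is from the Y1961 column to the Y1965 column (positions 0 through 4; the end of the slice is exclusive, so it is written 0:5):

df['mean'] = df.iloc[:, 0:5].mean(axis=1)

And if you want to select individual columns by position:

df['mean'] = df.iloc[:, [0,1,2,3,4]].mean(axis=1)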
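
Positional slices with iloc follow Python's convention of excluding the stop index, which is easy to get wrong by one. A short sketch, assuming a cut-down two-row version of the example frame (first_four and all_five are illustrative names), shows which columns each slice picks up:

```python
import pandas as pd

df = pd.DataFrame({
    "Y1961": [82.567307, 2.699372],
    "Y1962": [83.104757, 2.610110],
    "Y1963": [83.183700, 2.587919],
    "Y1964": [83.030338, 2.696451],
    "Y1965": [82.831958, 2.846247],
    "Region": ["US", "US"],
})

# 0:4 stops before position 4, so Y1965 is left out.
first_four = df.iloc[:, 0:4]
# 0:5 covers positions 0 through 4, i.e. Y1961 through Y1965.
all_five = df.iloc[:, 0:5]

print(list(first_four.columns))
print(list(all_five.columns))

df["mean"] = all_five.mean(axis=1)
```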

小开

Taking the mean based on the column names

I am sharing this because it may be useful for those who want to average a few columns selected by name instead of by column index. This can be done with pandas's loc instead of iloc. For instance, the odd-year average would be:

df["mean_odd_year"] = df.loc[:, ["Y1961","Y1963","Y1965"]].mean(axis = 1)
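
Label-based selection with loc also accepts a slice of column names, and unlike iloc the end label is included. A minimal sketch, assuming a two-row version of the example frame and a hypothetical output column name mean_all_years:

```python
import pandas as pd

df = pd.DataFrame({
    "Y1961": [82.567307, 2.699372],
    "Y1962": [83.104757, 2.610110],
    "Y1963": [83.183700, 2.587919],
    "Y1964": [83.030338, 2.696451],
    "Y1965": [82.831958, 2.846247],
    "Region": ["US", "US"],
})

# A list of labels picks exactly those columns.
df["mean_odd_year"] = df.loc[:, ["Y1961", "Y1963", "Y1965"]].mean(axis=1)

# A label slice includes both endpoints, so this spans Y1961 through Y1965.
df["mean_all_years"] = df.loc[:, "Y1961":"Y1965"].mean(axis=1)

print(df[["Region", "mean_odd_year", "mean_all_years"]])
```

Selecting by name keeps working if columns are later reordered or new ones are inserted, which positional indexing does not guarantee.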