How to aggregate unique count with pandas pivot_table

This code:

df2 = (
pd.DataFrame({
'X' : ['X1', 'X1', 'X1', 'X1'],
'Y' : ['Y2', 'Y1', 'Y1', 'Y1'],
'Z' : ['Z3', 'Z1', 'Z1', 'Z2']
})
)
g = df2.groupby('X')
pd.pivot_table(g, values='X', rows='Y', cols='Z', margins=False, aggfunc='count')

returns the following error:

Traceback (most recent call last): ...
AttributeError: 'Index' object has no attribute 'index'

How do I get a Pivot Table with counts of unique values of one DataFrame column for two other columns?
Is there an aggfunc for counting unique values? Should I be using np.bincount()?

NB. I am aware of pandas.Series.value_counts(); however, I need a pivot table.


EDIT: The output should be:

Z   Z1  Z2  Z3
Y
Y1   1   1 NaN
Y2 NaN NaN   1

Do you mean something like this?

>>> df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=lambda x: len(x.unique()))


Z   Z1  Z2  Z3
Y
Y1   1   1 NaN
Y2 NaN NaN   1

Note that using len assumes you don't have NAs in your DataFrame. You can do x.value_counts().count() or len(x.dropna().unique()) otherwise.
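To illustrate the NA caveat, here is a minimal sketch on a small made-up Series (the values are hypothetical, chosen only to show the difference). It compares the plain len-based count with the NA-safe alternatives mentioned above, plus Series.nunique, which also skips NaN by default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two distinct labels plus one missing value
s = pd.Series(['X1', 'X1', np.nan, 'X2'])

with_nan = len(s.unique())                   # NaN is counted as a distinct value
without_nan = len(s.dropna().unique())       # NaN removed before counting
via_value_counts = s.value_counts().count()  # value_counts() drops NaN by default
via_nunique = s.nunique()                    # nunique() drops NaN by default

print(with_nan, without_nan, via_value_counts, via_nunique)  # prints: 3 2 2 2
```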

You can construct a pivot table for each distinct value of X. In this case,

for xval, xgroup in g:
    # newer pandas versions use index=/columns= instead of rows=/cols=
    ptable = pd.pivot_table(xgroup, index='Y', columns='Z',
                            margins=False, aggfunc=numpy.size)

will construct a pivot table for each value of X (this assumes numpy has been imported). You may want to store each ptable under its xval key. With this code, I get (for X1)

     X
Z   Z1  Z2  Z3
Y
Y1   2   1 NaN
Y2 NaN NaN   1
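One way to keep the per-group tables addressable by their X value is to collect them in a dict. This is only a sketch: the name tables is made up for illustration, and it uses aggfunc='count' with the index=/columns= keywords of current pandas rather than the original numpy.size call.

```python
import pandas as pd

df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# Hypothetical container: one pivot table per distinct value of X
tables = {}
for xval, xgroup in df2.groupby('X'):
    tables[xval] = xgroup.pivot_table(values='X', index='Y', columns='Z',
                                      aggfunc='count')

print(tables['X1'])
```

Note that this counts rows per (Y, Z) cell, so duplicates are included; it is not a unique count.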

This is a good way of counting entries within .pivot_table:

>>> df2.pivot_table(values='X', index=['Y','Z'], columns='X', aggfunc='count')


       X1   X2
Y  Z
Y1 Z1   1    1
   Z2   1  NaN
Y2 Z3   1  NaN

aggfunc=pd.Series.nunique provides a distinct count. The full code is as follows:

df2.pivot_table(values='X', rows='Y', cols='Z', aggfunc=pd.Series.nunique)

Credit to @hume for this solution (see comment under the accepted answer). Adding as an answer here for better discoverability.

Since at least version 0.16 of pandas, pivot_table no longer takes the "rows" parameter (nor "cols"); use "index" and "columns" instead.

As of 0.23, the solution would be:

df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)

which returns:

Z    Z1   Z2   Z3
Y
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0
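The result is float because the missing (Y, Z) combinations are NaN. If zeros and integer counts are preferred, one possible follow-up step (an addition here, not part of the original answer) is to fill and cast:

```python
import pandas as pd

df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

counts = df2.pivot_table(values='X', index='Y', columns='Z',
                         aggfunc=pd.Series.nunique)

# Treat absent (Y, Z) combinations as zero distinct values, then cast to int
int_counts = counts.fillna(0).astype(int)
print(int_counts)
```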

Since none of the answers is up to date with the latest version of pandas, here is another solution to this problem:

import pandas as pd


# Set example
df2 = (
pd.DataFrame({
'X' : ['X1', 'X1', 'X1', 'X1'],
'Y' : ['Y2', 'Y1', 'Y1', 'Y1'],
'Z' : ['Z3', 'Z1', 'Z1', 'Z2']
})
)


# Pivot
pd.crosstab(index=df2['Y'], columns=df2['Z'], values=df2['X'], aggfunc=pd.Series.nunique)

which returns:

Z   Z1  Z2  Z3
Y
Y1  1.0 1.0 NaN
Y2  NaN NaN 1.0

For best performance I recommend calling DataFrame.drop_duplicates first and then pivoting with aggfunc='count'.

Others are correct that aggfunc=pd.Series.nunique will work. This can be slow, however, if the number of index groups you have is large (>1000).

So instead of (to quote @Javier)

df2.pivot_table('X', 'Y', 'Z', aggfunc=pd.Series.nunique)

I suggest

df2.drop_duplicates(['X', 'Y', 'Z']).pivot_table('X', 'Y', 'Z', aggfunc='count')

This works because it guarantees that every subgroup (each combination of ('Y', 'Z')) will have unique (non-duplicate) values of 'X'.
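As a quick sanity check, the two approaches can be compared on a small frame that actually contains duplicates. The data below is made up for illustration; the sketch only checks that both pivots agree and says nothing about speed.

```python
import pandas as pd

# Hypothetical data with repeated (X, Y, Z) rows
df = pd.DataFrame({
    'X': ['X1', 'X1', 'X2', 'X1', 'X2', 'X2', 'X3'],
    'Y': ['Y1', 'Y1', 'Y1', 'Y2', 'Y2', 'Y2', 'Y2'],
    'Z': ['Z1', 'Z1', 'Z1', 'Z2', 'Z2', 'Z2', 'Z1'],
})

via_nunique = df.pivot_table(values='X', index='Y', columns='Z',
                             aggfunc=pd.Series.nunique)
via_dedup = (df.drop_duplicates(['X', 'Y', 'Z'])
               .pivot_table(values='X', index='Y', columns='Z',
                            aggfunc='count'))

# Cast both to float so only values and labels are compared, not dtypes
same = via_nunique.astype(float).equals(via_dedup.astype(float))
print(same)  # prints: True
```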

aggfunc=pd.Series.nunique counts only the unique values in a Series, in this case the unique values of a column within each group. That makes it a distinct count, not an alternative to aggfunc='count'.

For simple counting, it is better to use aggfunc=pd.Series.count

out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=['nunique', 'count', lambda x: len(x.unique()), len])


[out]:
nunique           count           <lambda>            len
Z       Z1   Z2   Z3    Z1   Z2   Z3       Z1   Z2   Z3   Z1   Z2   Z3
Y
Y1     1.0  1.0  NaN   2.0  1.0  NaN      1.0  1.0  NaN  2.0  1.0  NaN
Y2     NaN  NaN  1.0   NaN  NaN  1.0      NaN  NaN  1.0  NaN  NaN  1.0




out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc='nunique')


[out]:
Z    Z1   Z2   Z3
Y
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0


out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=['nunique'])


[out]:
nunique
Z       Z1   Z2   Z3
Y
Y1     1.0  1.0  NaN
Y2     NaN  NaN  1.0