熊猫解栈问题: ValueError: Index 包含重复的条目,无法重塑形状

我正试图解开一个有熊猫的多重索引,我不断得到:

ValueError: Index contains duplicate entries, cannot reshape

给定一个有四列的数据集:

  • Id (字符串)
  • 日期(字符串)
  • 位置(字符串)
  • 值(浮点数)

我首先设置了一个三级多索引:

In [37]: e.set_index(['id', 'date', 'location'], inplace=True)


In [38]: e
Out[38]:
value
id           date       location
id1          2014-12-12 loc1        16.86
2014-12-11 loc1        17.18
2014-12-10 loc1        17.03
2014-12-09 loc1        17.28

然后我试着解开这个位置:

In [39]: e.unstack('location')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-39-bc1e237a0ed7> in <module>()
----> 1 e.unstack('location')
...
C:\Anaconda\envs\sandbox\lib\site-packages\pandas\core\reshape.pyc in _make_selectors(self)
143
144         if mask.sum() < len(self.index):
--> 145             raise ValueError('Index contains duplicate entries, '
146                              'cannot reshape')
147


ValueError: Index contains duplicate entries, cannot reshape

这是怎么回事?

194611 次浏览

Here's an example DataFrame which show this, it has duplicate values with the same index. The question is, do you want to aggregate these or keep them as multiple rows?

In [11]: df
Out[11]:
0  1  2      3
0  1  2  a  16.86
1  1  2  a  17.18
2  1  4  a  17.03
3  2  5  b  17.28


In [12]: df.pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')  # desired?
Out[12]:
2        a      b
0 1
1 2  17.02    NaN
4  17.03    NaN
2 5    NaN  17.28


In [13]: df1 = df.set_index([0, 1, 2])


In [14]: df1
Out[14]:
3
0 1 2
1 2 a  16.86
a  17.18
4 a  17.03
2 5 b  17.28


In [15]: df1.unstack(2)
ValueError: Index contains duplicate entries, cannot reshape

One solution is to reset_index (and get back to df) and use pivot_table.

In [16]: df1.reset_index().pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')
Out[16]:
2        a      b
0 1
1 2  17.02    NaN
4  17.03    NaN
2 5    NaN  17.28

Another option (if you don't want to aggregate) is to append a dummy level, unstack it, then drop the dummy level...

There's a far more simpler solution to tackle this.

The reason why you get ValueError: Index contains duplicate entries, cannot reshape is because, once you unstack "Location", then the remaining index columns "id" and "date" combinations are no longer unique.

You can avoid this by retaining the default index column (row #) and while setting the index using "id", "date" and "location", add it in "append" mode instead of the default overwrite mode.

So use,

e.set_index(['id', 'date', 'location'], append=True)

Once this is done, your index columns will still have the default index along with the set indexes. And unstack will work.

Let me know how it works out.

I had such problem. In my case problem was in data - my column 'information' contained 1 unique value and it caused error

UPDATE: to correct work 'pivot' pairs (id_user,information) cannot have duplicates

It works:

df2 = pd.DataFrame({'id_user':[1,2,3,4,4,5,5],
'information':['phon','phon','phone','phone1','phone','phone1','phone'],
'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')

it doesn't work:

df2 = pd.DataFrame({'id_user':[1,2,3,4,4,5,5],
'information':['phone','phone','phone','phone','phone','phone','phone'],
'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')

source: https://stackoverflow.com/a/37021196/6088984