Creating a pandas DataFrame from a dictionary

I have a dictionary of this form:

{'user': {'movie': rating}}

For example,

{'Jill': {'Avenger: Age of Ultron': 7.0,
          'Django Unchained': 6.5,
          'Gone Girl': 9.0,
          'Kill the Messenger': 8.0},
 'Toby': {'Avenger: Age of Ultron': 8.5,
          'Django Unchained': 9.0,
          'Zoolander': 2.0}}

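I want to convert this dictionary into a pandas DataFrame in which the first column is the user name and the remaining columns are the movie ratings.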

user  Gone_Girl  Horrible_Bosses_2  Django_Unchained  Zoolander etc. \

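However, some users have not rated every movie, so those movies are missing from the values() stored under that user's key(). In those cases it would be best to fill the entry with NaN.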

Right now I iterate over the keys, fill a list, and then use that list to create a DataFrame:

import pandas as pd

data = []
for key in movie_user_preferences:
    try:
        data.append((key,
                     movie_user_preferences[key]['Gone Girl'],
                     movie_user_preferences[key]['Horrible Bosses 2'],
                     movie_user_preferences[key]['Django Unchained'],
                     movie_user_preferences[key]['Zoolander'],
                     movie_user_preferences[key]['Avenger: Age of Ultron'],
                     movie_user_preferences[key]['Kill the Messenger']))
    except KeyError:
        # if any of these movies has no entry, the whole user is skipped
        pass
df = pd.DataFrame(data=data, columns=['user', 'Gone_Girl', 'Horrible_Bosses_2', 'Django_Unchained', 'Zoolander', 'Avenger_Age_of_Ultron', 'Kill_the_Messenger'])

But this only gives me a DataFrame of the users who have rated every movie in the set.

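My goal is, first, to build the data list by iterating over the movie labels (rather than the brute-force approach shown above) and, second, to create a DataFrame that includes all users and puts null values in the cells where there is no rating.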


You can pass the dict of dicts to the DataFrame constructor:

In [11]: d = {'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0, 'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0}, 'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0, 'Avenger: Age of Ultron': 8.5}}


In [12]: pd.DataFrame(d)
Out[12]:
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0
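As a minimal sketch of what the constructor does here (the names by_movie, by_user and missing are only illustrative): the inner dict keys are combined into one row index, any movie a user did not rate becomes NaN, and transposing the result puts the users on the rows.

```python
import pandas as pd

d = {
    'Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5,
             'Gone Girl': 9.0, 'Kill the Messenger': 8.0},
    'Toby': {'Avenger: Age of Ultron': 8.5, 'Django Unchained': 9.0,
             'Zoolander': 2.0},
}

by_movie = pd.DataFrame(d)   # movies as rows, users as columns
by_user = by_movie.T         # transpose: users as rows, movies as columns

# Movies a user never rated are filled with NaN by the constructor
missing = by_user.isna()
print(by_user)
print(missing.sum(axis=1))   # number of unrated movies per user
```

The row and column order you see may differ between pandas versions, so it is safer to select by label than by position.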

Or use the from_dict method:

In [13]: pd.DataFrame.from_dict(d)
Out[13]:
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0


In [14]: pd.DataFrame.from_dict(d, orient='index')
Out[14]:
      Django Unchained  Gone Girl  Kill the Messenger  Avenger: Age of Ultron  Zoolander
Jill               6.5          9                   8                     7.0        NaN
Toby               9.0        NaN                 NaN                     8.5          2
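To get the layout asked for in the question (a leading user column and underscore-style column names), something along these lines should work. This is a sketch rather than part of the original answer; the column clean-up rule (drop colons, replace spaces with underscores) is an assumption based on the column names used in the question.

```python
import pandas as pd

d = {
    'Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5,
             'Gone Girl': 9.0, 'Kill the Messenger': 8.0},
    'Toby': {'Avenger: Age of Ultron': 8.5, 'Django Unchained': 9.0,
             'Zoolander': 2.0},
}

df = (
    pd.DataFrame.from_dict(d, orient='index')  # one row per user
    .rename_axis('user')                       # name the index ...
    .reset_index()                             # ... so it becomes a 'user' column
)

# Assumed naming rule: drop colons and replace spaces with underscores
df.columns = [c.replace(':', '').replace(' ', '_') for c in df.columns]
print(df)
```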

This brute-force approach also seems to work, but in my opinion iterating over the movie labels would still be more robust.

import pandas as pd

data = []
for key in movie_user_preferences:
    ratings = movie_user_preferences[key]
    # fall back to the string 'NaN' when a movie has no entry;
    # note that this is a string, so the affected columns end up with object dtype
    data.append((key,
                 ratings.get('Gone Girl', 'NaN'),
                 ratings.get('Horrible Bosses 2', 'NaN'),
                 ratings.get('Django Unchained', 'NaN'),
                 ratings.get('Zoolander', 'NaN'),
                 ratings.get('Avenger: Age of Ultron', 'NaN'),
                 ratings.get('Kill the Messenger', 'NaN')))
df = pd.DataFrame(data=data, columns=['user', 'Gone_Girl', 'Horrible_Bosses_2', 'Django_Unchained', 'Zoolander', 'Avenger_Age_of_Ultron', 'Kill_the_Messenger'])

      user Gone_Girl Horrible_Bosses_2  Django_Unchained Zoolander  \
0      Sam         6                 3               7.5         7
1      Max        10                 6               7.0        10
2   Robert       NaN                 5               7.0         9
3     Toby       NaN               NaN               9.0         2
4    Julia       6.5               NaN               6.0       6.5
5  William         7                 4               8.0         4
6     Jill         9               NaN               6.5       NaN

   Avenger_Age_of_Ultron Kill_the_Messenger
0                   10.0                5.5
1                    7.0                  5
2                    8.0                  9
3                    8.5                NaN
4                   10.0                  6
5                    6.0                6.5
6                    7.0                  8
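The idea of iterating over the movie labels instead of spelling out each title could be sketched roughly as follows. It assumes movie_user_preferences has the dict-of-dicts shape from the question (only the two users shown there are included), and it uses a real float NaN rather than the string 'NaN' so the rating columns stay numeric. The names movies and rows are illustrative.

```python
import pandas as pd

movie_user_preferences = {
    'Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5,
             'Gone Girl': 9.0, 'Kill the Messenger': 8.0},
    'Toby': {'Avenger: Age of Ultron': 8.5, 'Django Unchained': 9.0,
             'Zoolander': 2.0},
}

# Collect every movie label rated by at least one user (sorted for a stable column order)
movies = sorted({title
                 for ratings in movie_user_preferences.values()
                 for title in ratings})

rows = []
for user, ratings in movie_user_preferences.items():
    # dict.get returns the fallback (a float NaN) when the user has no rating for a title
    rows.append([user] + [ratings.get(title, float('nan')) for title in movies])

df = pd.DataFrame(rows, columns=['user'] + movies)
print(df)
```

Because the labels are gathered from the data itself, a newly added movie needs no code change. In practice the DataFrame constructor or from_dict shown in the other answer does the same alignment in a single call.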