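How to build and fill a pandas DataFrame from a for loop?

Here is a simple example of the code I am running. I would like the results put into a pandas DataFrame (unless there is a better option):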

for p in game.players.passing():
    print(p, p.team, p.passing_att, p.passer_rating())


R.Wilson SEA 29 55.7
J.Ryan SEA 1 158.3
A.Rodgers GB 34 55.8

Using the following code:

d = []
for p in game.players.passing():
    d = [{'Player': p, 'Team': p.team, 'Passer Rating':
        p.passer_rating()}]

pd.DataFrame(d)

I get:

    Passer Rating   Player      Team
0 55.8            A.Rodgers   GB

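This is a 1x3 DataFrame, and I understand why it has only one row, but I don't know how to make it multi-row with the columns in the correct order. Ideally the solution would handle n rows (based on p), and it would be great (though not required) if the number of columns were set by the number of stats requested. Any suggestions? Thanks in advance!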

Try this, using a generator expression (a list comprehension works the same way):

import pandas as pd


df = pd.DataFrame(
    [p, p.team, p.passing_att, p.passer_rating()] for p in game.players.passing()
)

The simplest answer is what Paul H said:

d = []
for p in game.players.passing():
    d.append(
        {
            'Player': p,
            'Team': p.team,
            'Passer Rating': p.passer_rating()
        }
    )

pd.DataFrame(d)
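
If you don't have the `game` object at hand, here is a minimal self-contained sketch of the same pattern. The `Passer` class is a hypothetical stand-in I made up for whatever `game.players.passing()` yields, and the sample values are copied from the output in the question. Passing `columns=` also pins down the column order, which was part of what the question asked.

```python
import pandas as pd


class Passer:
    """Hypothetical stand-in for the objects yielded by game.players.passing()."""

    def __init__(self, name, team, passing_att, rating):
        self.name = name
        self.team = team
        self.passing_att = passing_att
        self._rating = rating

    def passer_rating(self):
        return self._rating

    def __str__(self):
        return self.name


# Sample values taken from the output shown in the question.
passers = [
    Passer("R.Wilson", "SEA", 29, 55.7),
    Passer("J.Ryan", "SEA", 1, 158.3),
    Passer("A.Rodgers", "GB", 34, 55.8),
]

rows = []
for p in passers:
    # One dict per row; str(p) stores the name rather than the object itself.
    rows.append({"Player": str(p), "Team": p.team, "Passer Rating": p.passer_rating()})

# columns= sets the column order explicitly.
df = pd.DataFrame(rows, columns=["Player", "Team", "Passer Rating"])
print(df)
```

To get more or fewer stat columns, add or drop keys in the dict and in the `columns` list; the number of rows simply follows the loop.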

But if you really want to "build and fill a dataframe from a loop" (which, btw, I wouldn't recommend), here's how you'd do it:

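d = pd.DataFrame()

for p in game.players.passing():
    # Wrap the dict in a list: a dict of scalar values raises a
    # ValueError unless an index is passed.
    temp = pd.DataFrame(
        [{
            'Player': p,
            'Team': p.team,
            'Passer Rating': p.passer_rating()
        }]
    )

    # Concatenate inside the loop so every row is kept.
    d = pd.concat([d, temp], ignore_index=True)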
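
One reason not to grow a DataFrame inside the loop is that each `pd.concat` call copies everything accumulated so far. If you do end up with one small frame per iteration, a gentler variant is to collect them in a list and concatenate once at the end. This is only a sketch: the `sample` tuples below are hypothetical stand-ins for `game.players.passing()`, with values taken from the question.

```python
import pandas as pd

# Hypothetical sample rows standing in for game.players.passing().
sample = [
    ("R.Wilson", "SEA", 55.7),
    ("J.Ryan", "SEA", 158.3),
    ("A.Rodgers", "GB", 55.8),
]

frames = []
for player, team, rating in sample:
    # Each iteration builds a one-row frame; list values avoid the
    # "all scalar values" ValueError.
    frames.append(
        pd.DataFrame({"Player": [player], "Team": [team], "Passer Rating": [rating]})
    )

# A single concat at the end copies the data once instead of once per iteration.
df = pd.concat(frames, ignore_index=True)
print(df)
```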

Make a list of tuples with your data and then create a DataFrame with it:

d = []
for p in game.players.passing():
    d.append((p, p.team, p.passer_rating()))

pd.DataFrame(d, columns=('Player', 'Team', 'Passer Rating'))

A list of tuples should have less overhead than a list of dictionaries. I tested this below, but please remember to prioritize ease of code understanding over performance in most cases.

Testing functions:

def with_tuples(loop_size=1e5):
    res = []

    for x in range(int(loop_size)):
        res.append((x-1, x, x+1))

    return pd.DataFrame(res, columns=("a", "b", "c"))


def with_dict(loop_size=1e5):
    res = []

    for x in range(int(loop_size)):
        res.append({"a":x-1, "b":x, "c":x+1})

    return pd.DataFrame(res)

Results:

%timeit -n 10 with_tuples()
# 10 loops, best of 3: 55.2 ms per loop


%timeit -n 10 with_dict()
# 10 loops, best of 3: 130 ms per loop
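
`%timeit` is an IPython magic, so it won't run in a plain Python script. A rough equivalent with the standard-library `timeit` module might look like the sketch below. The call count is reduced to 5 per repeat (my choice, to keep the run short), and no timings are shown because they depend on the machine and the pandas version.

```python
import timeit

import pandas as pd


def with_tuples(loop_size=1e5):
    res = []
    for x in range(int(loop_size)):
        res.append((x - 1, x, x + 1))
    return pd.DataFrame(res, columns=("a", "b", "c"))


def with_dict(loop_size=1e5):
    res = []
    for x in range(int(loop_size)):
        res.append({"a": x - 1, "b": x, "c": x + 1})
    return pd.DataFrame(res)


# Sanity check: both builders should produce identical frames.
assert with_tuples(100).equals(with_dict(100))

for func in (with_tuples, with_dict):
    # Best of 3 repeats, 5 calls each, reported per call.
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f"{func.__name__}: {best * 1e3:.1f} ms per loop")
```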

I may be wrong, but I think the accepted answer by @amit has a bug.

from pandas import DataFrame as df
x = [1,2,3]
y = [7,8,9,10]


# this gives me a syntax error at 'for' (Python 3.7),
# so it is commented out to let the rest of the snippet run
# d1 = df[[a, "A", b, "B"] for a in x for b in y]


# this works
d2 = df([a, "A", b, "B"] for a in x for b in y)


# and if you want to add the column names on the fly
# note the additional parentheses
d3 = df(([a, "A", b, "B"] for a in x for b in y), columns = ("l","m","n","o"))