How do I build and fill a pandas DataFrame from a for loop?

小开

Best answer

Try this using a generator expression (the same syntax as a list comprehension, minus the outer brackets):

import pandas as pd

df = pd.DataFrame(
    [p, p.team, p.passing_att, p.passer_rating()] for p in game.players.passing()
)
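For anyone without the `game` object at hand, the same pattern can be tried against stand-in data. This is only a minimal sketch: the `Passer` class and its values are hypothetical placeholders, not the real player objects from the question, and the `columns` argument is an addition to give the columns names.

```python
import pandas as pd


class Passer:
    """Hypothetical stand-in for a player object; not the real API."""

    def __init__(self, name, team, passing_att, rating):
        self.name = name
        self.team = team
        self.passing_att = passing_att
        self._rating = rating

    def passer_rating(self):
        return self._rating

    def __str__(self):
        return self.name


# Made-up sample values, for illustration only.
passers = [
    Passer("A. Example", "NE", 30, 95.5),
    Passer("B. Sample", "NY", 25, 80.1),
]

# One row per player, produced lazily by a generator expression.
df = pd.DataFrame(
    ([str(p), p.team, p.passing_att, p.passer_rating()] for p in passers),
    columns=["Player", "Team", "Passing Att", "Passer Rating"],
)
print(df)
```

Storing `str(p)` rather than the object itself keeps the column as plain strings, which is usually easier to work with later.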

小开

The simplest answer is what Paul H said:

d = []
for p in game.players.passing():
    d.append(
        {
            'Player': p,
            'Team': p.team,
            'Passer Rating': p.passer_rating()
        }
    )

pd.DataFrame(d)

But if you really want to "build and fill a dataframe from a loop" (which, by the way, I wouldn't recommend, because every pd.concat call copies all the rows accumulated so far), here's how you'd do it:

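d = pd.DataFrame()

for p in game.players.passing():
    # Wrap the dict in a list so pandas builds a one-row frame; a dict of
    # bare scalars raises "If using all scalar values, you must pass an index".
    temp = pd.DataFrame(
        [
            {
                'Player': p,
                'Team': p.team,
                'Passer Rating': p.passer_rating()
            }
        ]
    )
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating 0.
    d = pd.concat([d, temp], ignore_index=True)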
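If per-iteration frames really are needed, one common variation is to collect them in a list and call `pd.concat` once after the loop, which avoids re-copying the growing frame on every pass. The sketch below assumes hypothetical sample rows in place of the player data from the question.

```python
import pandas as pd

# Hypothetical sample rows standing in for the player data.
rows = [("A. Example", "NE", 95.5), ("B. Sample", "NY", 80.1)]

frames = []
for player, team, rating in rows:
    # Wrapping each scalar in a list gives pandas a one-row frame.
    frames.append(
        pd.DataFrame(
            {"Player": [player], "Team": [team], "Passer Rating": [rating]}
        )
    )

# A single concat at the end; ignore_index=True renumbers the rows.
result = pd.concat(frames, ignore_index=True)
print(result)
```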

小开

Make a list of tuples with your data and then create a DataFrame with it:

d = []
for p in game.players.passing():
    d.append((p, p.team, p.passer_rating()))

pd.DataFrame(d, columns=('Player', 'Team', 'Passer Rating'))

A list of tuples should have less overhead than a list of dictionaries. I tested this below, but in most cases please prioritize code that is easy to understand over raw performance.

Testing functions:

def with_tuples(loop_size=1e5):
    res = []

    for x in range(int(loop_size)):
        res.append((x-1, x, x+1))

    return pd.DataFrame(res, columns=("a", "b", "c"))


def with_dict(loop_size=1e5):
    res = []

    for x in range(int(loop_size)):
        res.append({"a":x-1, "b":x, "c":x+1})

    return pd.DataFrame(res)

Results:

%timeit -n 10 with_tuples()
# 10 loops, best of 3: 55.2 ms per loop


%timeit -n 10 with_dict()
# 10 loops, best of 3: 130 ms per loop
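`%timeit` is an IPython magic, so the comparison above only runs in IPython or Jupyter. A rough equivalent for a plain Python script can be sketched with the standard `timeit` module. The run count of 3 is an arbitrary choice, and the timings will differ by machine and pandas version, so no expected numbers are given.

```python
import timeit

import pandas as pd


def with_tuples(loop_size=1e5):
    res = [(x - 1, x, x + 1) for x in range(int(loop_size))]
    return pd.DataFrame(res, columns=("a", "b", "c"))


def with_dict(loop_size=1e5):
    res = [{"a": x - 1, "b": x, "c": x + 1} for x in range(int(loop_size))]
    return pd.DataFrame(res)


# Sanity check: both approaches should build identical frames.
same = with_tuples(1000).equals(with_dict(1000))

# Average seconds per call over 3 runs each.
runs = 3
t_tuples = timeit.timeit(with_tuples, number=runs) / runs
t_dict = timeit.timeit(with_dict, number=runs) / runs
print(f"tuples: {t_tuples * 1e3:.1f} ms, dicts: {t_dict * 1e3:.1f} ms")
```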

小开

I may be wrong, but I think the accepted answer by @amit has a bug.

from pandas import DataFrame as df
x = [1,2,3]
y = [7,8,9,10]


# this gives me a syntax error at 'for' (Python 3.7),
# so it is commented out to let the rest of the snippet run
# d1 = df[[a, "A", b, "B"] for a in x for b in y]


# this works
d2 = df([a, "A", b, "B"] for a in x for b in y)


# and if you want to add the column names on the fly
# note the additional parentheses
d3 = df(([a, "A", b, "B"] for a in x for b in y), columns = ("l","m","n","o"))
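The extra parentheses matter because Python only allows a bare generator expression when it is the sole argument of a call; once `columns` is passed as well, the generator has to be wrapped in its own parentheses. A small self-contained sketch of the last case, using the same `x` and `y` lists:

```python
import pandas as pd

x = [1, 2, 3]
y = [7, 8, 9, 10]

# The parenthesized generator is the first argument; columns is the second.
d3 = pd.DataFrame(
    ([a, "A", b, "B"] for a in x for b in y),
    columns=("l", "m", "n", "o"),
)

# The nested loops yield one row per (a, b) pair: 3 * 4 = 12 rows.
print(d3.shape)
```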