Updating values in a pandas DataFrame while iterating over it

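I'm doing some geocoding work: I use selenium to screen-scrape the x-y coordinates I need for a location's address. I imported an xls file into a pandas DataFrame and want to use an explicit loop to update the rows that have no x-y coordinates, like this: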

for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        row = row.copy()
        target = row.address_chi
        dict_temp = geocoding(target)
        # these assignments change only the local `row`; rche_df is not updated
        row.wgs1984_latitude = dict_temp['lat']
        row.wgs1984_longitude = dict_temp['long']

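I have read "Why doesn't this function 'take' after I iterrows over a pandas DataFrame?" and am fully aware that iterrows only gives us a view rather than a copy for editing, but what if I really want to update the values row by row? Is a lambda feasible?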


The rows you get back from iterrows are copies that are no longer connected to the original DataFrame, so editing them doesn't change your DataFrame. Thankfully, because each item you get back from iterrows contains the current index label, you can use it to access and edit the relevant row of the DataFrame:

for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        target = row.address_chi
        dict_temp = geocoding(target)
        # write through the DataFrame itself, using the row's index label
        rche_df.loc[index, 'wgs1984_latitude'] = dict_temp['lat']
        rche_df.loc[index, 'wgs1984_longitude'] = dict_temp['long']
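To see the difference in isolation, here is a minimal, self-contained sketch. The `geocoding` function is a hypothetical stub standing in for the asker's selenium scraper, the data is made up, and `pd.isna` is used as the "missing coordinate" test in place of `isinstance(..., float)`. Assigning to `row` leaves the DataFrame untouched, while assigning through `df.loc[index, ...]` updates it.

```python
import pandas as pd


# Hypothetical stand-in for the asker's selenium-based geocoding() helper.
def geocoding(address):
    return {"lat": 22.28, "long": 114.16}


df = pd.DataFrame({
    "address_chi": ["address A", "address B"],
    "wgs1984_latitude": [float("nan"), 22.30],
    "wgs1984_longitude": [float("nan"), 114.17],
})

# 1) Assigning to the yielded row: with a string column present, each row is a
#    newly built object-dtype Series, so df itself is not changed.
for index, row in df.iterrows():
    row["wgs1984_latitude"] = 0.0
missing_after_row_edit = int(df["wgs1984_latitude"].isna().sum())
print(missing_after_row_edit)  # 1

# 2) Assigning through df.loc with the row's index label updates df.
for index, row in df.iterrows():
    if pd.isna(row["wgs1984_latitude"]):
        coords = geocoding(row["address_chi"])
        df.loc[index, "wgs1984_latitude"] = coords["lat"]
        df.loc[index, "wgs1984_longitude"] = coords["long"]

print(df["wgs1984_latitude"].tolist())  # [22.28, 22.3]
```

The pandas documentation for iterrows itself warns that writing to the yielded row is not guaranteed to have any effect, so going through the DataFrame is the reliable route.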

In my experience, this approach tends to be slower than using apply or map, but as always, it's up to you to decide on the tradeoff between performance and ease of coding.
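For the "is a lambda feasible?" part of the question, one possible sketch: build a boolean mask of rows whose latitude is missing, `map` the geocoder over only those addresses, and assign the extracted fields back with `.loc`. As before, `geocoding` is a hypothetical stub with made-up coordinates, and `isna()` is assumed to be the right test for "no coordinates".

```python
import pandas as pd


# Hypothetical stub: looks up fixed, made-up coordinates instead of scraping.
def geocoding(address):
    fake = {"address A": (22.28, 114.16), "address B": (22.32, 114.17)}
    lat, long = fake[address]
    return {"lat": lat, "long": long}


df = pd.DataFrame({
    "address_chi": ["address A", "address B", "address C"],
    "wgs1984_latitude": [float("nan"), float("nan"), 22.40],
    "wgs1984_longitude": [float("nan"), float("nan"), 114.20],
})

# Boolean mask of rows that still lack a latitude.
missing = df["wgs1984_latitude"].isna()

# Call the geocoder once per missing address; the result is a Series of dicts
# that keeps the original index labels.
results = df.loc[missing, "address_chi"].map(geocoding)

# Extract each field with a lambda and assign it back, aligned on the index.
df.loc[missing, "wgs1984_latitude"] = results.map(lambda d: d["lat"])
df.loc[missing, "wgs1984_longitude"] = results.map(lambda d: d["long"])

print(df)
```

Each real `geocoding` call is a selenium round trip, so that cost will likely dwarf the loop overhead either way; the main gain here is that only the missing rows are touched and each column is written in a single assignment.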

Another way based on this question:

for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        target = row.address_chi
        dict_temp = geocoding(target)
        # .at sets a single cell by row label and column label
        rche_df.at[index, 'wgs1984_latitude'] = dict_temp['lat']
        rche_df.at[index, 'wgs1984_longitude'] = dict_temp['long']

This link describes the difference between .loc and .at. In short, .at only accesses a single scalar value by row and column label, which makes it faster than .loc for that case.
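A small sketch of the practical difference, using made-up values: `.at` takes exactly one row label and one column label, while `.loc` accepts that same scalar form plus lists, slices and boolean masks. The timing lines are only a rough illustration; the ratio depends on your machine and pandas version, so no numbers are shown.

```python
import timeit

import pandas as pd

nan = float("nan")
df = pd.DataFrame(
    {
        "wgs1984_latitude": [nan, 22.30, nan],
        "wgs1984_longitude": [nan, 114.17, nan],
    },
    index=[10, 20, 30],
)

# .at: one row label + one column label -> exactly one scalar cell.
df.at[10, "wgs1984_latitude"] = 22.28

# .loc: the same scalar form works too...
df.loc[10, "wgs1984_longitude"] = 114.16
# ...but .loc can also set several columns of a row at once; .at cannot.
df.loc[30, ["wgs1984_latitude", "wgs1984_longitude"]] = [22.40, 114.20]

# Rough timing of single-cell reads (results vary by machine and version).
t_at = timeit.timeit(lambda: df.at[10, "wgs1984_latitude"], number=10_000)
t_loc = timeit.timeit(lambda: df.loc[10, "wgs1984_latitude"], number=10_000)
print(f".at: {t_at:.4f}s   .loc: {t_loc:.4f}s")
```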