Calculating the difference between two rows in Python/Pandas

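In Python, how can I reference the previous row and calculate something against it? Specifically, I'm working with dataframes in pandas: I have a data frame full of stock price information that looks like this: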

           Date   Close  Adj Close
251  2011-01-03  147.48     143.25
250  2011-01-04  147.64     143.41
249  2011-01-05  147.05     142.83
248  2011-01-06  148.66     144.40
247  2011-01-07  147.93     143.69

Here is how I create this data frame:

import pandas


url = 'http://ichart.finance.yahoo.com/table.csv?s=IBM&a=00&b=1&c=2011&d=11&e=31&f=2011&g=d&ignore=.csv'
data = pandas.read_csv(url)


# Sort the data frame by date, ascending
# (DataFrame.sort() was removed in newer pandas versions; sort_values() replaces it)
data = data.sort_values(by='Date')

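Starting from row 2 (or in this case, I guess, 250; PS: is this the index?), I want to calculate the difference between 2011-01-03 and 2011-01-04, and so on for every entry in this data frame. I think the right approach is to write a function that takes the current row, works out the previous row, calculates the difference between them, and then uses the pandas apply function to update the data frame with the values.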

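Is this the right approach? If so, should I use the index to determine the difference? (Note: I'm still in Python beginner mode, so "index" may not be the right term, or even the right way to implement this.)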
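On the question of what "index" means here, a minimal sketch, assuming the five rows shown above are typed in by hand instead of downloaded. The numbers 251 to 247 are the data frame's index labels: `.loc` looks a row up by label, `.iloc` by position, and sorting moves each label along with its row, so label and position can disagree.

```python
import pandas as pd

# The five rows from the question, typed in by hand; 251..247 are the index labels
data = pd.DataFrame(
    {
        "Date": ["2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06", "2011-01-07"],
        "Close": [147.48, 147.64, 147.05, 148.66, 147.93],
        "Adj Close": [143.25, 143.41, 142.83, 144.40, 143.69],
    },
    index=[251, 250, 249, 248, 247],
)

print(data.index.tolist())    # the index labels: [251, 250, 249, 248, 247]
print(data.loc[250, "Date"])  # lookup by label    -> 2011-01-04
print(data.iloc[1]["Date"])   # lookup by position -> 2011-01-04 (second row)

# Sorting newest-first reorders rows; each label stays attached to its row
desc = data.sort_values("Date", ascending=False)
print(desc.index.tolist())    # [247, 248, 249, 250, 251]
print(desc.iloc[1]["Date"])   # second row is now 2011-01-06
print(desc.loc[250, "Date"])  # label 250 still refers to 2011-01-04
```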


I don't know pandas, and I'm pretty sure it has something specific for this; however, I'll give you a pure-Python solution, which might be of some help even if you need to use pandas:

import csv
import io
import urllib.request


# This retrieves the CSV file and loads it into a list, converting
# all numeric values to floats
url = 'http://ichart.finance.yahoo.com/table.csv?s=IBM&a=00&b=1&c=2011&d=11&e=31&f=2011&g=d&ignore=.csv'
with urllib.request.urlopen(url) as response:
    reader = csv.reader(io.TextIOWrapper(response, encoding='utf-8'), delimiter=',')
    rows = list(reader)[1:]  # skip the header row
# We sort the output list so the records are ordered by date
# (ISO dates such as 2011-01-03 sort correctly as strings)
cleaned = sorted([r[0]] + [float(x) for x in r[1:]] for r in rows)


# Pair each row with the one before it. The first row has no predecessor,
# so it is skipped (note that cleaned[i-1] at i == 0 would silently wrap
# around to the last row instead of raising IndexError)
for prev, row in zip(cleaned, cleaned[1:]):
    # Difference of each numeric field with the same field in the previous row
    print(row[0], [row[j] - prev[j] for j in range(1, len(row))])

I think you want to do something like this:

In [26]: data
Out[26]:
           Date   Close  Adj Close
251  2011-01-03  147.48     143.25
250  2011-01-04  147.64     143.41
249  2011-01-05  147.05     142.83
248  2011-01-06  148.66     144.40
247  2011-01-07  147.93     143.69


In [27]: data.set_index('Date').diff()
Out[27]:
            Close  Adj Close
Date
2011-01-03    NaN        NaN
2011-01-04   0.16       0.16
2011-01-05  -0.59      -0.58
2011-01-06   1.61       1.57
2011-01-07  -0.73      -0.71
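diff() subtracts the previous row in the frame's current order, so the data should be sorted by date first. A sketch, again assuming the question's five rows typed in by hand, checking that diff() gives the same result as subtracting a copy of the frame shifted down one row with shift(1):

```python
import pandas as pd

# The question's data, typed in by hand
data = pd.DataFrame(
    {
        "Date": ["2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06", "2011-01-07"],
        "Close": [147.48, 147.64, 147.05, 148.66, 147.93],
        "Adj Close": [143.25, 143.41, 142.83, 144.40, 143.69],
    },
    index=[251, 250, 249, 248, 247],
)

# Sort so that "previous row" means "previous trading day"
by_date = data.sort_values("Date").set_index("Date")

diffs = by_date.diff()               # current row minus previous row
manual = by_date - by_date.shift(1)  # the same computation spelled out with shift()

print(diffs.round(2))  # rounding is only for display
```

The shift() form is handy when you need the previous row for something other than plain subtraction, such as a ratio.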

To calculate the difference in a single column, here is what you can do.

df =
    A   B
0  10  56
1  45  48
2  26  48
3  32  65

We want to compute the row difference in column A only, and keep only the rows where that difference is less than 15.

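df['A_dif'] = df['A'].diff()

df =
    A   B  A_dif
0  10  56    NaN
1  45  48   35.0
2  26  48  -19.0
3  32  65    6.0

df = df[df['A_dif'] < 15]

df =
    A   B  A_dif
2  26  48  -19.0
3  32  65    6.0

Note that diff() keeps the sign (26 - 45 = -19), and NaN < 15 evaluates to False, so the first row is dropped by the filter.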
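If the intent is instead to filter on the size of the change regardless of direction, and to keep the first row (which has no previous row to compare against), a sketch under that assumption uses abs() and an explicit isna() check:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 45, 26, 32], "B": [56, 48, 48, 65]})

# Size of the change from the previous row, ignoring direction
df["A_dif"] = df["A"].diff().abs()

# NaN < 15 is False, so keep the first row explicitly
mask = (df["A_dif"] < 15) | df["A_dif"].isna()
print(df[mask])  # keeps rows 0 and 3
```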