Calculate difference between values in consecutive rows by group

This is a my df (data.frame):

group value
1     10
1     20
1     25
2     5
2     10
2     15

I need to calculate difference between values in consecutive rows by group.

So, I need a that result.

group value diff
1     10    NA # because there is a no previous value
1     20    10 # value[2] - value[1]
1     25    5  # value[3] value[2]
2     5     NA # because group is changed
2     10    5  # value[5] - value[4]
2     15    5  # value[6] - value[5]

Although, I can handle this problem by using ddply, but it takes too much time. This is because I have a lot of groups in my df. (over 1,000,000 groups in my df)

Are there any other effective approaches to handle this problem?

80153 次浏览

The package data.table can do this fairly quickly, using the shift function.

require(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10,20,25,5,10,15))
#setDT(df) #if df is already a data frame


df[ , diff := value - shift(value), by = group]
#   group value diff
#1:     1    10   NA
#2:     1    20   10
#3:     1    25    5
#4:     2     5   NA
#5:     2    10    5
#6:     2    15    5
setDF(df) #if you want to convert back to old data.frame syntax

Or using the lag function in dplyr

df %>%
group_by(group) %>%
mutate(Diff = value - lag(value))
#   group value  Diff
#   <int> <int> <int>
# 1     1    10    NA
# 2     1    20    10
# 3     1    25     5
# 4     2     5    NA
# 5     2    10     5
# 6     2    15     5

For alternatives pre-data.table::shift and pre-dplyr::lag, see edits.

try this with tapply

df$diff<-as.vector(unlist(tapply(df$value,df$group,FUN=function(x){ return (c(NA,diff(x)))})))

You can use the base function ave() for this

df <- data.frame(group=rep(c(1,2),each=3),value=c(10,20,25,5,10,15))
df$diff <- ave(df$value, factor(df$group), FUN=function(x) c(NA,diff(x)))

which returns

  group value diff
1     1    10   NA
2     1    20   10
3     1    25    5
4     2     5   NA
5     2    10    5
6     2    15    5