I have a data.table:
require(data.table)
set.seed(1)
data <- data.table(time = c(1:3, 1:4),
groups = c(rep(c("b", "a"), c(3, 4))),
value = rnorm(7))
data
# groups time value
# 1: b 1 -0.6264538
# 2: b 2 0.1836433
# 3: b 3 -0.8356286
# 4: a 1 1.5952808
# 5: a 2 0.3295078
# 6: a 3 -0.8204684
# 7: a 4 0.4874291
I want to compute a lagged version of the "value" column, within each level of "groups".
The result should look like
# groups time value lag.value
# 1 a 1 1.5952808 NA
# 2 a 2 0.3295078 1.5952808
# 3 a 3 -0.8204684 0.3295078
# 4 a 4 0.4874291 -0.8204684
# 5 b 1 -0.6264538 NA
# 6 b 2 0.1836433 -0.6264538
# 7 b 3 -0.8356286 0.1836433
I have tried to use lag
directly:
data$lag.value <- lag(data$value)
...which clearly wouldn't work.
I have also tried:
unlist(tapply(data$value, data$groups, lag))
a1 a2 a3 a4 b1 b2 b3
NA -0.1162932 0.4420753 2.1505440 NA 0.5894583 -0.2890288
Which is almost what I want. However the vector generated is ordered differently from the ordering in the data.table which is problematic.
What is the most efficient way to do this in base R, plyr, dplyr, and data.table?