How to group data.table by multiple columns?

I'm using the data.table package to speed up some summary statistic collection on a data set.

I'm curious if there's a way to group by more than one column. My data looks like this:

  purchaseAmt        adShown        url
15.54            00001         150000001
4.82            00002         150000001
157.99            05005         776300044
...               ...            ...

I can do something like this:

adShownMedian <- df1[,median(purchaseAmt),by="adShown"]

to get each ad's median. How would I do something that combines adShown and url?

I've tried this:

adShownMedian <- df1[,median(purchaseAmt),by=c("adShown","url")]

but no luck.

Any suggestions?

123052 次浏览

Use by=list(adShown,url) instead of by=c("adShown","url")

Example:

set.seed(007)
DF <- data.frame(X=1:20, Y=sample(c(0,1), 20, TRUE), Z=sample(0:5, 20, TRUE))


library(data.table)
DT <- data.table(DF)
DT[, Mean:=mean(X), by=list(Y, Z)]




X Y Z      Mean
1:  1 1 3  1.000000
2:  2 0 1  9.333333
3:  3 0 5  7.400000
4:  4 0 5  7.400000
5:  5 0 5  7.400000
6:  6 1 0  6.000000
7:  7 0 3  7.000000
8:  8 1 2 12.500000
9:  9 0 5  7.400000
10: 10 0 2 15.000000
11: 11 0 4 14.500000
12: 12 0 1  9.333333
13: 13 1 1 13.000000
14: 14 0 1  9.333333
15: 15 0 2 15.000000
16: 16 0 5  7.400000
17: 17 1 2 12.500000
18: 18 0 4 14.500000
19: 19 1 5 19.000000
20: 20 0 2 15.000000

To add on Jilber Urbina answer, and address kahlo comment:
if you want to get a single row for each Y - Z combination with the aggregated values you can do

DT[, .(X=mean(X)), by=list(Y, Z)]

that is the same as doing

DT[, .(X=mean(X)), by=.(Y, Z)]
# or
DT[, .(X=mean(X)), by=c('Y','Z')]
# or specify column names in vector
names = c('Y','Z')
DT[, .(X=mean(X)), by=names]

(data.table version 1.12.6)