Using the data.table package inside my own package

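I am trying to use the data.table package inside my own package.

I created a function, test.fun, that simply builds a small data.table object and then sums the "Val" column grouped by the "A" column. The code is: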

test.fun<-function ()
{
library(data.table)
testdata<-data.table(A=rep(seq(1,5), 5), Val=rnorm(25))
setkey(testdata, A)
res<-testdata[,{list(Ct=length(Val),Total=sum(Val),Avg=mean(Val))},"A"]
return(res)
}

When I define this function in a regular R session and then run it, it works as expected.

> res<-test.fun()
data.table 1.8.0  For help type: help("data.table")
> res
A Ct      Total        Avg
[1,] 1  5 -0.5326444 -0.1065289
[2,] 2  5 -4.0832062 -0.8166412
[3,] 3  5  0.9458251  0.1891650
[4,] 4  5  2.0474791  0.4094958
[5,] 5  5  2.3609443  0.4721889

When I put this function into a package, install the package, load it, and then run the function, I get an error message.

> library(testpackage)
> res<-test.fun()
data.table 1.8.0  For help type: help("data.table")
Error in `[.data.frame`(x, i, j) : object 'Val' not found

Can anyone explain why this happens and what I can do to fix it? Any help is much appreciated.


Andrie's guess is right, +1. There is a FAQ on it (see vignette("datatable-faq")), as well as a new vignette on importing data.table:

FAQ 6.9: I have created a package that depends on data.table. How do I ensure my package is data.table-aware so that inheritance from data.frame works?

Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
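As a rough sketch of option ii), the two files might contain lines like the following. The package name testpackage comes from the question; the export line is an assumption about how test.fun is made available, and all other DESCRIPTION fields are omitted here.

DESCRIPTION (excerpt):

```
Package: testpackage
Imports:
    data.table
```

NAMESPACE (excerpt):

```
# makes the package data.table-aware
import(data.table)
# assumed: test.fun is exported so users can call it
export(test.fun)
```

For option i), the DESCRIPTION line would instead read Depends: data.table, and no import() line is needed for data.table awareness.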

Further background ... at the top of [.data.table (and other data.table functions), you'll see a switch depending on the result of a call to cedta(). This stands for Calling Environment Data Table Aware. Typing data.table:::cedta reveals how it's done. It relies on the calling package having a namespace, and on that namespace Import'ing or Depend'ing on data.table. This is how a data.table can be passed to non-data.table-aware packages (such as functions in base) and those packages can use absolutely standard [.data.frame syntax on the data.table, blissfully unaware that the data.frame is a data.table, too.
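To check by hand whether a given package would pass the import part of that test, something like the sketch below may help. Note that imports_data_table is a hypothetical helper, not a data.table function, and it only looks at the namespace imports; the real cedta() does more than this, so treat it as a diagnostic aid rather than a reimplementation.

```r
# Hypothetical helper (not part of data.table): reports whether the
# namespace of an installed package imports data.table.
imports_data_table <- function(pkg) {
  ns <- asNamespace(pkg)  # loads the namespace if it is not loaded yet
  "data.table" %in% names(getNamespaceImports(ns))
}

# stats ships with R and does not import data.table
imports_data_table("stats")
```

Calling imports_data_table("testpackage") on the package from the question, before and after adding import(data.table) to NAMESPACE, should show the difference.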

This is also why data.table inheritance used not to be compatible with namespaceless packages, and why, upon user request, we had to ask authors of such packages to add a namespace to be compatible. Happily, now that R adds a default namespace for packages missing one (from v2.14.0), that problem has gone away:

CHANGES IN R VERSION 2.14.0
* All packages must have a namespace, and one is created on installation if not supplied in the sources.

Here is the complete recipe:

  1. Add data.table to Imports in your DESCRIPTION file.

  2. Add the roxygen2 tag @import data.table (in a #' comment) to the relevant .R file, i.e. the file that holds the function throwing Error in `[.data.frame`(x, i, j) : object 'Val' not found.

  3. Type library(devtools) and set your working directory to point at the main directory of your R package.

  4. Type document(). This will ensure that your NAMESPACE file includes an import(data.table) line.

  5. Type build()

  6. Type install()

For a nice primer on what build() and install() do, see: http://kbroman.org/pkg_primer/.
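Steps 1 to 4 of the recipe might look like this for the function from the question. This is a minimal sketch: the file name R/test_fun.R is hypothetical, the library(data.table) call from the original function is dropped on the assumption that the namespace import replaces it, and the data.table:: prefixes are only there so the snippet also runs when pasted into a plain session.

```r
# R/test_fun.R  (hypothetical file name inside the package)

#' Grouped summary of a small random data.table
#'
#' @import data.table
#' @export
test.fun <- function() {
  # 25 rows: groups 1..5, five random values each
  testdata <- data.table::data.table(A = rep(seq(1, 5), 5), Val = rnorm(25))
  data.table::setkey(testdata, A)
  # count, sum and mean of Val within each group of A
  testdata[, list(Ct = length(Val), Total = sum(Val), Avg = mean(Val)), by = "A"]
}
```

With data.table listed under Imports in DESCRIPTION, running document() on this file should write import(data.table) and the export line into NAMESPACE.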

Then, the next time you start a fresh R session, you can jump right in:

  1. Type library("my_R_package")

  2. Type the name of your function that's housed in the .R file mentioned above.

  3. Enjoy! You should no longer receive the dreaded Error in `[.data.frame`(x, i, j) : object 'Val' not found.