Convert a data frame to a data.table without copying

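I have a large data.frame (on the order of several GB) that I would like to convert to a data.table. Using as.data.table creates a copy of the data.frame, which means I need at least twice as much available memory as the size of the data. Is there a way to do the conversion without a copy?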

Here is a simple example:

library(data.table)
N <- 1e6
K <- 1e2
data <- as.data.frame(rep(data.frame(rnorm(N)), K))


gc(reset=TRUE)
tracemem(data)
data <- as.data.table(data)
gc()

Output:

library(data.table)
# data.table 1.8.10  For help type: help("data.table")
N <- 1e6
K <- 1e2
data <- as.data.frame(rep(data.frame(rnorm(N)), K))


gc(reset=TRUE)
#             used  (Mb) gc trigger   (Mb)  max used  (Mb)
# Ncells    303759  16.3     597831   32.0    303759  16.3
# Vcells 100442572 766.4  402928632 3074.2 100442572 766.4
tracemem(data)
# [1] "<0x363fda0>"
data <- as.data.table(data)
# tracemem[0x363fda0 -> 0x31e4260]: copy as.data.table.data.frame as.data.table
gc()
#             used  (Mb) gc trigger   (Mb)  max used   (Mb)
# Ncells    304519  16.3     597831   32.0    306162   16.4
# Vcells 100444242 766.4  322342905 2459.3 200933219 1533.0

Converting without a copy is possible from data.table v1.9.0 onward. From the NEWS file:

o Following this S.O. post, a function setDT is now implemented that takes a list (named and/or unnamed), data.frame (or data.table) as input and returns the same object as a data.table by reference (without any copy). See ?setDT examples for more.

This follows the data.table naming convention: all set* functions modify their input by reference. := is the only other operator that also modifies by reference.
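To illustrate the convention, here is a minimal sketch using a small, made-up table. The name `dt` and its columns are hypothetical and not from the question. Each call changes `dt` in place, so no result is assigned back.

```r
library(data.table)

# Hypothetical toy table, used only for illustration
dt <- data.table(a = 1:3, b = 4:6)

setnames(dt, "a", "id")        # set* function: renames a column in place
setcolorder(dt, c("b", "id"))  # set* function: reorders columns in place
dt[, total := b + id]          # := adds a column by reference

names(dt)
```

Because nothing is reassigned with `<-`, the same object is updated at each step rather than a modified copy being created.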

require(data.table) # v1.9.0+
setDT(data) # converts data which is a data.frame to data.table *by reference*
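One way to check that no column data is copied is to compare the memory address of a column before and after the conversion. The sketch below assumes data.table's exported `address()` helper and uses a small, hypothetical stand-in (`df`) for the large data frame. It compares column addresses rather than the address of the object itself, since the outer list of column pointers may be reallocated while the columns stay where they are.

```r
library(data.table)  # v1.9.0+

# Small hypothetical stand-in for the multi-GB data frame in the question
df <- data.frame(x = rnorm(1e5), y = rnorm(1e5))

# Record where the first column lives before converting
addr_before <- address(df$x)

setDT(df)  # converts in place; note there is no `df <- ...` assignment

# Look at the same column again after the conversion
addr_after <- address(df$x)

is.data.table(df)                    # is df now a data.table?
identical(addr_before, addr_after)   # did the column stay at the same address?
```

If both checks return TRUE, the column vector was reused rather than duplicated, which is what keeps peak memory close to the size of the data.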

See history for older (now outdated) answers.