在常数平摊时间O(1)中将一个对象追加到R中的列表?

小开

这是一个简单的方法来添加项目到R列表:

# create an empty list:
small_list = list()


# now put some objects in it:
small_list$k1 = "v1"
small_list$k2 = "v2"
small_list$k3 = 1:10


# retrieve them the same way:
small_list$k1
# returns "v1"


# "index" notation works as well:
small_list["k2"]

或通过编程:

kx = paste(LETTERS[1:5], 1:5, sep="")
vx = runif(5)
lx = list()
cn = 1


for (itm in kx) { lx[itm] = vx[cn]; cn = cn + 1 }


print(length(lx))
# returns 5

小开

最佳答案

如果它是一个字符串列表，只需使用c()函数:

R> LL <- list(a="tom", b="dick")
R> c(LL, c="harry")
$a
[1] "tom"


$b
[1] "dick"


$c
[1] "harry"


R> class(LL)
[1] "list"
R>

这也适用于向量，我得到额外的分数了吗?

这篇文章即将迎来它的第五个生日。一些好心的读者一直在重复它的缺点，所以无论如何也要看看下面的一些评论。对于list类型的一个建议:

newlist <- list(oldlist, list(someobj))

一般来说，R类型使得所有类型和用途都很难有一个或只有一个习语。

小开

如果你将列表变量作为一个带引号的字符串传入，你可以像这样从函数内部到达它:

push <- function(l, x) {
assign(l, append(eval(as.name(l)), x), envir=parent.frame())
}

所以:

> a <- list(1,2)
> a
[[1]]
[1] 1


[[2]]
[1] 2


> push("a", 3)
> a
[[1]]
[1] 1


[[2]]
[1] 2


[[3]]
[1] 3


>

或者为了获得额外的学分:

> v <- vector()
> push("v", 1)
> v
[1] 1
> push("v", 2)
> v
[1] 1 2
>

小开

也许你想要这样的东西?

> push <- function(l, x) {
lst <- get(l, parent.frame())
lst[length(lst)+1] <- x
assign(l, lst, envir=parent.frame())
}
> a <- list(1,2)
> push('a', 6)
> a
[[1]]
[1] 1


[[2]]
[1] 2


[[3]]
[1] 6

这不是一个非常礼貌的函数(赋值给parent.frame()有点粗鲁)，但IIUYC它是你所要求的。

小开

不知道为什么你不认为你的第一种方法是行不通的。在lappend函数中有一个错误:length(list)应该是length(lst)。这很好，并返回一个带有附加obj的列表。

小开

> LL<-list(1:4)


> LL


[[1]]
[1] 1 2 3 4


> LL<-list(c(unlist(LL),5:9))


> LL


[[1]]
[1] 1 2 3 4 5 6 7 8 9

小开

我认为你要做的是实际上通过引用(指针)传递给函数——创建一个新环境(通过引用传递给函数)，并添加列表:

listptr=new.env(parent=globalenv())
listptr$list=mylist


#Then the function is modified as:
lPtrAppend <- function(lstptr, obj) {
lstptr$list[[length(lstptr$list)+1]] <- obj
}

现在您只是在修改现有的列表(而不是创建一个新的列表)

小开

在Lisp中，我们是这样做的:

> l <- c(1)
> l <- c(2, l)
> l <- c(3, l)
> l <- rev(l)
> l
[1] 1 2 3

虽然是“cons”，而不仅仅是“c”。如果你需要从一个空列表开始，使用l <- NULL。

小开

试试这个函数lappend

lappend <- function (lst, ...){
lst <- c(lst, list(...))
return(lst)
}

和来自本页将命名向量添加到列表中的其他建议

再见。

小开

事实上，c()函数有一个微妙之处。如果你有:

x <- list()
x <- c(x,2)
x = c(x,"foo")

如你所料，你将获得:

[[1]]
[1]


[[2]]
[1] "foo"

但是如果你用x <- c(x, matrix(5,2,2)添加一个矩阵，你的列表将有另外4个值为5的元素! 你最好这样做:

x <- c(x, list(matrix(5,2,2))

它适用于任何其他对象，你将获得预期的:

[[1]]
[1]


[[2]]
[1] "foo"


[[3]]
[,1] [,2]
[1,]    5    5
[2,]    5    5

最后，你的函数变成:

push <- function(l, ...) c(l, list(...))

它适用于任何类型的对象。你可以更聪明地去做:

push_back <- function(l, ...) c(l, list(...))
push_front <- function(l, ...) c(list(...), l)

小开

我对这里提到的方法做了一个小的比较。

n = 1e+4
library(microbenchmark)
### Using environment as a container
lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(substitute(lab))]] <- obj}
### Store list inside new environment
envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj}


microbenchmark(times = 5,
env_with_list_ = {
listptr <- new.env(parent=globalenv())
listptr$list <- NULL
for(i in 1:n) {envAppendList(listptr, i)}
listptr$list
},
c_ = {
a <- list(0)
for(i in 1:n) {a = c(a, list(i))}
},
list_ = {
a <- list(0)
for(i in 1:n) {a <- list(a, list(i))}
},
by_index = {
a <- list(0)
for(i in 1:n) {a[length(a) + 1] <- i}
a
},
append_ = {
a <- list(0)
for(i in 1:n) {a <- append(a, i)}
a
},
env_as_container_ = {
listptr <- new.env(parent=globalenv())
for(i in 1:n) {lPtrAppend(listptr, i, i)}
listptr
}
)

结果:

Unit: milliseconds
expr       min        lq       mean    median        uq       max neval cld
env_with_list_  188.9023  198.7560  224.57632  223.2520  229.3854  282.5859     5  a
c_ 1275.3424 1869.1064 2022.20984 2191.7745 2283.1199 2491.7060     5   b
list_   17.4916   18.1142   22.56752   19.8546   20.8191   36.5581     5  a
by_index  445.2970  479.9670  540.20398  576.9037  591.2366  607.6156     5  a
append_ 1140.8975 1316.3031 1794.10472 1620.1212 1855.3602 3037.8416     5   b
env_as_container_  355.9655  360.1738  399.69186  376.8588  391.7945  513.6667     5  a

小开

OP(在2012年4月问题的更新修订中)感兴趣的是是否有一种方法可以在平摊常数时间内添加到列表中，例如，可以使用c++ vector<>容器。到目前为止，这里的最佳答案只显示了给定固定大小问题的各种解决方案的相对执行时间，但没有直接解决任何各种解决方案的算法的效率。许多答案下面的评论讨论了一些解决方案的算法效率，但是in every case to date(截至2015年4月)他们得出了错误的结论

算法效率捕获增长特征，无论是在时间(执行时间)或空间(内存消耗量)随着问题大小的增长。给定一个固定大小的问题，为各种解决方案运行性能测试并不能解决各种解决方案的增长速度。OP感兴趣的是知道是否有一种方法可以在“平摊常数时间”中将对象追加到R列表中。这是什么意思?为了解释，首先让我描述一下“常数时间”:

常数或O (1)增长:

如果执行给定任务保持不变所需的时间是问题双打的大小，那么我们说算法显示常数时间增长，或者用“大O”符号表示，显示O(1)时间增长。当OP说“平摊”常数时间时，他只是指“从长远来看”……例如，如果执行单个操作偶尔需要比正常情况下更长的时间(例如，如果预先分配的缓冲区耗尽，偶尔需要调整缓冲区大小到更大的缓冲区大小)，只要长期平均性能是常数时间，我们仍然称其为O(1)。

为了进行比较，我还将描述“线性时间”和“二次时间”:

线性或O (n)增长:

如果执行给定任务所需的时间双打作为问题的大小双打，那么我们说算法显示线性时间，或O (n)增长

二次或O (n <一口> 2 > < /晚餐)增长:

如果执行给定任务所需的时间按问题大小的平方增加，则我们说该算法显示二次时间或O (n <一口> 2 > < /晚餐)增长

还有很多其他效率算法;我引用维基百科的文章进行进一步讨论。

我感谢@CronAcronis的回答，因为我是R的新手，很高兴有一个完整的代码块来对本页上提供的各种解决方案进行性能分析。我借用了他的代码用于我的分析，我复制了下面(包装在一个函数中):

library(microbenchmark) ### Using environment as a container lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(substitute(lab))]] <- obj} ### Store list inside new environment envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj} runBenchmark <- function(n) { microbenchmark(times = 5, env_with_list_ = { listptr <- new.env(parent=globalenv()) listptr$list <- NULL for(i in 1:n) {envAppendList(listptr, i)} listptr$list }, c_ = { a <- list(0) for(i in 1:n) {a = c(a, list(i))} }, list_ = { a <- list(0) for(i in 1:n) {a <- list(a, list(i))} }, by_index = { a <- list(0) for(i in 1:n) {a[length(a) + 1] <- i} a }, append_ = { a <- list(0) for(i in 1:n) {a <- append(a, i)} a }, env_as_container_ = { listptr <- new.env(parent=globalenv()) for(i in 1:n) {lPtrAppend(listptr, i, i)} listptr } ) }

@CronAcronis发布的结果显然表明a <- list(a, list(i))方法是最快的，至少对于10000大小的问题是这样，但是对于单个问题大小的结果并没有解决解决方案的增长。为此，我们需要针对不同的问题大小运行至少两个概要测试:

> runBenchmark(2e+3) Unit: microseconds expr min lq mean median uq max neval env_with_list_ 8712.146 9138.250 10185.533 10257.678 10761.33 12058.264 5 c_ 13407.657 13413.739 13620.976 13605.696 13790.05 13887.738 5 list_ 854.110 913.407 1064.463 914.167 1301.50 1339.132 5 by_index 11656.866 11705.140 12182.104 11997.446 12741.70 12809.363 5 append_ 15986.712 16817.635 17409.391 17458.502 17480.55 19303.560 5 env_as_container_ 19777.559 20401.702 20589.856 20606.961 20939.56 21223.502 5 > runBenchmark(2e+4) Unit: milliseconds expr min lq mean median uq max neval env_with_list_ 534.955014 550.57150 550.329366 553.5288 553.955246 558.636313 5 c_ 1448.014870 1536.78905 1527.104276 1545.6449 1546.462877 1558.609706 5 list_ 8.746356 8.79615 9.162577 8.8315 9.601226 9.837655 5 by_index 953.989076 1038.47864 1037.859367 1064.3942 1065.291678 1067.143200 5 append_ 1634.151839 1682.94746 1681.948374 1689.7598 1696.198890 1706.683874 5 env_as_container_ 204.134468 205.35348 208.011525 206.4490 208.279580 215.841129 5 >

首先，关于min/lq/mean/median/uq/max值:由于我们在5次运行中每次执行完全相同的任务，在理想的情况下，我们可以期望每次运行花费完全相同的时间。但是第一次运行通常会花费更长的时间，因为我们正在测试的代码还没有加载到CPU的缓存中。在第一次运行之后，我们期望时间是相当一致的，但偶尔我们的代码可能会由于计时器滴答中断或其他与我们正在测试的代码无关的硬件中断而从缓存中被清除。通过测试代码段5次，我们允许在第一次运行时将代码加载到缓存中，然后给每个代码段4次机会在不受外部事件干扰的情况下运行到完成。由于这个原因，并且因为我们每次都在完全相同的输入条件下运行完全相同的代码，所以我们将只考虑“最小”次数，以便在各种代码选项之间进行最佳比较。

请注意，我选择首先运行问题大小为2000，然后运行问题大小为20000，因此我的问题大小从第一次运行到第二次运行增加了10倍。

list解决方案的性能:O(1)(常数时间)

让我们先来看看list解决方案的增长，因为我们可以马上看出它是两次分析运行中最快的解决方案:在第一次运行中，它花了854 __abc3秒(0.854 __abc4秒)来执行2000个“追加”任务。在第二次运行中，执行20000个“追加”任务花费了8.746毫秒。naïve观察者会说，这个分析的问题是，OP想要的是单个对象插入的增长率，而不是整体问题的增长率。知道了这一点，很明显list解决方案提供了OP想要的东西:在O(1)时间内将对象追加到列表的方法。

其他解决方案的性能

没有一个其他的解决方案甚至接近list解决方案的速度，但无论如何检查它们都是有信息的:

大多数其他解决方案在性能上似乎是O(n)。例如，by_index解决方案，一个非常流行的解决方案，基于我在其他SO帖子中发现它的频率，它需要11.6毫秒来追加2000个对象，而953毫秒来追加十倍于它的对象。整个问题的时间增长了100倍，所以naïve观察者可能会说“啊，by_index解决方案表现出O(n²)增长，因为当问题大小增长到10倍时，执行测试所需的时间增长到100倍。”和前面一样，这个分析是有缺陷的，因为OP只对单个对象插入的增长感兴趣。如果我们将总体时间增长除以问题的规模增长，我们会发现附加对象的时间增长仅增加了10倍，而不是100倍，这与问题规模的增长相匹配，因此by_index解决方案是O(n)。列出的解决方案中没有显示附加单个对象的O(n²)增长。

小开

在其他答案中，只有list方法会导致O(1)追加，但它会导致深度嵌套的列表结构，而不是普通的单个列表。我使用了下面的数据结构，它们支持O(1)(平摊)追加，并允许结果转换回一个普通列表。

expandingList <- function(capacity = 10) { buffer <- vector('list', capacity) length <- 0 methods <- list() methods$double.size <- function() { buffer <<- c(buffer, vector('list', capacity)) capacity <<- capacity * 2 } methods$add <- function(val) { if(length == capacity) { methods$double.size() } length <<- length + 1 buffer[[length]] <<- val } methods$as.list <- function() { b <- buffer[0:length] return(b) } methods }

而且

linkedList <- function() { head <- list(0) length <- 0 methods <- list() methods$add <- function(val) { length <<- length + 1 head <<- list(head, val) } methods$as.list <- function() { b <- vector('list', length) h <- head for(i in length:1) { b[[i]] <- head[[2]] head <- head[[1]] } return(b) } methods }

使用方法如下:

> l <- expandingList() > l$add("hello") > l$add("world") > l$add(101) > l$as.list() [[1]] [1] "hello" [[2]] [1] "world" [[3]] [1] 101

这些解决方案可以扩展为支持所有列表相关操作的完整对象，但这仍将作为读者的练习。

命名列表的另一种变体:

namedExpandingList <- function(capacity = 10) { buffer <- vector('list', capacity) names <- character(capacity) length <- 0 methods <- list() methods$double.size <- function() { buffer <<- c(buffer, vector('list', capacity)) names <<- c(names, character(capacity)) capacity <<- capacity * 2 } methods$add <- function(name, val) { if(length == capacity) { methods$double.size() } length <<- length + 1 buffer[[length]] <<- val names[length] <<- name } methods$as.list <- function() { b <- buffer[0:length] names(b) <- names[0:length] return(b) } methods }

基准

使用@phonetagger的代码(基于@Cron Arconis的代码)进行性能比较。我还添加了一个better_env_as_container，并稍微改变了env_as_container_。原来的env_as_container_被破坏了，实际上并没有存储所有的数字。

library(microbenchmark) lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(lab)]] <- obj} ### Store list inside new environment envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj} env2list <- function(env, len) { l <- vector('list', len) for (i in 1:len) { l[[i]] <- env[[as.character(i)]] } l } envl2list <- function(env, len) { l <- vector('list', len) for (i in 1:len) { l[[i]] <- env[[paste(as.character(i), 'L', sep='')]] } l } runBenchmark <- function(n) { microbenchmark(times = 5, env_with_list_ = { listptr <- new.env(parent=globalenv()) listptr$list <- NULL for(i in 1:n) {envAppendList(listptr, i)} listptr$list }, c_ = { a <- list(0) for(i in 1:n) {a = c(a, list(i))} }, list_ = { a <- list(0) for(i in 1:n) {a <- list(a, list(i))} }, by_index = { a <- list(0) for(i in 1:n) {a[length(a) + 1] <- i} a }, append_ = { a <- list(0) for(i in 1:n) {a <- append(a, i)} a }, env_as_container_ = { listptr <- new.env(hash=TRUE, parent=globalenv()) for(i in 1:n) {lPtrAppend(listptr, i, i)} envl2list(listptr, n) }, better_env_as_container = { env <- new.env(hash=TRUE, parent=globalenv()) for(i in 1:n) env[[as.character(i)]] <- i env2list(env, n) }, linkedList = { a <- linkedList() for(i in 1:n) { a$add(i) } a$as.list() }, inlineLinkedList = { a <- list() for(i in 1:n) { a <- list(a, i) } b <- vector('list', n) head <- a for(i in n:1) { b[[i]] <- head[[2]] head <- head[[1]] } }, expandingList = { a <- expandingList() for(i in 1:n) { a$add(i) } a$as.list() }, inlineExpandingList = { l <- vector('list', 10) cap <- 10 len <- 0 for(i in 1:n) { if(len == cap) { l <- c(l, vector('list', cap)) cap <- cap*2 } len <- len + 1 l[[len]] <- i } l[1:len] } ) } # We need to repeatedly add an element to a list. With normal list concatenation # or element setting this would lead to a large number of memory copies and a # quadratic runtime. To prevent that, this function implements a bare bones # expanding array, in which list appends are (amortized) constant time. expandingList <- function(capacity = 10) { buffer <- vector('list', capacity) length <- 0 methods <- list() methods$double.size <- function() { buffer <<- c(buffer, vector('list', capacity)) capacity <<- capacity * 2 } methods$add <- function(val) { if(length == capacity) { methods$double.size() } length <<- length + 1 buffer[[length]] <<- val } methods$as.list <- function() { b <- buffer[0:length] return(b) } methods } linkedList <- function() { head <- list(0) length <- 0 methods <- list() methods$add <- function(val) { length <<- length + 1 head <<- list(head, val) } methods$as.list <- function() { b <- vector('list', length) h <- head for(i in length:1) { b[[i]] <- head[[2]] head <- head[[1]] } return(b) } methods } # We need to repeatedly add an element to a list. With normal list concatenation # or element setting this would lead to a large number of memory copies and a # quadratic runtime. To prevent that, this function implements a bare bones # expanding array, in which list appends are (amortized) constant time. namedExpandingList <- function(capacity = 10) { buffer <- vector('list', capacity) names <- character(capacity) length <- 0 methods <- list() methods$double.size <- function() { buffer <<- c(buffer, vector('list', capacity)) names <<- c(names, character(capacity)) capacity <<- capacity * 2 } methods$add <- function(name, val) { if(length == capacity) { methods$double.size() } length <<- length + 1 buffer[[length]] <<- val names[length] <<- name } methods$as.list <- function() { b <- buffer[0:length] names(b) <- names[0:length] return(b) } methods }

结果:

> runBenchmark(1000) Unit: microseconds expr min lq mean median uq max neval env_with_list_ 3128.291 3161.675 4466.726 3361.837 3362.885 9318.943 5 c_ 3308.130 3465.830 6687.985 8578.913 8627.802 9459.252 5 list_ 329.508 343.615 389.724 370.504 449.494 455.499 5 by_index 3076.679 3256.588 5480.571 3395.919 8209.738 9463.931 5 append_ 4292.321 4562.184 7911.882 10156.957 10202.773 10345.177 5 env_as_container_ 24471.511 24795.849 25541.103 25486.362 26440.591 26511.200 5 better_env_as_container 7671.338 7986.597 8118.163 8153.726 8335.659 8443.493 5 linkedList 1700.754 1755.439 1829.442 1804.746 1898.752 1987.518 5 inlineLinkedList 1109.764 1115.352 1163.751 1115.631 1206.843 1271.166 5 expandingList 1422.440 1439.970 1486.288 1519.728 1524.268 1525.036 5 inlineExpandingList 942.916 973.366 1002.461 1012.197 1017.784 1066.044 5 > runBenchmark(10000) Unit: milliseconds expr min lq mean median uq max neval env_with_list_ 357.760419 360.277117 433.810432 411.144799 479.090688 560.779139 5 c_ 685.477809 734.055635 761.689936 745.957553 778.330873 864.627811 5 list_ 3.257356 3.454166 3.505653 3.524216 3.551454 3.741071 5 by_index 445.977967 454.321797 515.453906 483.313516 560.374763 633.281485 5 append_ 610.777866 629.547539 681.145751 640.936898 760.570326 763.896124 5 env_as_container_ 281.025606 290.028380 303.885130 308.594676 314.972570 324.804419 5 better_env_as_container 83.944855 86.927458 90.098644 91.335853 92.459026 95.826030 5 linkedList 19.612576 24.032285 24.229808 25.461429 25.819151 26.223597 5 inlineLinkedList 11.126970 11.768524 12.216284 12.063529 12.392199 13.730200 5 expandingList 14.735483 15.854536 15.764204 16.073485 16.075789 16.081726 5 inlineExpandingList 10.618393 11.179351 13.275107 12.391780 14.747914 17.438096 5 > runBenchmark(20000) Unit: milliseconds expr min lq mean median uq max neval env_with_list_ 1723.899913 1915.003237 1921.23955 1938.734718 1951.649113 2076.910767 5 c_ 2759.769353 2768.992334 2810.40023 2820.129738 2832.350269 2870.759474 5 list_ 6.112919 6.399964 6.63974 6.453252 6.910916 7.321647 5 by_index 2163.585192 2194.892470 2292.61011 2209.889015 2436.620081 2458.063801 5 append_ 2832.504964 2872.559609 2983.17666 2992.634568 3004.625953 3213.558197 5 env_as_container_ 573.386166 588.448990 602.48829 597.645221 610.048314 642.912752 5 better_env_as_container 154.180531 175.254307 180.26689 177.027204 188.642219 206.230191 5 linkedList 38.401105 47.514506 46.61419 47.525192 48.677209 50.952958 5 inlineLinkedList 25.172429 26.326681 32.33312 34.403442 34.469930 41.293126 5 expandingList 30.776072 30.970438 34.45491 31.752790 38.062728 40.712542 5 inlineExpandingList 21.309278 22.709159 24.64656 24.290694 25.764816 29.158849 5

我已经添加了linkedList和expandingList以及两者的内联版本。inlinedLinkedList基本上是list_的副本，但它也将嵌套结构转换回普通列表。除此之外，内联版本和非内联版本之间的差异是由于函数调用的开销。

expandingList和linkedList的所有变体都显示O(1)附加性能，基准时间随附加项的数量线性缩放。linkedList比expandingList慢，而且函数调用开销也是可见的。所以如果你真的需要所有你能得到的速度(并想坚持R代码)，使用expandingList的内联版本。

我还研究了R的C实现，这两种方法都应该是O(1)追加任何大小，直到耗尽内存。

我还更改了env_as_container_，原始版本将在索引“I”下存储所有项，覆盖先前追加的项。我添加的better_env_as_container非常类似于env_as_container_，但没有deparse的东西。两者都表现出O(1)性能，但它们的开销比链接/展开列表要大得多。

内存开销

在C R实现中，每个分配对象的开销为4个单词和2个int。linkedList方法为每个追加分配一个长度为2的列表，在64位计算机上，每个追加项的总数为(4*8+4+4+2*8=)56字节(不包括内存分配开销，因此可能接近64字节)。expandingList方法对每个附加项使用一个单词，当向量长度翻倍时再加上一个副本，因此每个项的总内存使用量最高可达16字节。由于内存全部在一个或两个对象中，因此每个对象的开销是微不足道的。我还没有深入研究env的内存使用情况，但我认为它将更接近linkedList。

小开

这是一个非常有趣的问题，我希望我下面的想法可以为解决这个问题提供一种方式。这个方法给出了一个没有索引的平面列表，但是它有列表和反列表来避免嵌套结构。我不确定速度，因为我不知道如何基准。

a_list<-list() for(i in 1:3){ a_list<-list(unlist(list(unlist(a_list,recursive = FALSE),list(rnorm(2))),recursive = FALSE)) } a_list [[1]] [[1]][[1]] [1] -0.8098202 1.1035517 [[1]][[2]] [1] 0.6804520 0.4664394 [[1]][[3]] [1] 0.15592354 0.07424637

小开

还有来自rlist (链接到文档)的list.append

require(rlist) LL <- list(a="Tom", b="Dick") list.append(LL,d="Pam",f=c("Joe","Ann"))

这非常简单和有效。

小开

为了验证，我运行了@Cron提供的基准测试代码。有一个主要的区别(除了在更新的i7处理器上运行更快之外):by_index现在的性能几乎与list_一样好:

Unit: milliseconds expr min lq mean median uq env_with_list_ 167.882406 175.969269 185.966143 181.817187 185.933887 c_ 485.524870 501.049836 516.781689 518.637468 537.355953 list_ 6.155772 6.258487 6.544207 6.269045 6.290925 by_index 9.290577 9.630283 9.881103 9.672359 10.219533 append_ 505.046634 543.319857 542.112303 551.001787 553.030110 env_as_container_ 153.297375 154.880337 156.198009 156.068736 156.800135

这里是从@Cron的回答中逐字复制的基准代码(以防他后来更改内容):

n = 1e+4 library(microbenchmark) ### Using environment as a container lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(substitute(lab))]] <- obj} ### Store list inside new environment envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj} microbenchmark(times = 5, env_with_list_ = { listptr <- new.env(parent=globalenv()) listptr$list <- NULL for(i in 1:n) {envAppendList(listptr, i)} listptr$list }, c_ = { a <- list(0) for(i in 1:n) {a = c(a, list(i))} }, list_ = { a <- list(0) for(i in 1:n) {a <- list(a, list(i))} }, by_index = { a <- list(0) for(i in 1:n) {a[length(a) + 1] <- i} a }, append_ = { a <- list(0) for(i in 1:n) {a <- append(a, i)} a }, env_as_container_ = { listptr <- new.env(parent=globalenv()) for(i in 1:n) {lPtrAppend(listptr, i, i)} listptr } )

小开

我运行了以下基准测试:

bench=function(...,n=1,r=3){ a=match.call(expand.dots=F)$... t=matrix(ncol=length(a),nrow=n) for(i in 1:length(a))for(j in 1:n){t1=Sys.time();eval(a[[i]],parent.frame());t[j,i]=Sys.time()-t1} o=t(apply(t,2,function(x)c(median(x),min(x),max(x),mean(x)))) round(1e3*`dimnames<-`(o,list(names(a),c("median","min","max","mean"))),r) } ns=10^c(3:7) m=sapply(ns,function(n)bench(n=5, `vector at length + 1`={l=c();for(i in 1:n)l[length(l)+1]=i}, `vector at index`={l=c();for(i in 1:n)l[i]=i}, `vector at index, initialize with type`={l=integer();for(i in 1:n)l[i]=i}, `vector at index, initialize with length`={l=vector(length=n);for(i in 1:n)l[i]=i}, `vector at index, initialize with type and length`={l=integer(n);for(i in 1:n)l[i]=i}, `list at length + 1`={l=list();for(i in 1:n)l[[length(l)+1]]=i}, `list at index`={l=list();for(i in 1:n)l[[i]]=i}, `list at index, initialize with length`={l=vector('list',n);for(i in 1:n)l[[i]]=i}, `list at index, initialize with double length, remove null`={l=vector("list",2*n);for(i in 1:n)l[[i]]=i;l=head(l,i)}, `list at index, double when full, get length from variable`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==len){len=len*2;length(l)=len}};l=head(l,i)}, `list at index, double when full, check length inside loop`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==length(l)){length(l)=i*2}};l=head(l,i)}, `nested lists`={l=list();for(i in 1:n)l=list(l,i)}, `nested lists with unlist`={if(n<=1e5){l=list();for(i in 1:n)l=list(l,i);o=unlist(l)}}, `nested lists with manual unlist`={l=list();for(i in 1:n)l=list(l,i);o=integer(n);for(i in 1:n){o[n-i+1]=l[[2]];l=l[[1]]}}, `JanKanis better_env_as_container`={env=new.env(hash=T,parent=globalenv());for(i in 1:n)env[[as.character(i)]]=i}, `JanKanis inlineLinkedList`={a=list();for(i in 1:n)a=list(a,i);b=vector('list',n);head=a;for(i in n:1){b[[i]]=head[[2]];head=head[[1]]}}, `JanKanis inlineExpandingList`={l=vector('list',10);cap=10;len=0;for(i in 1:n){if(len==cap){l=c(l,vector('list',cap));cap=cap*2};len=len+1;l[[len]]=i};l[1:len]}, `c`={if(n<=1e5){l=c();for(i in 1:n)l=c(l,i)}}, `append vector`={if(n<=1e5){l=integer(n);for(i in 1:n)l=append(l,i)}}, `append list`={if(n<=1e9){l=list();for(i in 1:n)l=append(l,i)}} )[,1]) m[rownames(m)%in%c("nested lists with unlist","c","append vector","append list"),4:5]=NA m2=apply(m,2,function(x)formatC(x,max(0,2-ceiling(log10(min(x,na.rm=T)))),format="f")) m3=apply(rbind(paste0("1e",log10(ns)),m2),2,function(x)formatC(x,max(nchar(x)),format="s")) writeLines(apply(cbind(m3,c("",rownames(m))),1,paste,collapse=" "))

输出:

1e3 1e4 1e5 1e6 1e7 2.35 24.5 245 2292 27146 vector at length + 1 0.61 5.9 60 590 7360 vector at index 0.61 5.9 64 587 7132 vector at index, initialize with type 0.56 5.6 54 523 6418 vector at index, initialize with length 0.54 5.5 55 522 6371 vector at index, initialize with type and length 2.65 28.8 299 3955 48204 list at length + 1 0.93 9.2 96 1605 13480 list at index 0.58 5.6 57 707 8461 list at index, initialize with length 0.62 5.8 59 739 9413 list at index, initialize with double length, remove null 0.88 8.4 81 962 11872 list at index, double when full, get length from variable 0.96 9.5 92 1264 15813 list at index, double when full, check length inside loop 0.21 1.9 22 426 3826 nested lists 0.25 2.4 29 NA NA nested lists with unlist 2.85 27.5 295 3065 31427 nested lists with manual unlist 1.65 20.2 293 6505 8835 JanKanis better_env_as_container 1.11 10.1 110 1534 27119 JanKanis inlineLinkedList 2.66 26.3 266 3592 47120 JanKanis inlineExpandingList 1.22 118.6 15466 NA NA c 3.64 512.0 45167 NA NA append vector 6.35 664.8 71399 NA NA append list

上表显示的是每种方法的中值时间，而不是平均时间，因为有时单个运行的时间比典型运行的时间长得多，这会扭曲平均运行时间。但是在第一次运行之后的后续运行中，没有一种方法变得更快，因此每种方法的最小时间和中值时间通常是相似的。

方法“vector at index”;(l=c();for(i in 1:n)l[i]=i)大约比“vector at length + 1”快5倍;(l=c();for(i in 1:n)l[length(l)]=i)，因为获取向量的长度比向向量中添加一个元素花费的时间要长。当我用预定的长度初始化vector时，它使代码快了20%左右，但用特定类型初始化并没有什么不同，因为当向vector添加第一个项时，类型只需要更改一次。在列表的情况下，当你比较方法&;list at index&;而“list at index initialized with length”，当列表长度增加时，用预定的长度初始化列表会产生更大的差异，因为它使代码在长度为1e6时速度提高了两倍，而在长度为1e7时速度提高了三倍。

方法“list at index”;(l=list();for(i in 1:n)l[[i]]=i)比方法“list at length + 1”快3-4倍;(l=list();for(i in 1:n)l[[length(l)+1]]=i)。

JanKanis的链表和扩展列表方法比“list at index”慢。但比“list at length + 1”快。链表比扩展表快。

有些人声称append函数比c函数快，但在我的基准测试中，append比c慢大约3-4倍。

在上表中，长度1e6和1e7和在三个方法中缺失:for "c" append vector"和"append list"因为它们有二次的时间复杂度，并且对于“嵌套列表与unlist”;因为这会导致堆栈溢出。

“嵌套列表”;选项是最快的，但它不包括平展列表所需的时间。当我使用unlist函数来平展嵌套列表时，当列表的长度约为1.26e5或更高时，我得到了堆栈溢出，因为unlist函数在默认情况下递归地调用自己:n=1.26e5;l=list();for(i in 1:n)l=list(l,list(i));u=unlist(l)。当我反复调用unlist(recursive=F)时，即使对于只有10,000项的列表for(i in 1:n)l=unlist(l,recursive=F)也需要大约4秒才能运行。但是当我手动取消列表时，只花了大约0.3秒就运行了一个包含一百万个条目的列表:o=integer(n);for(i in 1:n){o[n-i+1]=l[[2]];l=l[[1]]}。

如果您事先不知道要向列表添加多少项，但知道最大项数，则可以尝试以最大长度初始化列表，然后删除NULL值。或者另一种方法是每次列表满时将列表的大小增加一倍(如果你有一个变量用于列表的长度，另一个变量用于添加到列表中的项的数量，那么你可以更快地完成，这样你就不必在每次循环迭代时检查列表对象的长度):

ns=10^c(2:7) m=sapply(ns,function(n)bench(n=5, `list at index`={l=list();for(i in 1:n)l[[i]]=i}, `list at length + 1`={l=list();for(i in 1:n)l[[length(l)+1]]=i}, `list at index, initialize with length`={l=vector("list",n);for(i in 1:n)l[[i]]=i}, `list at index, initialize with double length, remove null`={l=vector("list",2*n);for(i in 1:n)l[[i]]=i;l=head(l,i)}, `list at index, initialize with length 1e7, remove null`={l=vector("list",1e7);for(i in 1:n)l[[i]]=i;l=head(l,i)}, `list at index, initialize with length 1e8, remove null`={l=vector("list",1e8);for(i in 1:n)l[[i]]=i;l=head(l,i)}, `list at index, double when full, get length from variable`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==len){len=len*2;length(l)=len}};l=head(l,i)}, `list at index, double when full, check length inside loop`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==length(l)){length(l)=i*2}};l=head(l,i)} )[,1]) m2=apply(m,2,function(x)formatC(x,max(0,2-ceiling(log10(min(x)))),format="f")) m3=apply(rbind(paste0("1e",log10(ns)),m2),2,function(x)formatC(x,max(nchar(x)),format="s")) writeLines(apply(cbind(m3,c("",rownames(m))),1,paste,collapse=" "))

输出:

1e4 1e5 1e6 1e7 9.3 102 1225 13250 list at index 27.4 315 3820 45920 list at length + 1 5.7 58 726 7548 list at index, initialize with length 5.8 60 748 8057 list at index, initialize with double length, remove null 33.4 88 902 7684 list at index, initialize with length 1e7, remove null 333.2 393 2691 12245 list at index, initialize with length 1e8, remove null 8.6 83 1032 10611 list at index, double when full, get length from variable 9.3 96 1280 14319 list at index, double when full, check length inside loop