根据向量中的值从数据框架中选择行

我有类似的数据:

dt <- structure(list(fct = structure(c(1L, 2L, 3L, 4L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 3L, 4L), .Label = c("a", "b", "c", "d"), class = "factor"), X = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)), .Names = c("fct", "X"), class = "data.frame", row.names = c(NA, -15L))

我想根据 fct变量中的值从这个数据帧中选择行。例如,如果我想选择包含“ a”或“ c”的行,我可以这样做:

dt[dt$fct == 'a' | dt$fct == 'c', ]

产生了

1    a 2
3    c 3
5    c 5
7    a 7
9    c 9
10   a 1
12   c 2
14   c 4

不出所料。但是我的实际数据更复杂,我实际上希望根据向量中的值选择行,比如

vc <- c('a', 'c')

所以我尽力了

dt[dt$fct == vc, ]

但是当然没用。我知道我可以编写一些代码来遍历向量,提取出所需的行并将它们附加到新的数据框架中,但我希望有一种更优雅的方式。

那么我如何根据向量 vc的内容来过滤/子集我的数据呢?

280300 次浏览

Have a look at ?"%in%".

dt[dt$fct %in% vc,]
fct X
1    a 2
3    c 3
5    c 5
7    a 7
9    c 9
10   a 1
12   c 2
14   c 4

You could also use ?is.element:

dt[is.element(dt$fct, vc),]

Similar to above, using filter from dplyr:

filter(df, fct %in% vc)

Another option would be to use a keyed data.table:

library(data.table)
setDT(dt, key = 'fct')[J(vc)]  # or: setDT(dt, key = 'fct')[.(vc)]

which results in:

   fct X
1:   a 2
2:   a 7
3:   a 1
4:   c 3
5:   c 5
6:   c 9
7:   c 2
8:   c 4

What this does:

  • setDT(dt, key = 'fct') transforms the data.frame to a data.table (which is an enhanced form of a data.frame) with the fct column set as key.
  • Next you can just subset with the vc vector with [J(vc)].

NOTE: when the key is a factor/character variable, you can also use setDT(dt, key = 'fct')[vc] but that won't work when vc is a numeric vector. When vc is a numeric vector and is not wrapped in J() or .(), vc will work as a rowindex.

A more detailed explanation of the concept of keys and subsetting can be found in the vignette Keys and fast binary search based subset.

An alternative as suggested by @Frank in the comments:

setDT(dt)[J(vc), on=.(fct)]

When vc contains values that are not present in dt, you'll need to add nomatch = 0:

setDT(dt, key = 'fct')[J(vc), nomatch = 0]

or:

setDT(dt)[J(vc), on=.(fct), nomatch = 0]