Create new variables with mutate_at while keeping the original variables

Consider this simple example:

library(dplyr)

# data_frame() is deprecated; tibble() builds the same object
dataframe <- tibble(helloo = c(1, 2, 3, 4, 5, 6),
                    ooooHH = c(1, 1, 1, 2, 2, 2),
                    ahaaa  = c(200, 400, 120, 300, 100, 100))


# A tibble: 6 x 3
  helloo ooooHH ahaaa
   <dbl>  <dbl> <dbl>
1      1      1   200
2      2      1   400
3      3      1   120
4      4      2   300
5      5      2   100
6      6      2   100

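Here I want to apply the function ntile to every column whose name contains oo, but I want the new columns to be named cat + the corresponding column name.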

I know I can do this:

dataframe %>% mutate_at(vars(contains('oo')), .funs = funs(ntile(., 2)))
# A tibble: 6 x 3
  helloo ooooHH ahaaa
   <int>  <int> <dbl>
1      1      1   200
2      1      1   400
3      1      1   120
4      2      2   300
5      2      2   100
6      2      2   100

But what I need is this:

# A tibble: 6 x 5
  helloo ooooHH ahaaa cat_helloo cat_ooooHH
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2

Is there a solution that does not require storing the intermediate result and merging it back into the original data frame?


Update 2020-06 for dplyr 1.0.0

Starting in dplyr 1.0.0, across() supersedes the "scoped variants" of functions such as mutate_at(). The code inside across(), which is itself nested inside mutate(), should look familiar.

Naming the function(s) in the list you pass to .fns appends that name as a suffix to each new column, and the original columns are kept.

dataframe %>%
  mutate(across(contains('oo'),
                .fns = list(cat = ~ntile(., 2))))


# A tibble: 6 x 5
  helloo ooooHH ahaaa helloo_cat ooooHH_cat
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2

Changing the new column names is a little easier in 1.0.0 with the .names argument of across(). Here is an example that adds the function name as a prefix instead of a suffix, using glue syntax.

dataframe %>%
  mutate(across(contains('oo'),
                .fns = list(cat = ~ntile(., 2)),
                .names = "{fn}_{col}"))


# A tibble: 6 x 5
  helloo ooooHH ahaaa cat_helloo cat_ooooHH
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2
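When there is only one function, a named list is not strictly needed: the .names template can hard-code the prefix. The sketch below assumes dplyr >= 1.0.0 and uses the same {col} placeholder as the example above; without .names, a single unnamed function would overwrite the selected columns instead of adding new ones.

```r
library(dplyr)

# Same example data as in the question
dataframe <- tibble(helloo = c(1, 2, 3, 4, 5, 6),
                    ooooHH = c(1, 1, 1, 2, 2, 2),
                    ahaaa  = c(200, 400, 120, 300, 100, 100))

# One unnamed function plus a .names template: each new column is named
# "cat_" + the source column, so the original columns are left untouched
res <- dataframe %>%
  mutate(across(contains("oo"), ~ ntile(.x, 2), .names = "cat_{col}"))

print(res)
```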

Original answer with mutate_at()

Edited to reflect changes in dplyr. As of dplyr 0.8.0, funs() is deprecated and list() with ~ should be used instead.

You can name the functions in the list you pass to .funs; mutate_at() then creates new variables with those names attached as suffixes and keeps the original variables.

dataframe %>% mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2)))


# A tibble: 6 x 5
  helloo ooooHH ahaaa helloo_cat ooooHH_cat
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2
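The same naming rule extends to several functions: every selected column gets one new column per named function, each suffixed with that function's name. A minimal sketch, where the second entry cat3 (three bins) is a hypothetical addition for illustration only:

```r
library(dplyr)

# Same example data as in the question
dataframe <- tibble(helloo = c(1, 2, 3, 4, 5, 6),
                    ooooHH = c(1, 1, 1, 2, 2, 2),
                    ahaaa  = c(200, 400, 120, 300, 100, 100))

# Two named functions: mutate_at() adds <column>_cat and <column>_cat3
# for every column selected by contains("oo")
res <- dataframe %>%
  mutate_at(vars(contains("oo")),
            .funs = list(cat = ~ ntile(., 2), cat3 = ~ ntile(., 3)))

print(res)
```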

If you want the name as a prefix instead, you can then use rename_at() to change the names.

dataframe %>%
  mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2))) %>%
  rename_at(vars(contains("_cat")),
            list(~paste("cat", gsub("_cat", "", .), sep = "_")))


# A tibble: 6 x 5
  helloo ooooHH ahaaa cat_helloo cat_ooooHH
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2
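In dplyr >= 1.0.0, rename_at() is likewise superseded, by rename_with(). The sketch below swaps it in for the renaming step (rename_with() is not part of the original answer). It also anchors the pattern with "$" so only a trailing "_cat" is moved, which is slightly stricter than the contains()/gsub() pair above.

```r
library(dplyr)

# Same example data as in the question
dataframe <- tibble(helloo = c(1, 2, 3, 4, 5, 6),
                    ooooHH = c(1, 1, 1, 2, 2, 2),
                    ahaaa  = c(200, 400, 120, 300, 100, 100))

res <- dataframe %>%
  # Adds helloo_cat and ooooHH_cat, keeping the originals
  mutate_at(vars(contains("oo")), .funs = list(cat = ~ ntile(., 2))) %>%
  # Turn the "_cat" suffix into a "cat_" prefix, only for columns ending in "_cat"
  rename_with(~ paste0("cat_", sub("_cat$", "", .x)), ends_with("_cat"))

print(res)
```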

Previous code with funs() from earlier versions of dplyr:

dataframe %>%
  mutate_at(vars(contains('oo')), .funs = funs(cat = ntile(., 2))) %>%
  rename_at(vars(contains("_cat")),
            funs(paste("cat", gsub("_cat", "", .), sep = "_")))