Rename multiple columns by name

Somebody should have asked this already, but I couldn't find an answer. Say I have:

x = data.frame(q=1,w=2,e=3, ...and many many columns...)

What is the most elegant way to rename an arbitrary subset of columns, whose positions I don't necessarily know, to some other arbitrary names?

For example, say I want to rename "q" and "e" to "A" and "B". What is the most elegant code to do this?

Obviously, I can write a loop:

oldnames = c("q","e")
newnames = c("A","B")
for(i in 1:2) names(x)[names(x) == oldnames[i]] = newnames[i]

But I wonder if there is a better way? Maybe using one of the packages (plyr::rename, etc.)?


This would change every occurrence of those letters in all names:

 names(x) <- gsub("q", "A", gsub("e", "B", names(x) ) )
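Because gsub works on substrings, any other column name that contains "q" or "e" is affected too. A minimal sketch of that effect, and of an anchored pattern that only matches whole names (the column name "queue" is made up for illustration):

```r
# Hypothetical column names; "queue" contains both letters
nms <- c("q", "w", "e", "queue")

# Unanchored patterns rewrite every occurrence of the letters
unanchored <- gsub("q", "A", gsub("e", "B", nms))

# Anchoring with ^ and $ restricts each replacement to whole names
anchored <- sub("^q$", "A", sub("^e$", "B", nms))

print(unanchored)
print(anchored)
```

The anchored version leaves "queue" untouched, which is usually what is wanted when renaming specific columns.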
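
Another answer matches whole names only, although the new names are assigned in the order the columns appear in x rather than in the order of the old names:

names(x)[names(x) %in% c("q","e")] <- c("A","B")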

setnames from the data.table package works on both data.frames and data.tables:

library(data.table)
d <- data.frame(a=1:2,b=2:3,d=4:5)
setnames(d, old = c('a','d'), new = c('anew','dnew'))
d




#   anew b dnew
# 1    1 2    4
# 2    2 3    5

Note that changes are made by reference, so no copying (even for data.frames!)
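If some of the old names may be missing, setnames can be told to ignore them. The sketch below assumes a data.table version that provides the skip_absent argument; the column name "zzz" is made up and deliberately absent.

```r
library(data.table)

d <- data.frame(a = 1:2, b = 2:3, d = 4:5)

# "zzz" is not a column of d; skip_absent = TRUE skips it instead of raising an error
setnames(d, old = c("a", "zzz"), new = c("anew", "znew"), skip_absent = TRUE)

print(names(d))
```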

Building on @user3114046's answer:

x <- data.frame(q=1,w=2,e=3)
x
#  q w e
#1 1 2 3


names(x)[match(oldnames,names(x))] <- newnames


x
#  A w B
#1 1 2 3

This does not rely on a specific ordering of the columns in the x dataset.
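To make the ordering point concrete, here is a small sketch comparing %in% with match on a data frame whose columns are in a different order from oldnames (the reordered data frame is made up for illustration):

```r
oldnames <- c("q", "e")
newnames <- c("A", "B")

# Hypothetical data frame where "e" comes before "q"
x <- data.frame(e = 3, w = 2, q = 1)

# %in% yields a logical mask in column order, so "e" receives "A"
by_in <- x
names(by_in)[names(by_in) %in% oldnames] <- newnames

# match() returns the position of each old name, so "q" receives "A"
by_match <- x
names(by_match)[match(oldnames, names(by_match))] <- newnames

print(names(by_in))
print(names(by_match))
```

Only the match version pairs each old name with its intended new name here.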

Another solution for data frames that are not too large (building on @thelatemail's answer):

x <- data.frame(q=1,w=2,e=3)


> x
q w e
1 1 2 3


colnames(x) <- c("A","w","B")


> x
A w B
1 1 2 3

You can also use:

names(x) <- c("C","w","D")


> x
C w D
1 1 2 3

You can also rename a subset of the column names by position:

names(x)[2:3] <- c("E","F")


> x
C E F
1 1 2 3

I ran into this myself recently. If you are not sure whether the columns exist and only want to rename those that do:

existing <- match(oldNames,names(x))
names(x)[na.omit(existing)] <- newNames[which(!is.na(existing))]
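A short worked example of the two lines above, with one old name ("zz", made up for illustration) that is not present in the data frame:

```r
x <- data.frame(q = 1, w = 2, e = 3)
oldNames <- c("q", "zz", "e")   # "zz" does not exist in x
newNames <- c("A", "Z", "B")

# match() gives the position of each old name, or NA when it is absent
existing <- match(oldNames, names(x))

# Keep only the positions that were found, and the new names that go with them
names(x)[na.omit(existing)] <- newNames[which(!is.na(existing))]

print(names(x))
```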

If one row of the data contains the names you want to give to all the columns, you can do:

names(data) <- data[row,]

This assumes data is your data frame and row is the number of the row containing the new names.

Then you can remove the row containing the names with:

data <- data[-row,]
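A minimal end-to-end sketch of this idea, assuming the header row was read in as text. The sample values are made up, and the as.character(unlist(...)) step and the final type conversion are additions to the lines above rather than part of the original answer:

```r
# Hypothetical data where the first row holds the intended column names
data <- data.frame(V1 = c("id", "1", "2"),
                   V2 = c("score", "10", "20"),
                   stringsAsFactors = FALSE)
row <- 1

# Flatten the row to a plain character vector before using it as names
names(data) <- as.character(unlist(data[row, ]))

# Drop the header row and reset the row names
data <- data[-row, ]
rownames(data) <- NULL

# The remaining values are still text; convert each column to a suitable type
data[] <- lapply(data, type.convert, as.is = TRUE)

print(data)
```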

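You can get the set of names, save it in a variable, and then do your bulk renaming on the strings. A good example of this is when you convert a dataset from long to wide format: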

names(labWide)
Lab1    Lab10    Lab11    Lab12    Lab13    Lab14    Lab15    Lab16
1 35.75366 22.79493 30.32075 34.25637 30.66477 32.04059 24.46663 22.53063


nameVec <- names(labWide)
nameVec <- gsub("Lab","LabLat",nameVec)


names(labWide) <- nameVec
"LabLat1"  "LabLat10" "LabLat11" "LabLat12" "LabLat13" "LabLat14" "LabLat15" "LabLat16"

With dplyr you can do:

library(dplyr)


df = data.frame(q = 1, w = 2, e = 3)
    

df %>% rename(A = q, B = e)


#  A w B
#1 1 2 3

Or, if you want to use vectors, as suggested by @Jelena-bioinf:

library(dplyr)


df = data.frame(q = 1, w = 2, e = 3)


oldnames = c("q","e")
newnames = c("A","B")


df %>% rename_at(vars(oldnames), ~ newnames)


#  A w B
#1 1 2 3

@L. D. Nicolas May suggested a change, given that rename_at is being superseded by rename_with:

df %>%
  rename_with(~ newnames[match(.x, oldnames)], .cols = all_of(oldnames))


#  A w B
#1 1 2 3
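A variation on the rename_with call that looks each new name up by the old name, so it does not depend on column order. This is a sketch that assumes dplyr 1.0.0 or later; the lookup vector and the reordered data frame are made up for illustration.

```r
library(dplyr)

# Hypothetical data frame with the columns in a different order
df <- data.frame(e = 3, w = 2, q = 1)

# Named vector mapping old name -> new name
lookup <- c(q = "A", e = "B")

# .x holds the selected column names; index the lookup by name to get the new ones
out <- df %>%
  rename_with(~ unname(lookup[.x]), .cols = all_of(names(lookup)))

print(names(out))
```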

There are a lot of similar answers, so I wrote this function that you can copy and paste.

rename <- function(x, old_names, new_names) {
  stopifnot(length(old_names) == length(new_names))
  # pull out the names that are actually in x
  old_nms <- old_names[old_names %in% names(x)]
  new_nms <- new_names[old_names %in% names(x)]

  # call out the column names that don't exist
  not_nms <- setdiff(old_names, old_nms)
  if(length(not_nms) > 0) {
    msg <- paste(paste(not_nms, collapse = ", "),
                 "are not columns in the dataframe, so won't be renamed.")
    warning(msg)
  }

  # rename; match() pairs each old name with its new name regardless of column order
  names(x)[match(old_nms, names(x))] <- new_nms
  x
}


x = data.frame(q = 1, w = 2, e = 3)
rename(x, c("q", "e"), c("Q", "E"))


Q w E
1 1 2 3

Here is the most efficient way I have found to rename multiple columns, using a combination of purrr::set_names() and a few stringr operations.

library(tidyverse)


# Make a tibble with bad names
data <- tibble(
`Bad NameS 1` = letters[1:10],
`bAd NameS 2` = rnorm(10)
)


data
# A tibble: 10 x 2
`Bad NameS 1` `bAd NameS 2`
<chr>                 <dbl>
1 a                    -0.840
2 b                    -1.56
3 c                    -0.625
4 d                     0.506
5 e                    -1.52
6 f                    -0.212
7 g                    -1.50
8 h                    -1.53
9 i                     0.420
10 j                     0.957


# Use purrr::set_names() with an anonymous function of stringr operations
data %>%
  set_names(~ str_to_lower(.) %>%
              str_replace_all(" ", "_") %>%
              str_replace_all("bad", "good"))


# A tibble: 10 x 2
good_names_1 good_names_2
<chr>               <dbl>
1 a                  -0.840
2 b                  -1.56
3 c                  -0.625
4 d                   0.506
5 e                  -1.52
6 f                  -0.212
7 g                  -1.50
8 h                  -1.53
9 i                   0.420
10 j                   0.957
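For comparison, roughly the same clean-up can be sketched in base R without the tidyverse. The helper name clean_names_base is made up, and this is only an approximation of the pipeline above, not a drop-in replacement:

```r
# A small data frame with the same awkward names; check.names = FALSE keeps the spaces
data <- data.frame(`Bad NameS 1` = letters[1:3],
                   `bAd NameS 2` = 1:3,
                   check.names = FALSE)

# Hypothetical helper: lower-case, replace spaces with underscores, swap "bad" for "good"
clean_names_base <- function(nms) {
  nms <- tolower(nms)
  nms <- gsub(" ", "_", nms, fixed = TRUE)
  gsub("bad", "good", nms, fixed = TRUE)
}

names(data) <- clean_names_base(names(data))
print(names(data))
```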

Side note: if you want to concatenate a string onto all of the column names, you can use this simple code.

colnames(df) <- paste("renamed_",colnames(df),sep="")

If the table contains two columns with the same name, the code looks like this:

rename(df,newname=oldname.x,newname=oldname.y)
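The .x and .y suffixes typically appear after merging two tables that share a column name. A minimal base R sketch of that situation, assuming the goal is to give the two columns distinct names (the tables and the new names are made up for illustration):

```r
# Two hypothetical tables that both have a "value" column
left  <- data.frame(id = 1:2, value = c(10, 20))
right <- data.frame(id = 1:2, value = c(0.1, 0.2))

# merge() disambiguates the clashing columns as value.x and value.y
merged <- merge(left, right, by = "id")

# Rename both suffixed columns by name
names(merged)[match(c("value.x", "value.y"), names(merged))] <-
  c("value_left", "value_right")

print(names(merged))
```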

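This is the function you need. Just pass x to rename(x) and it will rename every column that is present; names that are not present do not cause an error.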

rename <- function(x){
  oldNames = c("a","b","c")
  newNames = c("d","e","f")
  existing <- match(oldNames,names(x))
  names(x)[na.omit(existing)] <- newNames[which(!is.na(existing))]
  return(x)
}

You can use a named vector.

base R, via subsetting:

x = data.frame(q = 1, w = 2, e = 3)


rename_vec <- c(q = "A", e = "B")
## vector of same length as names(x) which returns NA if there is no match to names(x)
which_rename <- rename_vec[names(x)]
## simple ifelse where names(x) will be renamed for every non-NA
names(x) <- ifelse(is.na(which_rename), names(x), which_rename)


x
#>   A w B
#> 1 1 2 3

Or a dplyr option with !!!:

library(dplyr)


rename_vec <- c(A = "q", B = "e") # the names are just the other way round than in the base R way!


x %>% rename(!!!rename_vec)
#>   A w B
#> 1 1 2 3

The latter works because the "big-bang" operator !!! forces evaluation of a list or a vector.

?`!!`

!!! force-splices a list of objects. The elements of the list are spliced in place, meaning that they each become one single argument.
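An alternative to splicing is to pass the named vector through a tidyselect helper. The sketch below assumes a reasonably recent dplyr and tidyselect in which any_of() accepts a named character vector for renaming; the entry "not_a_column" is made up to show that missing columns are skipped.

```r
library(dplyr)

x <- data.frame(q = 1, w = 2, e = 3)

# new name = "old name"; the last entry does not exist in x
lookup <- c(A = "q", B = "e", Z = "not_a_column")

# any_of() renames the columns that exist and silently ignores the rest
renamed <- x %>% rename(any_of(lookup))

print(names(renamed))
```

Using all_of() instead of any_of() would raise an error for the missing column, which can be preferable when every old name is expected to exist.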

There are already a few answers mentioning the functions dplyr::rename_with and rlang::set_names, but they treat them separately. This answer illustrates the differences between the two and the use of functions and formulas to rename columns.

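rename_with from the dplyr package can use either a function or a formula to rename a selection of columns given as the .cols argument. For example, passing the function name toupper: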

library(dplyr)
rename_with(head(iris), toupper, starts_with("Petal"))

is equivalent to passing the formula ~ toupper(.x):

rename_with(head(iris), ~ toupper(.x), starts_with("Petal"))

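When renaming all columns, you can also use set_names from the rlang package. To make a different example, let's use paste0 as the renaming function. paste0 takes 2 arguments, so the second argument is passed differently depending on whether we use a function or a formula.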

rlang::set_names(head(iris), paste0, "_hi")
rlang::set_names(head(iris), ~ paste0(.x, "_hi"))

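The same can be achieved with rename_with by passing the data frame as the first argument .data, the function as the second argument .fn, all columns as the third argument .cols = everything(), and the function parameters as the fourth argument (...). Alternatively, you can place the second, third and fourth arguments in a formula given as the second argument.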

rename_with(head(iris), paste0, everything(), "_hi")
rename_with(head(iris), ~ paste0(.x, "_hi"))

rename_with only works with data frames. set_names is more generic and can also rename vectors:

rlang::set_names(1:4, c("a", "b", "c", "d"))

Update for dplyr 1.0.0

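The newest dplyr version became more flexible by adding rename_with(), where _with refers to a function as input. The trick is to re-express the character vector newnames as a formula (via ~), so that it is equivalent to function(x) return(newnames).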

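In my subjective opinion, this is the most elegant dplyr expression. Update: thanks to @desval, the oldnames vector must be wrapped in all_of to include all of its elements: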

# shortest & most elegant expression
df %>% rename_with(~ newnames, all_of(oldnames))


A w B
1 1 2 3

Side note:

If you flip the order, the argument .fn must be specified by name, because .fn is expected before the .cols argument:

df %>% rename_with(all_of(oldnames), .fn = ~ newnames)


A w B
1 1 2 3

or specify the argument .cols by name:

df %>% rename_with(.cols = all_of(oldnames), ~ newnames)


A w B
1 1 2 3

Many of the good answers above rely on specialized packages. Here is a simple way to do it using only base R.

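df.rename.cols <- function(df, col2.list) {
  # split the list of c(old, new) pairs into two character vectors
  old <- vapply(col2.list, `[`, character(1), 1)
  new <- vapply(col2.list, `[`, character(1), 2)

  # pair each old name with its new name; assumes every old name exists in df
  names(df)[match(old, names(df))] <- new
  df
}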

Here is an example:

df1 <- data.frame(A = c(1, 2), B = c(3, 4), C = c(5, 6), D = c(7, 8))
col.list <- list(c("A", "NewA"), c("C", "NewC"))
df.rename.cols(df1, col.list)


NewA B NewC D
1    1 3    5 7
2    2 4    6 8

For the sake of execution time, I would suggest using the data.table structure:

> library(data.table)
> library(dplyr)
> library(microbenchmark)
> df = data.table(x = 1:10, y = 3:12, z = 4:13)
> oldnames = c("x","y","z")
> newnames = c("X","Y","Z")
> microbenchmark(dplyr_1 = df %>% rename_at(vars(oldnames), ~ newnames) ,
+                dplyr_2 = df %>% rename(X=x,Y=y,Z=z) ,
+                data_tabl1= setnames(copy(df), old = c("x","y","z") , new = c("X","Y","Z")),
+                times = 100)
Unit: microseconds
       expr    min      lq     mean  median      uq     max neval
    dplyr_1 5760.3 6523.00 7092.538 6864.35 7210.45 17935.9   100
    dplyr_2 2536.4 2788.40 3078.609 3010.65 3282.05  4689.8   100
 data_tabl1  170.0  218.45  368.261  243.85  274.40 12351.7   100

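I recently built on @Agile Bean's answer (using rename_with, formerly rename_at) to write a function that changes column names if they exist in the data frame, so that the column names of heterogeneous data frames can be made to match each other where applicable.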

The looping can surely be improved, but I figured I'd share it for posterity.

create example data frame:
x= structure(list(observation_date = structure(c(18526L, 18784L,
17601L), class = c("IDate", "Date")), year = c(2020L, 2021L,
2018L)), sf_column = "geometry", agr = structure(c(id = NA_integer_,
common_name = NA_integer_, scientific_name = NA_integer_, observation_count = NA_integer_,
country = NA_integer_, country_code = NA_integer_, state = NA_integer_,
state_code = NA_integer_, county = NA_integer_, county_code = NA_integer_,
observation_date = NA_integer_, time_observations_started = NA_integer_,
observer_id = NA_integer_, sampling_event_identifier = NA_integer_,
protocol_type = NA_integer_, protocol_code = NA_integer_, duration_minutes = NA_integer_,
effort_distance_km = NA_integer_, effort_area_ha = NA_integer_,
number_observers = NA_integer_, all_species_reported = NA_integer_,
group_identifier = NA_integer_, year = NA_integer_, checklist_id = NA_integer_,
yday = NA_integer_), class = "factor", .Label = c("constant",
"aggregate", "identity")), row.names = c("3", "3.1", "3.2"), class = "data.frame")
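function:
match_col_names <- function(x){
  # each list name is the target column name; its values are the variants to be replaced
  col_names <- list(date = c("observation_date", "date"),
                    C = c("observation_count", "count", "routetotal"),
                    yday = c("dayofyear"),
                    latitude = c("lat"),
                    longitude = c("lon", "long"))

  for(i in seq_along(col_names)){
    newname = names(col_names)[i]
    oldnames = col_names[[i]]

    # variants that are actually present in x
    toreplace = names(x)[which(names(x) %in% oldnames)]
    # skip targets with no matching column so rename_with is never called on an empty selection
    if(length(toreplace) > 0){
      x <- x %>%
        rename_with(~newname, all_of(toreplace))
    }
  }

  return(x)
}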


apply the function:
x <- match_col_names(x)
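
One way the loop could be avoided is to flatten the list into a single lookup vector and rename in one step with base R. This is only a sketch under the same col_names definition; the names lookup, demo and hits are made up for illustration.

```r
col_names <- list(date = c("observation_date", "date"),
                  C = c("observation_count", "count", "routetotal"),
                  yday = c("dayofyear"),
                  latitude = c("lat"),
                  longitude = c("lon", "long"))

# Flatten to a named vector: names are the old variants, values are the target names
lookup <- setNames(rep(names(col_names), lengths(col_names)),
                   unlist(col_names, use.names = FALSE))

# Hypothetical data frame with two columns that need renaming and one that does not
demo <- data.frame(observation_date = "2020-09-21", lat = 1.5, year = 2020L)

# Rename only the columns that appear in the lookup
hits <- names(demo) %in% names(lookup)
names(demo)[hits] <- unname(lookup[names(demo)[hits]])

print(names(demo))
```

As with the loop, two variants of the same target appearing in one data frame would produce duplicate column names, so that case would need separate handling.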