Convert all values in all character variables of a data frame from lowercase to uppercase

I have a data frame with a mix of character and numeric variables:

city,hs_cd,sl_no,col_01,col_02,col_03
Austin,1,2,,46,Female
Austin,1,3,,32,Male
Austin,1,4,,27,Male
Austin,1,5,,20,Female
Austin,2,2,,42,Female
Austin,2,1,,52,Male
Austin,2,3,,25,Male
Austin,2,4,,22,Female
Austin,3,3,,30,Female
Austin,3,1,,65,Female

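I want to convert all the lowercase characters in the data frame to uppercase. Is there a way to do this in one shot, without repeating it for each character variable?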


Starting with the following sample data:

df <- data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],stringsAsFactors=FALSE)


  v1 v2 v3
1  a  1  j
2  b  2  k
3  c  3  l
4  d  4  m
5  e  5  n

You can use:

data.frame(lapply(df, function(v) {
  if (is.character(v)) return(toupper(v))
  else return(v)
}))

Which gives :

  v1 v2 v3
1  A  1  J
2  B  2  K
3  C  3  L
4  D  4  M
5  E  5  N
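
A small variation, sketched here as an assumption rather than as part of the answer above: assigning into `df[]` keeps the object a data frame and avoids passing the result through `data.frame()` again, which in R versions before 4.0.0 turns character columns into factors by default. The name `df_upper` is made up for illustration.

```r
# Sample data from the answer above
df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = letters[10:14],
                 stringsAsFactors = FALSE)

df_upper <- df  # hypothetical name; work on a copy
# Assigning into df_upper[] replaces the columns but keeps the data.frame itself
df_upper[] <- lapply(df_upper, function(v) {
  if (is.character(v)) toupper(v) else v
})

sapply(df_upper, class)  # column classes are unchanged
```

Only the character columns are rewritten; `v2` stays an integer column.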

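From the dplyr package you can also use the mutate_all() function in combination with toupper(). This applies toupper() to every column, so it affects both character and factor classes. Note that toupper() always returns a character vector, so factor and numeric columns come back as character.

library(dplyr)
df <- mutate_all(df, .funs = toupper)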
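
To see what applying toupper() to every column does to the column types, here is a quick base R check. It uses lapply() instead of mutate_all() so that it does not assume any particular dplyr version; the name `all_upper` is hypothetical.

```r
# toupper() converts non-character input with as.character() first
class(toupper(1:3))                  # integer input comes back as character
class(toupper(factor(c("a", "b"))))  # so does a factor

df <- data.frame(v1 = letters[1:5], v2 = 1:5, stringsAsFactors = FALSE)

# Apply toupper() to every column, character or not
all_upper <- data.frame(lapply(df, toupper), stringsAsFactors = FALSE)
sapply(all_upper, class)             # v2 is no longer integer
```

If the numeric columns need to stay numeric, a selective approach that only touches character columns is the safer choice.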

A side comment for anyone using these answers. Juba's answer is great, as it is very selective when your variables are either numeric or character strings. If, however, a variable holds a combination of the two (e.g. a1, b1, a2, b2), it will not convert the characters properly.

As @Trenton Hoffman notes,

library(dplyr)
df <- mutate_each(df, funs(toupper))

affects both character and factor classes and works for "mixed variables"; e.g. if your variable contains both a character and a numeric value (e.g. a1), both will be converted to a factor. Overall this isn't too much of a concern, but it matters if you end up wanting to match data.frames, for example

df3 <- df1[df1$v1 %in% df2$v1,]

where df1 has been converted and df2 is a non-converted data.frame or similar, this may cause some problems. The workaround is to briefly run

df2 <- df2 %>% mutate_each(funs(toupper), v1)
# or
df2 <- df2 %>% mutate_each(funs(toupper))
# and then
df3 <- df1[df1$v1 %in% df2$v1,]
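
The matching problem can be reproduced in a few lines of base R. This is a minimal sketch with made-up values; `df1` and `df2` here are toy data frames, not the ones from the answer.

```r
df1 <- data.frame(v1 = c("A1", "B1", "C2"), stringsAsFactors = FALSE)  # already upper case
df2 <- data.frame(v1 = c("a1", "c2", "d3"), stringsAsFactors = FALSE)  # still lower case

# %in% is case-sensitive, so nothing matches yet
nrow(df1[df1$v1 %in% df2$v1, , drop = FALSE])   # 0

# Bring df2 to the same case, then match again
df2$v1 <- toupper(df2$v1)
df3 <- df1[df1$v1 %in% df2$v1, , drop = FALSE]
df3$v1                                          # "A1" "C2"
```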

If you work with genomic data, this is the kind of situation where knowing it comes in handy.

If you need to deal with data.frames that include factors you can use:

df = data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],v4=as.factor(letters[1:5]),v5=runif(5),stringsAsFactors=FALSE)


df
  v1 v2 v3 v4        v5
1  a  1  j  a 0.1774909
2  b  2  k  b 0.4405019
3  c  3  l  c 0.7042878
4  d  4  m  d 0.8829965
5  e  5  n  e 0.9702505

sapply(df,class)
         v1          v2          v3          v4          v5
"character"   "integer" "character"    "factor"   "numeric"

Use mutate_each_ to convert factors to character, then convert all character columns to uppercase:

# convert factors to character, then uppercase every character column
upper_it = function(X) {
  X %>%
    mutate_each_(funs(as.character(.)), names(.[sapply(., is.factor)])) %>%
    mutate_each_(funs(toupper), names(.[sapply(., is.character)]))
}

Gives

upper_it(df)
  v1 v2 v3 v4        v5
1  A  1  J  A 0.1774909
2  B  2  K  B 0.4405019
3  C  3  L  C 0.7042878
4  D  4  M  D 0.8829965
5  E  5  N  E 0.9702505

While

sapply(upper_it(df), class)
         v1          v2          v3          v4          v5
"character"   "integer" "character" "character"   "numeric"

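If the factor columns should remain factors instead of becoming character, one option is to upper-case the factor levels directly. The following is a base R sketch under that assumption; `upper_keep_factors` is a hypothetical helper, not something from dplyr or from the answer above.

```r
df <- data.frame(v1 = letters[1:5], v2 = 1:5,
                 v4 = factor(letters[1:5]),
                 stringsAsFactors = FALSE)

# Hypothetical helper: upper-case character columns and factor levels
upper_keep_factors <- function(X) {
  X[] <- lapply(X, function(v) {
    if (is.factor(v)) {
      levels(v) <- toupper(levels(v))  # relabel the levels; the column stays a factor
      v
    } else if (is.character(v)) {
      toupper(v)
    } else {
      v                                # leave numeric and other columns untouched
    }
  })
  X
}

sapply(upper_keep_factors(df), class)
```

Relabelling the levels touches each distinct value once instead of every row, which can matter for long columns with few levels.
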
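It's simple with the apply function in R:

f <- apply(f,2,toupper)

There is no need to check whether a column is character or some other type. Note, however, that apply() returns a character matrix rather than a data frame, so numeric columns come back as strings.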
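
Before relying on the apply() approach, it is worth checking what it returns. A short sketch using the sample data from the first answer; the names `m` and `back` are made up for illustration.

```r
df <- data.frame(v1 = letters[1:5], v2 = 1:5, stringsAsFactors = FALSE)

# apply() first converts the data frame with as.matrix(), then works column by column
m <- apply(df, 2, toupper)
is.matrix(m)       # TRUE
is.data.frame(m)   # FALSE
is.character(m)    # TRUE: the numbers in v2 are now strings

# Converting back gives a data frame, but v2 is still character
back <- as.data.frame(m, stringsAsFactors = FALSE)
sapply(back, class)
```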

Another alternative is to combine mutate_if() from dplyr with str_to_upper() from stringr, both part of the tidyverse:

df %>% mutate_if(is.character, str_to_upper) -> df

This converts all string variables in the data frame to upper case. str_to_lower() does the opposite.

Alternatively, if you just want to convert one particular column to uppercase, use the code below:

df[[1]] <- toupper(df[[1]])

dplyr >= 1.0.0

Scoped verbs that end in _if, _at, or _all have been superseded by across() in dplyr 1.0.0 and newer. To do this using across():

df %>%
  mutate(across(where(is.character), toupper))

  • The first argument to across() is which columns to transform, using tidyselect syntax. The above applies the function to all columns that are character.
  • The second argument to across() is the function to apply. It also supports lambda-style syntax, ~ toupper(.x), which makes setting additional function arguments easy and clear.
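
As an illustration of the lambda form with an extra argument, here is a sketch that swaps toupper() for base R's casefold(), whose `upper` argument selects the direction. It assumes dplyr 1.0.0 or newer is installed; the two-row data frame and the name `out` are made up for the example.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 is installed

df <- data.frame(city = c("Austin", "Austin"), hs_cd = 1:2,
                 col_03 = c("Female", "Male"), stringsAsFactors = FALSE)

# Lambda form: .x is the current column, upper = TRUE is the extra argument
out <- df %>%
  mutate(across(where(is.character), ~ casefold(.x, upper = TRUE)))

out$city          # "AUSTIN" "AUSTIN"
out$col_03        # "FEMALE" "MALE"
class(out$hs_cd)  # "integer": non-character columns are left alone
```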

Data

df <- structure(list(city = c("Austin", "Austin", "Austin", "Austin",
"Austin", "Austin", "Austin", "Austin", "Austin", "Austin"),
hs_cd = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), sl_no = c(2L,
3L, 4L, 5L, 2L, 1L, 3L, 4L, 3L, 1L), col_01 = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), col_02 = c(46L, 32L, 27L, 20L,
42L, 52L, 25L, 22L, 30L, 65L), col_03 = c("Female", "Male",
"Male", "Female", "Female", "Male", "Male", "Female", "Female",
"Female")), class = "data.frame", row.names = c(NA, -10L))