Changing factor levels with dplyr mutate

This is probably simple and I feel stupid for asking. I want to change the levels of a factor in a data frame, using mutate. Simple example:

library("dplyr")
dat <- data.frame(x = factor("A"), y = 1)
mutate(dat,levels(x) = "B")

I get:

Error: Unexpected '=' in "mutate(dat,levels(x) ="

Why is this not working? How can I change factor levels with mutate?


I'm not quite sure I understand your question properly, but if you want to change the factor levels of cyl with mutate() you could do:

df <- mtcars %>% mutate(cyl = factor(cyl, levels = c(4, 6, 8)))

You would get:

str(df$cyl)
# Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
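
As for why the call in the question fails: each argument to mutate() has to be of the form column_name = value, and levels(x) is not a valid argument name, so R stops at the parser before mutate() even runs. A minimal sketch of one workaround, assuming the goal is simply to rename the single level, is to call the base replacement function `levels<-` explicitly on the right-hand side (dat2 is just an illustrative name):

```r
library(dplyr)

dat <- data.frame(x = factor("A"), y = 1)

# The left-hand side must be a plain column name, so the replacement
# function `levels<-` is called as an ordinary function on the right.
dat2 <- mutate(dat, x = `levels<-`(x, "B"))

levels(dat2$x)
```

This replaces the levels positionally, so with more than one level the new names have to be given in the same order as levels(x).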

Maybe you are looking for the plyr::revalue function:

mutate(dat, x = plyr::revalue(x, c("A" = "B")))

See also plyr::mapvalues.
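
A short sketch of the mapvalues variant, assuming plyr is installed. It is called with the plyr:: prefix rather than loading the package, to avoid plyr masking dplyr functions. The three-row data frame and the extra level "C" are made up for illustration:

```r
library(dplyr)

# Hypothetical example data with two levels
dat <- data.frame(x = factor(c("A", "C", "A")), y = 1:3)

# mapvalues() takes the old and new names as two parallel vectors
dat2 <- mutate(dat, x = plyr::mapvalues(x, from = c("A", "C"), to = c("B", "D")))

levels(dat2$x)
```

Compared with revalue, which takes a single named vector, the from/to form can be handier when the old and new names already live in two separate vectors.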

With the forcats package from the tidyverse this is easy, too.

mutate(dat, x = fct_recode(x, "B" = "A"))
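
A slightly larger sketch, using a made-up four-row data frame, shows that fct_recode expects new = old pairs (the opposite direction to plyr::revalue) and leaves unmentioned levels alone:

```r
library(dplyr)
library(forcats)

# Hypothetical example data with three levels
dat <- data.frame(x = factor(c("A", "B", "A", "C")))

# Each pair reads "new name" = "old name"; level "B" is left untouched
dat2 <- mutate(dat, x = fct_recode(x, "a_new" = "A", "c_new" = "C"))

levels(dat2$x)
```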

You can use the recode function from dplyr.

df <- iris %>%
  mutate(Species = recode(Species,
                          setosa = "SETOSA",
                          versicolor = "VERSICOLOR",
                          virginica = "VIRGINICA"))

Can't comment because I don't have enough reputation points, but recode only works on a vector, so the above code in @Stefano's answer should be:

df <- iris %>%
  mutate(Species = recode(Species,
                          setosa = "SETOSA",
                          versicolor = "VERSICOLOR",
                          virginica = "VIRGINICA"))
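
When there are many levels, typing every old = new pair gets tedious. One possible variation, sketched here with a hypothetical named vector called lookup, is to build the mapping programmatically and splice it into recode with !!!:

```r
library(dplyr)

# Names are the old levels, values are the new ones (old = new, as recode expects)
lookup <- setNames(toupper(levels(iris$Species)), levels(iris$Species))

df <- iris %>%
  mutate(Species = recode(Species, !!!lookup))

levels(df$Species)
```

The result is still a factor; only the level names change.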

From my understanding, the currently accepted answer only changes the order of the factor levels, not the actual labels (i.e., what the levels of the factor are called). To illustrate the difference between levels and labels, consider the following example:

Turn cyl into a factor (specifying levels is not strictly necessary here, because by default they are taken in sorted order):

mtcars2 <- mtcars %>% mutate(cyl = factor(cyl, levels = c(4, 6, 8)))
mtcars2$cyl[1:5]
#[1] 6 6 4 6 8
#Levels: 4 6 8

Change the order of the levels (but not the labels themselves: cyl still holds the same values):

mtcars3 <- mtcars2 %>% mutate(cyl = factor(cyl, levels = c(8, 6, 4)))
mtcars3$cyl[1:5]
#[1] 6 6 4 6 8
#Levels: 8 6 4
all(mtcars3$cyl == mtcars2$cyl)
#[1] TRUE
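
One way to see what reordering actually changes is to look at the integer codes stored underneath the factor. The sketch below rebuilds the two data frames so it runs on its own:

```r
library(dplyr)

mtcars2 <- mtcars %>% mutate(cyl = factor(cyl, levels = c(4, 6, 8)))
mtcars3 <- mtcars2 %>% mutate(cyl = factor(cyl, levels = c(8, 6, 4)))

# The visible values are identical ...
as.character(mtcars2$cyl)[1:5]
as.character(mtcars3$cyl)[1:5]

# ... but the underlying codes point at different positions in levels()
as.integer(mtcars2$cyl)[1:5]
#[1] 2 2 1 2 3
as.integer(mtcars3$cyl)[1:5]
#[1] 2 2 3 2 1
```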

Assign new labels to cyl. The order of the levels was c(8, 6, 4), hence we specify the new labels in that order:

mtcars4 <- mtcars3 %>% mutate(cyl = factor(cyl, labels = c("new_value_for_8",
                                                           "new_value_for_6",
                                                           "new_value_for_4")))
mtcars4$cyl[1:5]
#[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
#Levels: new_value_for_8 new_value_for_6 new_value_for_4

Note how this column differs from the earlier ones:

all(as.character(mtcars4$cyl) != mtcars3$cyl)
#[1] TRUE
#Note: TRUE here indicates that all values are unequal because I used != instead of ==
#as.character() was required because the two factors have different level sets and cannot be compared directly

More details:

If we were to change the labels of cyl using mtcars2 instead of mtcars3, we would need to specify the labels in a different order to get the same result. The order of the levels in mtcars2 was c(4, 6, 8), hence we specify the new labels as follows:

#change labels of mtcars2 (order used to be: c(4, 6, 8))
mtcars5 <- mtcars2 %>% mutate(cyl = factor(cyl, labels = c("new_value_for_4",
                                                           "new_value_for_6",
                                                           "new_value_for_8")))

Unlike mtcars3$cyl and mtcars4$cyl, mtcars4$cyl and mtcars5$cyl thus carry identical labels in every row, even though their levels are in a different order.

mtcars4$cyl[1:5]
#[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
#Levels: new_value_for_8 new_value_for_6 new_value_for_4

mtcars5$cyl[1:5]
#[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
#Levels: new_value_for_4 new_value_for_6 new_value_for_8

all(mtcars4$cyl == mtcars5$cyl)
#[1] TRUE

levels(mtcars4$cyl) == levels(mtcars5$cyl)
#[1] FALSE  TRUE FALSE
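
Because labels are matched to levels purely by position, as the steps above show, it is easy to attach a label to the wrong level. A sketch of one way to reduce that risk is to pass levels and labels together in a single factor() call, so the pairing is visible in the code. The name mtcars6 and the labels "four", "six" and "eight" are made up for illustration:

```r
library(dplyr)

# levels and labels are paired by position: 4 -> "four", 6 -> "six", 8 -> "eight"
mtcars6 <- mtcars %>%
  mutate(cyl = factor(cyl,
                      levels = c(4, 6, 8),
                      labels = c("four", "six", "eight")))

mtcars6$cyl[1:5]
#[1] six   six   four  six   eight
#Levels: four six eight
```

This starts from the original numeric column, so it does not depend on whatever level order an earlier step left behind.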