Confusion between factor levels and factor labels


Best answer

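Very short: levels are the input and labels are the output of the factor() function. A factor has only a levels attribute, which is set by the labels argument of factor(). This differs from the concept of labels in statistical packages such as SPSS, and it can be confusing at first.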

What you do in this line of code

df$f <- factor(df$f, levels=c('a','b','c'),
labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX'))

is tell R that there is a vector df$f

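that you want to convert into a factor,
in which the different levels are coded as a, b, and c,
and for which you want the levels to be labelled as Treatment A, and so on.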

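The factor() function looks for the values a, b, and c, converts them to the integer codes of a factor, and stores the label values in the levels attribute of the factor. This attribute is used to convert the internal integer codes back to the correct labels. But as you can see, there is no labels attribute.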

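> # stringsAsFactors=TRUE is needed in R >= 4.0.0, where the default became FALSE
> df <- data.frame(v=c(1,2,3),f=c('a','b','c'), stringsAsFactors=TRUE)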
> attributes(df$f)
$levels
[1] "a" "b" "c"


$class
[1] "factor"


> df$f <- factor(df$f, levels=c('a','b','c'),
+   labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX'))
> attributes(df$f)
$levels
[1] "Treatment A: XYZ" "Treatment B: YZX" "Treatment C: ZYX"


$class
[1] "factor"
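
As a minimal sketch of the point above, using base R only (the variable names `x` and `f` are made up for illustration), the following shows that the values passed as `labels` end up in the `levels` attribute, that the factor is stored as integer codes, and that no separate `labels` attribute exists:

```r
# Hypothetical input vector, used only for illustration
x <- c("a", "b", "c", "a")

f <- factor(x,
            levels = c("a", "b", "c"),
            labels = c("Treatment A: XYZ", "Treatment B: YZX", "Treatment C: ZYX"))

# The values given as labels are now the levels of the factor
levels(f)

# Internally the factor is stored as integer codes that index into levels(f)
as.integer(f)

# Only "levels" and "class" are set; there is no "labels" attribute
names(attributes(f))
is.null(attr(f, "labels"))

# Indexing the levels by the integer codes recovers the printed values
identical(levels(f)[as.integer(f)], as.character(f))
```

The original codes a, b, and c are not kept anywhere in the factor, so if they are needed later they have to be stored separately.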


I wrote a package, "lfactors", that lets you refer to either the levels or the labels.

# packages
install.packages("lfactors")
require(lfactors)


flips <- lfactor(c(0,1,1,0,0,1), levels=0:1, labels=c("Tails", "Heads"))
# Tails can now be referred to as, "Tails" or 0
# These two lines return the same result
flips == "Tails"
#[1]  TRUE FALSE FALSE  TRUE  TRUE FALSE
flips == 0
#[1]  TRUE FALSE FALSE  TRUE  TRUE FALSE

Note that an lfactor requires that the levels be numeric so that they cannot be confused with the labels.
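
For comparison, here is a rough base R sketch of the same idea without the package. It is not how lfactors is implemented; the `lookup` vector and the variable names are assumptions introduced only for illustration. Because a plain factor keeps only the labels, the numeric code has to be translated to its label before comparing:

```r
# Same coin-flip data as above, stored as a plain base R factor
codes <- c(0, 1, 1, 0, 0, 1)
flips_base <- factor(codes, levels = 0:1, labels = c("Tails", "Heads"))

# Hypothetical lookup table: names are the original codes, values are the labels
lookup <- c("0" = "Tails", "1" = "Heads")

# Comparing against the label works directly
by_label <- flips_base == "Tails"

# Comparing against the code requires translating the code to its label first
by_code <- flips_base == lookup[["0"]]

identical(by_label, by_code)
```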


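Just wanted to share a technique I commonly use to deal with this issue, keeping one set of names for the levels of a factor variable for scripting and another for pretty printing: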

# Load packages
library(tidyverse)
library(sjlabelled)
library(patchwork)


# Create data frames
df <- data.frame(v = c(1, 2, 3), f = c("a", "b", "c"))
df_labelled <- data.frame(v = c(1, 2, 3), f = c("a", "b", "c")) %>%
  val_labels(
    # levels are characters
    f = c(
      "a" = "Treatment A: XYZ", "b" = "Treatment B: YZX",
      "c" = "Treatment C: ZYX"
    ),
    # levels are numeric
    v = c("1" = "Exp. Unit 1", "2" = "Exp. Unit 2", "3" = "Exp. Unit 3")
  )


# df and df_labelled appear exactly the same when printed and nothing changes
# in terms of scripting
df
#>   v f
#> 1 1 a
#> 2 2 b
#> 3 3 c
df_labelled
#>   v f
#> 1 1 a
#> 2 2 b
#> 3 3 c


# Now, let's take a look at the structure of df and df_labelled
str(df)
#> 'data.frame':    3 obs. of  2 variables:
#>  $ v: num  1 2 3
#>  $ f: chr  "a" "b" "c"
str(df_labelled) # notice the attributes
#> 'data.frame':    3 obs. of  2 variables:
#>  $ v: num  1 2 3
#>   ..- attr(*, "labels")= Named num [1:3] 1 2 3
#>   .. ..- attr(*, "names")= chr [1:3] "Exp. Unit 1" "Exp. Unit 2" "Exp. Unit 3"
#>  $ f: chr  "a" "b" "c"
#>   ..- attr(*, "labels")= Named chr [1:3] "a" "b" "c"
#>   .. ..- attr(*, "names")= chr [1:3] "Treatment A: XYZ" "Treatment B: YZX" "Treatment C: ZYX"


# Lastly, create ggplots with and without pretty names for factor levels
p1 <- df_labelled %>% # or, df
  ggplot(aes(x = f, y = v)) +
  geom_point() +
  labs(x = "Treatment", y = "Measurement")
p2 <- df_labelled %>%
  ggplot(aes(x = to_label(f), y = to_label(v))) +
  geom_point() +
  labs(x = "Treatment", y = "Experimental Unit")


p1 / p2

Created on 2021-08-17 by the reprex package (v2.0.0)