双层管中 NA 的去除

我试图用 dplyr 管道从子集中移除 NA。我的回答是否意味着我错过了一步。我正在学习如何使用 dplyr 编写函数:

> outcome.df%>%
+ group_by(Hospital,State)%>%
+ arrange(desc(HeartAttackDeath,na.rm=TRUE))%>%
+ head()
Source: local data frame [6 x 5]
Groups: Hospital, State
Hospital State HeartAttackDeath
1     ABBEVILLE AREA MEDICAL CENTER    SC               NA
2        ABBEVILLE GENERAL HOSPITAL    LA               NA
3      ABBOTT NORTHWESTERN HOSPITAL    MN             12.3
4   ABILENE REGIONAL MEDICAL CENTER    TX             17.2
5        ABINGTON MEMORIAL HOSPITAL    PA             14.3
6 ABRAHAM LINCOLN MEMORIAL HOSPITAL    IL               NA
Variables not shown: HeartFailureDeath (dbl), PneumoniaDeath
(dbl)
191765 次浏览

I don't think desc takes an na.rm argument... I'm actually surprised it doesn't throw an error when you give it one. If you just want to remove NAs, use na.omit (base) or tidyr::drop_na:

outcome.df %>%
na.omit() %>%
group_by(Hospital, State) %>%
arrange(desc(HeartAttackDeath)) %>%
head()


library(tidyr)
outcome.df %>%
drop_na() %>%
group_by(Hospital, State) %>%
arrange(desc(HeartAttackDeath)) %>%
head()

If you only want to remove NAs from the HeartAttackDeath column, filter with is.na, or use tidyr::drop_na:

outcome.df %>%
filter(!is.na(HeartAttackDeath)) %>%
group_by(Hospital, State) %>%
arrange(desc(HeartAttackDeath)) %>%
head()


outcome.df %>%
drop_na(HeartAttackDeath) %>%
group_by(Hospital, State) %>%
arrange(desc(HeartAttackDeath)) %>%
head()

As pointed out at the dupe, complete.cases can also be used, but it's a bit trickier to put in a chain because it takes a data frame as an argument but returns an index vector. So you could use it like this:

outcome.df %>%
filter(complete.cases(.)) %>%
group_by(Hospital, State) %>%
arrange(desc(HeartAttackDeath)) %>%
head()