Remove rows with blank values in a specific column

I am working with a large dataset in which some rows have NA and others have blanks:

df <- data.frame(ID = c(1:7),
home_pc = c("","CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB","MP9 7GH","KN4 5GH"),
start_pc = c(NA,"Home", "FC5 7YH","Home", "CB3 5TH", "BV6 5PB",NA),
end_pc = c(NA,"CB5 4FG","Home","","Home","",NA))

How can I remove the NAs and the blanks in one go (in the start_pc and end_pc columns)? In the past I have used:

df<- df[-which(is.na(df$start_pc)), ]

... to remove the NAs. Is there a similar command to remove the blanks?

 df[!(is.na(df$start_pc) | df$start_pc==""), ]

It is the same construct - simply test for empty strings rather than NA:

Try this:

df <- df[-which(df$start_pc == ""), ]
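One caveat with the -which() idiom, shown as a minimal sketch on a made-up data frame (toy and its values are hypothetical, not from the question): when no row matches, which() returns an empty vector, and negative indexing with an empty vector selects zero rows instead of keeping all of them.

```r
# Hypothetical toy data with no blank values in start_pc
toy <- data.frame(ID = 1:2,
                  start_pc = c("Home", "CB3 5TH"),
                  stringsAsFactors = FALSE)

# which() finds no match and returns integer(0);
# -integer(0) is still integer(0), so the index selects zero rows
with_which <- toy[-which(toy$start_pc == ""), ]

# Logical negation keeps every row that is not blank
with_negation <- toy[!(toy$start_pc == ""), ]

nrow(with_which)     # 0
nrow(with_negation)  # 2
```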

In fact, looking at your code, you don't need the which, but use the negation instead, so you can simplify it to:

df <- df[!(df$start_pc == ""), ]
df <- df[!is.na(df$start_pc), ]

And, of course, you can combine these two statements as follows:

df <- df[!(df$start_pc == "" | is.na(df$start_pc)), ]

And you can simplify it even further using with():

df <- with(df, df[!(start_pc == "" | is.na(start_pc)), ])

You can also test for non-zero string length using nzchar.

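# nzchar() is TRUE for non-empty strings (and, by default, for NA),
# so keep rows that are non-empty and not NA
df <- with(df, df[nzchar(start_pc) & !is.na(start_pc), ])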

Disclaimer: I didn't test any of this code. Please let me know if there are syntax errors anywhere
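Since the question asks about both start_pc and end_pc, the same test can be applied to several columns at once. The following is only a sketch using the question's data; the names cols, bad and clean are made up for illustration.

```r
df <- data.frame(ID = 1:7,
                 home_pc = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                 start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                 end_pc = c(NA, "CB5 4FG", "Home", "", "Home", "", NA),
                 stringsAsFactors = FALSE)

cols <- c("start_pc", "end_pc")  # columns to check

# Logical matrix: TRUE where a cell is NA or an empty string
bad <- is.na(df[cols]) | df[cols] == ""

# Keep only rows with no bad cell in any of the checked columns
clean <- df[rowSums(bad) == 0, ]
clean$ID  # 2 3 5
```

The blank home_pc in row 1 is ignored here because home_pc is not listed in cols.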

An easy approach would be making all the blank cells NA and only keeping complete cases. You might also look for na.omit examples. It is a widely discussed topic.

df[df==""]<-NA
df<-df[complete.cases(df),]
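Note that complete.cases(df) looks at every column, so a row with a blank only in home_pc would be dropped as well. If only start_pc and end_pc matter, the check can be restricted to those columns. A sketch on the question's data, where tmp and keep are illustrative names:

```r
df <- data.frame(ID = 1:7,
                 home_pc = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                 start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                 end_pc = c(NA, "CB5 4FG", "Home", "", "Home", "", NA),
                 stringsAsFactors = FALSE)

tmp <- df
tmp[tmp == ""] <- NA  # recode blanks as NA in a working copy

# Test completeness only in the columns of interest
keep <- complete.cases(tmp[, c("start_pc", "end_pc")])

clean <- df[keep, ]
clean$ID  # 2 3 5
```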

An alternative solution is to remove the rows with blanks in one variable (VAR stands for the column name):

df <- subset(df, VAR != "")
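Applied to the question's columns, this could look like the sketch below. subset() treats an NA in the condition as FALSE, so the rows with NA are dropped along with the blank ones and no separate is.na() test is needed.

```r
df <- data.frame(ID = 1:7,
                 home_pc = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                 start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                 end_pc = c(NA, "CB5 4FG", "Home", "", "Home", "", NA),
                 stringsAsFactors = FALSE)

# Rows where the condition is FALSE or NA are both removed
clean <- subset(df, start_pc != "" & end_pc != "")
clean$ID  # 2 3 5
```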

An elegant solution with dplyr would be:

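library(dplyr)

df %>%
  # recode empty strings "" as NA, column by column
  # (recent dplyr versions expect na_if() to receive a vector, not a whole data frame)
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  # remove rows containing NAs
  na.omit()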