How to replace NA values in a table for selected columns

There are a lot of posts about replacing NA values. I am aware that one could replace NAs in a table/frame with the following:

x[is.na(x)]<-0

But what if I want to restrict it to certain columns? Let me show you an example.

First, let's start with a dataset.

set.seed(1234)
x <- data.frame(a=sample(c(1,2,NA), 10, replace=T),
                b=sample(c(1,2,NA), 10, replace=T),
                c=sample(c(1:5,NA), 10, replace=T))

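The result is (this output was produced with R < 3.6.0; R 3.6.0 changed the default method used by sample(), so the same seed now gives different values):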

    a  b  c
1   1 NA  2
2   2  2  2
3   2  1  1
4   2 NA  1
5  NA  1  2
6   2 NA  5
7   1  1  4
8   1  1 NA
9   2  1  5
10  2  1  1

OK, so I only want to restrict the replacement to columns 'a' and 'b'. My attempt was:

x[is.na(x), 1:2]<-0

and:

x[is.na(x[1:2])]<-0

But that does not work.

My data.table attempt, with y <- data.table(x), obviously won't work either:

y[is.na(y[,list(a,b)]), ]

I want to pass the columns inside the is.na argument, but that obviously doesn't work.

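I would like to do this for both a data.frame and a data.table. My end goal is to recode 1:2 to 0:1 in 'a' and 'b', while keeping 'c' the way it is, since it is not a logical variable. I have a bunch of columns, so I don't want to do it one by one. I'd just like to know how to do this.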

Do you have any suggestions?


You can do:

x[, 1:2][is.na(x[, 1:2])] <- 0

or better (IMHO), use the variable names:

x[c("a", "b")][is.na(x[c("a", "b")])] <- 0

In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector.
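A minimal runnable sketch of the second form with a pre-defined vector, on a small made-up data frame (illustrative values, not the question's sampled data). Column c keeps its NA:

```r
# toy data frame (illustrative values, not the question's data)
x <- data.frame(a = c(1, NA, 2),
                b = c(NA, 2, 1),
                c = c(NA, 3, 5))

cols <- c("a", "b")  # pre-defined vector of target columns

# is.na(x[cols]) is a logical matrix with the same shape as x[cols],
# so it can index that sub-frame directly
x[cols][is.na(x[cols])] <- 0
x
```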

Edit 2020-06-15

Since data.table 1.12.4 (Oct 2019), data.table has two functions that facilitate this: nafill and setnafill.

nafill operates on columns:

cols = c('a', 'b')
y[ , (cols) := lapply(.SD, nafill, fill=0), .SDcols = cols]

setnafill operates on tables (the replacements happen by reference, i.e. in place):

setnafill(y, cols=cols, fill=0)
# print y to show the effect
y[]

This will also be more efficient than the other options. See ?nafill for more, including the last-observation-carried-forward (LOCF) and next-observation-carried-backward (NOCB) variants of NA imputation for time series.
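To make the LOCF idea concrete, here is a minimal base-R sketch of what carrying the last observation forward does. `locf_fill` is a hypothetical helper for illustration only, not part of data.table; data.table's own, optimized version is nafill(x, type = "locf").

```r
# Hypothetical helper (not part of data.table): last observation carried forward.
locf_fill <- function(v) {
  # position (within the non-NA values) of the most recent non-NA seen so far;
  # 0 means no non-NA value has been seen yet
  idx <- cumsum(!is.na(v))
  vals <- v[!is.na(v)]
  out <- v
  seen <- idx > 0
  out[seen] <- vals[idx[seen]]
  out  # leading NAs stay NA because nothing precedes them
}

locf_fill(c(NA, 1, NA, NA, 3, NA))  # NA 1 1 1 3 3
```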


This will work for your data.table version:

for (col in c("a", "b")) y[is.na(get(col)), (col) := 0]

Alternatively, as David Arenburg points out below, you can use set() (side benefit: it works on either a data.frame or a data.table):

for (col in 1:2) set(x, which(is.na(x[[col]])), col, 0)
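Without data.table, the same column loop can be written in base R with `[[ ]]`. A minimal sketch on a made-up data frame; unlike set(), this does not modify x by reference.

```r
x <- data.frame(a = c(1, NA, 2),
                b = c(NA, 2, NA),
                c = c(NA, 3, 5))

for (col in c("a", "b")) {
  # logical index of the NAs in this column only
  x[[col]][is.na(x[[col]])] <- 0
}
x
```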

Not sure if this is more concise, but this function will also find and allow replacement of NAs (or any value you like) in selected columns of a data.table:

update.mat <- function(dt, cols, criteria) {
  # (row, col) coordinates of every cell where criteria is TRUE
  x <- as.data.frame(which(criteria == TRUE, arr.ind = TRUE))
  # keep only the coordinates that fall in the selected columns
  y <- as.matrix(subset(x, x$col %in% which(names(dt) %in% cols)))
  y
}

To apply it:

y[update.mat(y, c("a", "b"), is.na(y))] <- 0

The function returns a two-column matrix of (row, column) cell coordinates for the cells that meet the input criteria (in this case is.na(y) == TRUE), restricted to the selected columns; that matrix is then used to index y.
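To see what those coordinates look like, here is a small base-R sketch of the same steps on a made-up data frame: which(..., arr.ind = TRUE) turns the logical NA matrix into (row, col) pairs, the pairs are filtered to the chosen columns, and the resulting two-column matrix indexes the data frame cell by cell.

```r
df <- data.frame(a = c(1, NA, 2),
                 b = c(NA, 1, NA),
                 c = c(NA, 5, 6))

# one row per NA cell, with columns "row" and "col"
coords <- which(is.na(df), arr.ind = TRUE)

# keep only the cells that lie in columns a and b
keep <- coords[, "col"] %in% which(names(df) %in% c("a", "b"))
coords <- coords[keep, , drop = FALSE]

# a two-column numeric matrix indexes a data.frame cell by cell
df[coords] <- 0
df
```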

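This works fine for me (note that this is C#/.NET code for a System.Data.DataTable, not R):

// requires: using System.Data; using System.Linq; (CopyToDataTable comes from System.Data.DataSetExtensions)
DataTable DT = new DataTable();


DT = DT.AsEnumerable().Select(R =>
{
    R["Campo1"] = valor;  // sets column "Campo1" in every row to valor, whether or not it was null
    return (R);
}).ToArray().CopyToDataTable();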

For a specific column, there is an alternative with sapply:

DF <- data.frame(A = letters[1:5],
                 B = letters[6:10],
                 C = c(2, 5, NA, 8, NA))


# for each row i, keep the value of column 3 unless it is NA
DF_NEW <- sapply(seq(1, nrow(DF)),
                 function(i) ifelse(is.na(DF[i, 3]), 0, DF[i, 3]))


DF[, 3] <- DF_NEW
DF
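Since is.na() works on whole vectors, the row-by-row sapply() is not strictly needed. A shorter base-R sketch for the same DF:

```r
DF <- data.frame(A = letters[1:5],
                 B = letters[6:10],
                 C = c(2, 5, NA, 8, NA))

# replace the NAs in column C in one vectorised step
DF$C[is.na(DF$C)] <- 0
DF
```

`DF$C <- ifelse(is.na(DF$C), 0, DF$C)` is an equivalent one-liner closer to the original.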

This is now trivial in tidyr with replace_na(). The function appears to work for data.tables as well as data.frames:

tidyr::replace_na(x, list(a=0, b=0))
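replace_na() takes a named list that maps each column to its own replacement value, and columns not in the list are left alone. A minimal base-R sketch of that idea for readers not using tidyr; `replace_na_by_list` is a hypothetical helper, not a tidyr function, and it does no type checking:

```r
# Hypothetical helper: replace NAs column by column using a named list
# of replacement values (only the listed columns are touched)
replace_na_by_list <- function(df, replace) {
  for (col in names(replace)) {
    v <- df[[col]]
    v[is.na(v)] <- replace[[col]]
    df[[col]] <- v
  }
  df
}

x <- data.frame(a = c(1, NA), b = c(NA, 2), c = c(NA, 3))
replace_na_by_list(x, list(a = 0, b = -1))
```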

Building on @Robert McDonald's tidyr::replace_na() answer, here are some dplyr options for controlling in which columns the NAs are replaced:

library(tidyverse)


# by column type:
x %>%
mutate_if(is.numeric, ~replace_na(., 0))


# select columns defined in vars(col1, col2, ...):
x %>%
mutate_at(vars(a, b, c), ~replace_na(., 0))


# all columns:
x %>%
mutate_all(~replace_na(., 0))
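The "by column type" variant can also be sketched in base R: pick the columns with a predicate, then lapply() over just those. Illustrative data; the character column is skipped, so its NA survives:

```r
x <- data.frame(a = c(1, NA),
                b = c("u", NA),
                c = c(NA, 3))

# TRUE for each numeric column
num <- vapply(x, is.numeric, logical(1))

# replace() returns a copy of v with the NA positions set to 0
x[num] <- lapply(x[num], function(v) replace(v, is.na(v), 0))
x
```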

We can solve it the data.table way with tidyr::replace_na() and lapply():

library(data.table)
library(tidyr)
setDT(x)  # convert the data.frame x to a data.table by reference
x[, c("a", "b", "c") := lapply(.SD, function(v) replace_na(v, 0)), .SDcols = c("a", "b", "c")]

The same approach helps when pasting columns that contain NA: first replace the NAs with an empty string via replace_na(x, ""), then combine the columns with stringr::str_c() (which otherwise returns NA whenever any input is NA).
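The underlying problem is easy to reproduce in base R, where paste0() turns NA into the literal text "NA"; replacing NAs with "" first avoids that (illustrative vectors):

```r
first  <- c("x", NA)
second <- c("y", "z")

paste0(first, second)        # "xy" "NAz" (NA is pasted as the text "NA")

first[is.na(first)] <- ""    # replace NA with an empty string first
paste0(first, second)        # "xy" "z"
```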

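It's quite handy with data.table and stringr:

library(data.table)
library(stringr)


# applies to all columns; str_replace_na() returns character vectors,
# so every column comes back as character
x[, lapply(.SD, function(xx) str_replace_na(xx, "0"))]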
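The type change matters if the columns are used as numbers afterwards. A small base-R illustration: assigning a string into a numeric vector coerces the whole vector to character, while assigning the number 0 keeps it numeric:

```r
v <- c(1, NA, 2)

w <- v
w[is.na(w)] <- "0"   # string replacement: w becomes character
class(w)             # "character"

v[is.na(v)] <- 0     # numeric replacement: v stays numeric
class(v)             # "numeric"
```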

FYI, starting from the data.table y, you can just write:

cols <- c("a", "b")  # columns in which to replace NA
y[, (cols) := lapply(.SD, function(i) { i[is.na(i)] <- 0; i }), .SDcols = cols]

Don't forget to call library(data.table) before creating y and running this command.
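The trailing `; i` inside the anonymous function is essential: in R, the value of an assignment such as `i[is.na(i)] <- 0` is its right-hand side (0), not the modified vector. A minimal base-R demonstration with two illustrative functions:

```r
# returns the value of the assignment, i.e. 0 (invisibly)
without_return <- function(i) i[is.na(i)] <- 0

# returns the modified vector
with_return <- function(i) { i[is.na(i)] <- 0; i }

print(without_return(c(1, NA, 3)))  # 0
with_return(c(1, NA, 3))            # 1 0 3
```

Inside `:=`, a length-one result like that would be recycled down the whole column, overwriting every value with 0.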

For completeness, building on @sbha's answer, here is the tidyverse version using across(), which is available in dplyr since version 1.0.0 and supersedes the *_if(), *_at() and *_all() variants:

# random data
set.seed(1234)
x <- data.frame(a = sample(c(1, 2, NA), 10, replace = T),
                b = sample(c(1, 2, NA), 10, replace = T),
                c = sample(c(1:5, NA), 10, replace = T))
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(tidyr)
# with the magrittr pipe
x %>% mutate(across(1:2, ~ replace_na(.x, 0)))
#>    a b  c
#> 1  2 2  5
#> 2  2 2  2
#> 3  1 0  5
#> 4  0 2  2
#> 5  1 2 NA
#> 6  1 2  3
#> 7  2 2  4
#> 8  2 1  4
#> 9  0 0  3
#> 10 2 0  1
# with the native pipe (since R 4.1)
x |> mutate(across(1:2, ~ replace_na(.x, 0)))
#>    a b  c
#> 1  2 2  5
#> 2  2 2  2
#> 3  1 0  5
#> 4  0 2  2
#> 5  1 2 NA
#> 6  1 2  3
#> 7  2 2  4
#> 8  2 1  4
#> 9  0 0  3
#> 10 2 0  1

Created on 2021-12-08 by the reprex package (v2.0.1)

Dealing with NAs in factors needed a bit extra.

I found a useful function here, which you can then use with mutate_at or mutate_if:

library(dplyr)

replace_factor_na <- function(x) {
  x <- as.character(x)
  x <- if_else(is.na(x), 'NONE', x)  # dplyr::if_else
  as.factor(x)
}


df <- df %>%
  mutate_at(
    vars(all_of(vector_of_column_names)),  # character vector of target column names
    replace_factor_na
  )

Or apply to all factor columns:

df <- df %>%
  mutate_if(is.factor, replace_factor_na)
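For comparison, a base-R sketch of the same factor fix without converting to character: add "NONE" as a level first, then assign it to the NA positions. The factor is illustrative. Assigning a value that is not already a level would instead produce NA with an "invalid factor level" warning:

```r
f <- factor(c("lo", NA, "hi"))

levels(f) <- c(levels(f), "NONE")  # "NONE" must exist as a level before it can be assigned
f[is.na(f)] <- "NONE"

f
levels(f)   # "hi" "lo" "NONE"
```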