替换数据框中字符串的所有匹配项

我正在研究一个数据帧,它没有检测功能,代码是“ <”。有时候在“ <”后面有一个空格,有时候没有例如“ < 2”或者“ < 2”。我想移除这个地方的所有痕迹。

例如:

data <- data.frame(name = rep(letters[1:3], each = 3), var1 = rep('< 2', 9), var2 = rep('<3', 9))


name var1 var2
1    a  < 2   <3
2    b  < 2   <3
3    c  < 2   <3

这就是我要做的:

我可以提取所有的值并创建新的字符串,但是我不能把它们放回数据框中。

index <- str_detect(unlist(data), '<')
index <- matrix(index, nrow = 3)


data[index]
#[1] "< 2" "< 2" "< 2" "<3"  "<3"  "<3"


replacements <- str_replace_all(data[index], "<[ ]+","<")
replacements
#[1] "<2" "<2" "<2" "<3" "<3" "<3"


data[index] <- replacements


#Error in `[<-.data.frame`(`*tmp*`, index, value = c("<2", "<2", "<2",  :
#  unsupported matrix index in replacement
195330 次浏览

If you are only looking to replace all occurrences of "< " (with space) with "<" (no space), then you can do an lapply over the data frame, with a gsub for replacement:

> data <- data.frame(lapply(data, function(x) {
+                  gsub("< ", "<", x)
+              }))
> data
name var1 var2
1    a   <2   <3
2    a   <2   <3
3    a   <2   <3
4    b   <2   <3
5    b   <2   <3
6    b   <2   <3
7    c   <2   <3
8    c   <2   <3
9    c   <2   <3

To remove all spaces in every column, you can use

data[] <- lapply(data, gsub, pattern = " ", replacement = "", fixed = TRUE)

or to constrict this to just the second and third columns (i.e. every column except the first),

data[-1] <- lapply(data[-1], gsub, pattern = " ", replacement = "", fixed = TRUE)

Here is a dplyr solution

library(dplyr)
library(stringr)


Censor_consistently <-  function(x){
str_replace(x, '^\\s*([<>])\\s*(\\d+)', '\\1\\2')
}




test_df <- tibble(x = c('0.001', '<0.002', ' < 0.003', ' >  100'),  y = 4:1)


mutate_all(test_df, funs(Censor_consistently))


# A tibble: 4 × 2
x     y
<chr> <chr>
1  0.001     4
2 <0.002     3
3 <0.003     2
4   >100     1

I had the problem, I had to replace "Not Available" with NA and my solution goes like this

data <- sapply(data,function(x) {x <- gsub("Not Available",NA,x)})

Equivalent to "find and replace." Don't overthink it.

Try it with one:

library(tidyverse)
df <- data.frame(name = rep(letters[1:3], each = 3), var1 = rep('< 2', 9), var2 = rep('<3', 9))


df %>%
mutate(var1 = str_replace(var1, " ", ""))
#>   name var1 var2
#> 1    a   <2   <3
#> 2    a   <2   <3
#> 3    a   <2   <3
#> 4    b   <2   <3
#> 5    b   <2   <3
#> 6    b   <2   <3
#> 7    c   <2   <3
#> 8    c   <2   <3
#> 9    c   <2   <3

Apply to all

df %>%
mutate_all(funs(str_replace(., " ", "")))
#>   name var1 var2
#> 1    a   <2   <3
#> 2    a   <2   <3
#> 3    a   <2   <3
#> 4    b   <2   <3
#> 5    b   <2   <3
#> 6    b   <2   <3
#> 7    c   <2   <3
#> 8    c   <2   <3
#> 9    c   <2   <3

If the extra space was produced by uniting columns, think about making str_trim part of your workflow.

Created on 2018-03-11 by the reprex package (v0.2.0).

late to the party. but if you only want to get rid of leading/trailing white space, R base has a function trimws

For example:

data <- apply(X = data, MARGIN = 2, FUN = trimws) %>% as.data.frame()

As an update to the answer by @Nettle, mutate_all() has been superseded by mutate( across( ... ) ):

library(tidyverse)


df <- data.frame(
name = rep( letters[1:3], each = 3 ),
var1 = rep( '< 2', 9 ),
var2 = rep( '<3', 9 )
)


df %>%
mutate( across(
.cols = everything(),
~str_replace( ., " ", "" )
) )


#>   name var1 var2
#> 1    a   <2   <3
#> 2    a   <2   <3
#> 3    a   <2   <3
#> 4    b   <2   <3
#> 5    b   <2   <3
#> 6    b   <2   <3
#> 7    c   <2   <3
#> 8    c   <2   <3
#> 9    c   <2   <3