Merge multiple spaces into a single space; remove trailing/leading spaces

I want to merge multiple spaces into a single space (a space could also be a tab) and remove trailing/leading spaces.

For example:

string <- "Hi        buddy        what's up    Bro"

"Hi buddy what's up bro"

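I checked the solution given in "Regex to replace multiple spaces with a single space". Note that you should not put \t or \n as literal whitespace inside the toy string and feed that as the pattern to gsub. I want this in R.

Note that I am unable to put multiple spaces in the toy string. Thanks.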


This seems to meet your needs.

string <- "  Hi buddy   what's up   Bro "
library(stringr)
str_replace(gsub("\\s+", " ", str_trim(string)), "B", "b")
# [1] "Hi buddy what's up bro"

You could also try clean from qdap

library(qdap)
library(stringr)
str_trim(clean(string))
#[1] "Hi buddy what's up Bro"

Or as suggested by @Tyler Rinker (using only qdap)

Trim(clean(string))
#[1] "Hi buddy what's up Bro"

Another approach using a single regex:

gsub("(?<=[\\s])\\s*|^\\s+|\\s+$", "", string, perl=TRUE)

Explanation (from)

NODE                     EXPLANATION
--------------------------------------------------------------------------------
(?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
[\s]                     any character of: whitespace (\n, \r,
                         \t, \f, and " ")
--------------------------------------------------------------------------------
)                        end of look-behind
--------------------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                         more times (matching the most amount
                         possible))
--------------------------------------------------------------------------------
|                        OR
--------------------------------------------------------------------------------
^                        the beginning of the string
--------------------------------------------------------------------------------
\s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                         more times (matching the most amount
                         possible))
--------------------------------------------------------------------------------
$                        before an optional \n, and the end of the
                         string
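As a quick check of the pattern explained above, here is a minimal sketch that runs it on a made-up input mixing tabs and spaces (the variable names are only for illustration). Because the look-behind keeps the first whitespace character of each run instead of replacing it, a run that starts with a tab stays a tab, so this pattern alone does not normalise tabs to spaces the way replacing \\s+ with " " does.

```r
# Hypothetical input: leading spaces, a run of two tabs, and a trailing space.
x <- "  Hi\t\tbuddy   what's up   Bro "

# Single-regex approach: drop every whitespace character that follows
# another whitespace character, plus any leading/trailing whitespace.
single_regex <- gsub("(?<=[\\s])\\s*|^\\s+|\\s+$", "", x, perl = TRUE)

# For comparison: collapse each run to one space, then trim both ends.
two_step <- trimws(gsub("\\s+", " ", x))

print(single_regex)  # the first tab of the run is kept: "Hi\tbuddy what's up Bro"
print(two_step)      # "Hi buddy what's up Bro"
```

If the input may contain tabs or newlines and the goal is plain single spaces, the two-step form is probably the safer choice.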

The qdapRegex package has the rm_white function to handle this:

library(qdapRegex)
rm_white(string)


## [1] "Hi buddy what's up Bro"

Or simply try the str_squish function from stringr

library(stringr)
string <- "  Hi buddy   what's up   Bro "
str_squish(string)
# [1] "Hi buddy what's up Bro"

You do not need to import external libraries to perform such a task:

string <- " Hi        buddy        what's up    Bro "
string <- gsub("\\s+", " ", string)
string <- trimws(string)
string
# [1] "Hi buddy what's up Bro"

Or, in one line:

string <- trimws(gsub("\\s+", " ", string))

Much cleaner.

There is no need to load any extra libraries for this, since gsub() from base R does the work. Remove leading and trailing whitespace with trimws() and collapse the extra whitespace with gsub(), as mentioned by @Adam Erickson.

string = " Hi        buddy        what's up    Bro "
trimws(gsub("\\s+", " ", string))

Here \\s+ matches one or more whitespace characters (spaces, tabs, newlines), and gsub replaces each such run with a single space.
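To reuse the idiom, it can be wrapped in a small helper. The following is only a sketch: squish_ws is a hypothetical name, not a function from base R or any package. Since gsub() and trimws() are both vectorised, the helper works on a whole character vector at once.

```r
# Hypothetical helper (not part of base R): collapse runs of whitespace
# to a single space, then strip leading and trailing whitespace.
squish_ws <- function(x) {
  trimws(gsub("\\s+", " ", x))
}

# Made-up inputs containing spaces, a tab, and a newline.
inputs <- c("  Hi\tbuddy \n what's up   Bro ", "a  b")
cleaned <- squish_ws(inputs)
print(cleaned)
# [1] "Hi buddy what's up Bro" "a b"
```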

To find out what any regular expression is doing, visit this link, as mentioned by @Tyler Rinker.
Just paste in the regular expression you want explained and it will do the rest.

Another solution using strsplit:

Split the text into words, then concatenate the individual words using the paste function.

string <- "Hi        buddy        what's up    Bro"
stringsplit <- sapply(strsplit(string, " "), function(x){x[!x ==""]})
paste(stringsplit ,collapse = " ")

For more than one document:

string <- c("Hi        buddy        what's up    Bro"," an  example using       strsplit ")
stringsplit <- lapply(strsplit(string, " "), function(x){x[!x ==""]})
sapply(stringsplit ,function(d) paste(d,collapse = " "))
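Splitting on a literal " " treats tabs and newlines as part of a word. A possible variant, sketched here on made-up input, splits on the regex \\s+ instead; a leading run of whitespace then yields an empty first element, which nzchar() filters out. The variable names are illustrative only.

```r
# Made-up input: spaces and tabs between words, plus leading/trailing spaces.
docs <- c("Hi \t buddy", " an  example using\tstrsplit ")

# Split each string on runs of any whitespace.
words <- strsplit(docs, "\\s+")

# Drop empty pieces and glue the remaining words back with single spaces.
rejoined <- vapply(
  words,
  function(w) paste(w[nzchar(w)], collapse = " "),
  character(1)
)
print(rejoined)
# [1] "Hi buddy"                  "an example using strsplit"
```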


This seems to work.
Unlike Rich Scriven's answer, it does not remove whitespace at the beginning or end of the string, but it does merge multiple whitespace characters into one.

library("stringr")
string <- "Hi     buddy     what's      up       Bro"
str_replace_all(string, "\\s+", " ")
# [1] "Hi buddy what's up Bro"