Remove part of a string after "."

I am working with NCBI Reference Sequence accession numbers, such as those in the variable a:

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")

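To get information from the biomaRt package, I need to remove the .1, .2, etc. after the accession numbers. I normally do this with the following code: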

b <- sub("..*", "", a)


# [1] "" "" "" "" "" ""

But as you can see, this is not the right approach for this variable. Can anyone help me with this?


You could do:

sub("\\.[0-9]+$", "", a)

or, with stringr:

library(stringr)
str_sub(a, start=1, end=-3)
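
Dropping the last two characters assumes a single-digit version suffix. The following is a minimal base R sketch of the same idea (using substr in place of str_sub) next to a pattern-based version. The second accession, NM_000001.12, is made up for illustration and is not from the question.

```r
# Two IDs: one from the question, one hypothetical with a two-digit version.
ids <- c("NM_020506.1", "NM_000001.12")

# Fixed-width trimming: always drop the last two characters.
fixed_width <- substr(ids, 1, nchar(ids) - 2)

# Pattern-based trimming: drop the final dot and the digits after it.
by_pattern <- sub("\\.[0-9]+$", "", ids)

print(fixed_width)  # the two-digit case keeps a trailing "."
print(by_pattern)
```

If versions with more than one digit can occur in your data, the pattern-based form is probably the safer choice.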


Accepted answer

You just need to escape the period:

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")


gsub("\\..*","",a)
# [1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
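
To see why the escape matters: in a regular expression an unescaped . matches any character, so "..*" matches the whole string from its first character and sub replaces all of it with "". The sketch below compares the patterns on two of the accessions. The character-class form "[.]" is an addition, not part of the answer above, shown as another way to write a literal dot.

```r
a <- c("NM_020506.1", "NM_001030297.2")

# Unescaped: "." matches any character, so the entire string is removed.
unescaped <- sub("..*", "", a)

# Escaped: "\\." matches only a literal dot, so the prefix is kept.
escaped <- sub("\\..*", "", a)

# A character class is an alternative way to match a literal dot.
char_class <- sub("[.].*", "", a)

print(unescaped)
print(escaped)
print(char_class)
```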


We can pretend they are filenames and remove extensions:

tools::file_path_sans_ext(a)
# [1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
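
As far as I can tell, file_path_sans_ext only strips a trailing dot followed by alphanumeric characters, so an identifier without a version suffix passes through unchanged. A small check, where the second ID is a made-up input without a dot:

```r
# One versioned accession and one (hypothetical) unversioned accession.
ids <- c("NM_020506.1", "NM_020506")

# The "extension" .1 is removed; the ID without a dot is left as it is.
stripped <- tools::file_path_sans_ext(ids)

print(stripped)
```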


If the part to keep had a fixed length, substr from base R could be used directly. Since the lengths vary here, we can find the position of the first . with regexpr and pass that to substr:

substr(a, 1, regexpr("\\.", a)-1)
#[1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
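
One caveat with this approach: when a string contains no ".", regexpr returns -1, and substr(a, 1, -2) yields an empty string. Below is a hedged sketch of a guard. strip_version is a hypothetical helper name chosen for illustration, not a function from any package, and the unversioned ID is made up.

```r
# Hypothetical helper: cut at the first literal dot, or return the
# input unchanged when there is no dot.
strip_version <- function(x) {
  # fixed = TRUE searches for a literal "." so no escaping is needed;
  # as.integer() drops the match.length attribute.
  pos <- as.integer(regexpr(".", x, fixed = TRUE))
  ifelse(pos > 0, substr(x, 1, pos - 1), x)
}

ids <- c("NM_020506.1", "NM_020506")
unguarded <- substr(ids, 1, regexpr("\\.", ids) - 1)
guarded <- strip_version(ids)

print(unguarded)  # second element is empty
print(guarded)
```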


We can use a lookahead regex to extract the part of each string that comes before the ".":

library(stringr)


str_extract(a, ".*(?=\\.)")
# [1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
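
Note that .* is greedy, so this lookahead pattern keeps everything up to the last dot rather than the first. For the accessions in the question that makes no difference, but it does for a string with two dots. A base R sketch of the contrast, using regmatches with perl = TRUE in place of str_extract. The input "chr1.v2.3" is made up for illustration.

```r
x <- "chr1.v2.3"  # hypothetical input with two dots

# Greedy match followed by a lookahead: stops before the LAST dot.
up_to_last_dot <- regmatches(x, regexpr(".*(?=\\.)", x, perl = TRUE))

# Anchored negated character class: stops before the FIRST dot.
up_to_first_dot <- regmatches(x, regexpr("^[^.]+", x))

print(up_to_last_dot)
print(up_to_first_dot)
```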


Another option is to use str_split from stringr:

library(stringr)
str_split(a, "\\.", simplify = TRUE)[, 1]
# [1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
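
For reference, a base R equivalent without stringr might look like the sketch below. It assumes every element contains at least one character before the dot. fixed = TRUE treats the dot literally, so no escaping is needed.

```r
a <- c("NM_020506.1", "NM_001030297.2")

# Split each string on a literal dot; the result is a list of character vectors.
parts <- strsplit(a, ".", fixed = TRUE)

# Take the first piece of each split, checking that each result is one string.
first_part <- vapply(parts, `[`, character(1), 1)

print(first_part)
```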