R 中字符串到日期转换的“标准明确日期”格式是什么?

请考虑以下几点

$ R --vanilla


> as.Date("01 Jan 2000")
Error in charToDate(x) :
character string is not in a standard unambiguous format

但是这个日期显然是 的标准格式,为什么会有错误消息?

更糟糕的是,一个模棱两可的日期显然在没有警告或错误的情况下被接受,然后读取错误!

> as.Date("01/01/2000")
[1] "0001-01-20"

我在[ R ]标签中搜索并发现了包含此错误消息的其他28个问题。所有的解决方案和变通方法都涉及指定格式,iuc。这个问题的不同之处在于,我要问的是,无论如何,标准的明确格式定义在哪里,它们是否可以改变?每个人都收到这些信息了吗,还是只有我收到了?也许和现场有关?

换句话说,有没有比需要指定格式更好的解决方案?

含有“[ R ]标准明确格式”的29个问题

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)


locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252


attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
244056 次浏览

This is documented behavior. From ?as.Date:

format: A character string. If not specified, it will try '"%Y-%m-%d"' then '"%Y/%m/%d"' on the first non-'NA' element, and give an error if neither works.

as.Date("01 Jan 2000") yields an error because the format isn't one of the two listed above. as.Date("01/01/2000") yields an incorrect answer because the date isn't in one of the two formats listed above.

I take "standard unambiguous" to mean "ISO-8601" (even though as.Date isn't that strict, as "%m/%d/%Y" isn't ISO-8601).

If you receive this error, the solution is to specify the format your date (or datetimes) are in, using the formats described in the Details section in ?strptime.

Make sure that the order of the conversion specification as well as any separators correspond exactly with the format of your input string. Also, be sure to use particular care if your data contain day/month names and/or abbreviations, as the conversion will depend on your locale (see the examples in ?strptime and read ?LC_TIME; see also ABC2, ABC3 and ABC4 return unexpected NA).

As a complement to @JoshuaUlrich answer, here is the definition of function as.Date.character:

as.Date.character
function (x, format = "", ...)
{
charToDate <- function(x) {
xx <- x[1L]
if (is.na(xx)) {
j <- 1L
while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
if (is.na(xx))
f <- "%Y-%m-%d"
}
if (is.na(xx) || !is.na(strptime(xx, f <- "%Y-%m-%d",
tz = "GMT")) || !is.na(strptime(xx, f <- "%Y/%m/%d",
tz = "GMT")))
return(strptime(x, f))
stop("character string is not in a standard unambiguous format")
}
res <- if (missing(format))
charToDate(x)
else strptime(x, format, tz = "GMT")
as.Date(res)
}
<bytecode: 0x265b0ec>
<environment: namespace:base>

So basically if both strptime(x, format="%Y-%m-%d") and strptime(x, format="%Y/%m/%d") throws an NA it is considered ambiguous and if not unambiguous.

Converting the date without specifying the current format can bring this error to you easily.

Here is an example:

sdate <- "2015.10.10"

Convert without specifying the Format:

date <- as.Date(sdate4) # ==> This will generate the same error"""Error in charToDate(x): character string is not in a standard unambiguous format""".

Convert with specified Format:

date <- as.Date(sdate4, format = "%Y.%m.%d") # ==> Error Free Date Conversion.

In other words, is there a better solution than needing to specify the format?

Yes, there is now (ie in late 2016), thanks to anytime::anydate from the anytime package.

See the following for some examples from above:

R> anydate(c("01 Jan 2000", "01/01/2000", "2015/10/10"))
[1] "2000-01-01" "2000-01-01" "2015-10-10"
R>

As you said, these are in fact unambiguous and should just work. And via anydate() they do. Without a format.

This works perfectly for me, not matter how the date was coded previously.

library(lubridate)
data$created_date1 <- mdy_hm(data$created_at)
data$created_date1 <- as.Date(data$created_date1)

If the date is for example: "01 Jan 2000", I recommend using

library(lubridate)
date_corrected<-dmy("01 Jan 2000")
date_corrected
[1] "2000-01-01"
class(date_corrected)
[1] "Date"

lubridate has a function for almost every type of date.

As a complement: This error can be raised as well if an entry you are trying to cast is a string that should have been NA. If you specify the expected format -or use "real" NAs- there are no problems:

Minimum reproducible example with data.table:

library(data.table)
df <- data.table(date_good = c("01-01-2001", "01-01-2001"), date_bad= ("NA", "01-01-2001"))


df[, .(date_good = as.Date(date_good), date_bad = as.Date(date_bad))]
# Error in charToDate(x) : character string is not in a standard unambiguous format


df[, .(date_good = as.Date(date_good), date_bad = as.Date(date_bad, format="%Y-%m-%d"))]
# No errors; you simply get NA.


df2 <- data.table(date_good = c("01-01-2001", "01-01-2001"), date_bad= (NA, "01-01-2001"))
    

df2[, .(date_good = as.Date(date_good), date_bad = as.Date(date_bad))]
# Just NA