As.POSIXct/as.POSIXlt 和 strptime 在将字符向量转换为 POSIXct/POSIXlt 方面的差异

我在这里跟踪了一些问题,这些问题涉及如何将字符向量转换为日期时间类。我经常看到两个方法,strptime 和 as.POSIXct/as.POSIXlt 方法。我看了这两个函数,但不清楚它们之间的区别。

脱衣舞时间

function (x, format, tz = "")
{
y <- .Internal(strptime(as.character(x), format, tz))
names(y$year) <- names(x)
y
}
<bytecode: 0x045fcea8>
<environment: namespace:base>

作为 POSIXct

function (x, tz = "", ...)
UseMethod("as.POSIXct")
<bytecode: 0x069efeb8>
<environment: namespace:base>

作为 POSIXlt

function (x, tz = "", ...)
UseMethod("as.POSIXlt")
<bytecode: 0x03ac029c>
<environment: namespace:base>

做一个微基准测试,看看是否存在性能差异:

library(microbenchmark)
Dates <- sample(c(dates = format(seq(ISOdate(2010,1,1), by='day', length=365), format='%d-%m-%Y')), 5000, replace = TRUE)
df <- microbenchmark(strptime(Dates, "%d-%m-%Y"), as.POSIXlt(Dates, format = "%d-%m-%Y"), times = 1000)


Unit: milliseconds
expr      min       lq   median       uq      max
1 as.POSIXlt(Dates, format = "%d-%m-%Y") 32.38596 33.81324 34.78487 35.52183 61.80171
2            strptime(Dates, "%d-%m-%Y") 31.73224 33.22964 34.20407 34.88167 52.12422

脱衣舞时间似乎稍微快一点。怎么回事?为什么会有两个相似的函数,或者它们之间有什么不同?

71905 次浏览

Well, the functions do different things.

First, there are two internal implementations of date/time: POSIXct, which stores seconds since UNIX epoch (+some other data), and POSIXlt, which stores a list of day, month, year, hour, minute, second, etc.

strptime is a function to directly convert character vectors (of a variety of formats) to POSIXlt format.

as.POSIXlt converts a variety of data types to POSIXlt. It tries to be intelligent and do the sensible thing - in the case of character, it acts as a wrapper to strptime.

as.POSIXct converts a variety of data types to POSIXct. It also tries to be intelligent and do the sensible thing - in the case of character, it runs strptime first, then does the conversion from POSIXlt to POSIXct.

It makes sense that strptime is faster, because strptime only handles character input whilst the others try to determine which method to use from input type. It should also be a bit safer in that being handed unexpected data would just give an error, instead of trying to do the intelligent thing that might not be what you want.

There are two POSIXt types, POSIXct and POSIXlt. "ct" can stand for calendar time, it stores the number of seconds since the origin. "lt", or local time, keeps the date as a list of time attributes (such as "hour" and "mon"). Try these examples:

date.hour=strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S")


date=c("26/10/2016")


time=c("19:51:30")


day<-paste(date,"T", time)


day.time1=as.POSIXct(day,format="%d/%m/%Y T %H:%M:%S",tz="Europe/Paris")


day.time1


day.time1$year


day.time2=as.POSIXlt(day,format="%d/%m/%Y T %H:%M:%S",tz="Europe/Paris")


day.time2


day.time2$year