Specifying a custom Date format for the colClasses argument in read.table/read.csv

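Question:

Is there a way to specify the Date format when using the colClasses argument in read.table/read.csv?

(I realise I could convert after importing, but with many date columns like this it would be easier to do it during the import step.)


Example:

I have a .csv with a date column formatted as %d/%m/%Y.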

dataImport <- read.csv("data.csv", colClasses = c("factor","factor","Date"))

This converts the dates incorrectly. For example, 15/07/2008 becomes 0015-07-20.
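
The mis-parse can be reproduced without a file. This is a minimal sketch, assuming (as read.table does for colClasses = "Date") that the column is passed to as.Date() with no format, in which case only "%Y-%m-%d" and "%Y/%m/%d" are tried. The variable names wrong and fixed are just for illustration.

```r
# No format given: "15/07/2008" is matched against "%Y/%m/%d",
# so "15" is taken as the year and "07" as the month
wrong <- as.Date("15/07/2008")

# Explicit day/month/year format gives the intended date
fixed <- as.Date("15/07/2008", format = "%d/%m/%Y")

print(wrong)
print(fixed)
```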


Reproducible code:

data <-
structure(list(func_loc = structure(c(1L, 2L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 5L), .Label = c("3076WAG0003", "3076WAG0004", "3076WAG0007",
"3076WAG0009", "3076WAG0010"), class = "factor"), order_type = structure(c(3L,
3L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 1L), .Label = c("PM01", "PM02",
"PM03"), class = "factor"), actual_finish = structure(c(4L, 6L,
1L, 2L, 3L, 7L, 1L, 8L, 1L, 5L), .Label = c("", "11/03/2008",
"14/08/2008", "15/07/2008", "17/03/2008", "19/01/2009", "22/09/2008",
"6/09/2007"), class = "factor")), .Names = c("func_loc", "order_type",
"actual_finish"), row.names = c(NA, 10L), class = "data.frame")




write.csv(data, "data.csv", row.names = FALSE)


dataImport <- read.csv("data.csv")
str(dataImport)
dataImport


dataImport <- read.csv("data.csv", colClasses = c("factor","factor","Date"))
str(dataImport)
dataImport


You can write your own function that accepts a string and converts it to a Date using the format you want, then use setAs to register it as an "as" method for a class name of your choosing. You can then use that class name in colClasses.

Try:

setAs("character","myDate", function(from) as.Date(from, format="%d/%m/%Y") )


tmp <- c("1, 15/08/2008", "2, 23/05/2010")
con <- textConnection(tmp)


tmp2 <- read.csv(con, colClasses=c('numeric','myDate'), header=FALSE)
str(tmp2)

Then modify if needed to work for your data.

Edit ---

You might want to run setClass('myDate') first to avoid a warning. The warning is harmless, but it gets annoying if you do this a lot, and this one simple call gets rid of it.
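
Applied to a file laid out like the question's data.csv, the two steps might look like the following minimal sketch. The class name myDate is arbitrary, and the sample rows are written to a temporary file (csv_file, a name made up here) only so that the snippet runs on its own.

```r
library(methods)  # provides setClass() and setAs()

# Register a placeholder class first so that setAs() does not warn
setClass("myDate")
setAs("character", "myDate",
      function(from) as.Date(from, format = "%d/%m/%Y"))

# A few rows in the same layout as the question's data.csv
csv_file <- tempfile(fileext = ".csv")
writeLines(c('"func_loc","order_type","actual_finish"',
             '"3076WAG0003","PM03","15/07/2008"',
             '"3076WAG0004","PM03","19/01/2009"',
             '"3076WAG0007","PM01",""'),
           csv_file)

# "myDate" in colClasses triggers the conversion registered above
dataImport <- read.csv(csv_file,
                       colClasses = c("factor", "factor", "myDate"))
str(dataImport)
```

Empty date fields come through as NA, because as.Date() cannot match an empty string against the format.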

If there is only one date format you want to change, you could use the Defaults package to change the default format within as.Date.character:

library(Defaults)
setDefaults('as.Date.character', format = '%d/%m/%Y')  # %m is month; %M would mean minutes
dataImport <- read.csv("data.csv", colClasses = c("factor","factor","Date"))
str(dataImport)
## 'data.frame':    10 obs. of  3 variables:
##  $ func_loc     : Factor w/ 5 levels "3076WAG0003",..: 1 2 3 3 3 3 3 4 4 5
##  $ order_type   : Factor w/ 3 levels "PM01","PM02",..: 3 3 1 1 1 1 2 2 3 1
##  $ actual_finish: Date, format: "2008-07-15" "2009-01-19" NA "2008-03-11" ...

I think @Greg Snow's answer is far better, as it does not change the default behaviour of an often used function.

If you also need the time:

setClass('yyyymmdd-hhmmss')
setAs("character","yyyymmdd-hhmmss", function(from) as.POSIXct(from, format="%Y%m%d-%H%M%S"))
d <- read.table(colClasses="yyyymmdd-hhmmss", text="20150711-130153")
str(d)
## 'data.frame':    1 obs. of  1 variable:
## $ V1: POSIXct, format: "2015-07-11 13:01:53"

This question is from a long time ago; in the meantime the problem has been solved by Hadley Wickham's readr package, so nowadays the solution reduces to a one-liner:

library(readr)
data <- read_csv("data.csv",
                 col_types = cols(actual_finish = col_date(format = "%d/%m/%Y")))

read_csv returns a tibble; if you would rather have a plain data.frame, convert it:

data <- as.data.frame(data)