Specifying colClasses in read.csv

I am trying to specify the colClasses option of the read.csv function in R. In my data, the first column, time, is basically a character vector, while the rest of the columns are numeric.

data <- read.csv("test.csv", comment.char="" ,
colClasses=c(time="character", "numeric"),
strip.white=FALSE)

In the above command, I want R to read the time column as "character" and the rest as numeric. Although data did end up with the correct result after the command completed, R returned the following warnings. How can I fix them?

Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, : not all columns named in 'colClasses' exist
2: In tmp[i[i > 0L]] <- colClasses : number of items to replace is not a multiple of replacement length

Derek


The length of the colClasses vector must equal the number of imported columns. Assuming the rest of your dataset has 5 columns:

colClasses=c("character",rep("numeric",5))
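Hard-coding the 5 breaks as soon as the file gains or loses a column. A minimal sketch, assuming time is the first and only character column, that counts the columns from the data before building the full-length vector (the inline csv_text and its column names are made up to stand in for test.csv):

```r
# Hypothetical inline data standing in for test.csv
csv_text <- "time,open,high,low,close,volume
09:30,1.5,1.7,1.4,1.6,100
09:31,1.6,1.8,1.5,1.7,120"

# Read a single row just to learn how many columns there are
n_cols <- ncol(read.csv(text = csv_text, nrows = 1))

# One class per column: character for time, numeric for all the rest
classes <- c("character", rep("numeric", n_cols - 1))
data <- read.csv(text = csv_text, colClasses = classes)
str(data)
```

Because the vector now has exactly one entry per column, read.csv has nothing to recycle or match by name, so neither warning from the question appears.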

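Assuming your "time" column has at least one observation containing a non-numeric character, and all your other columns contain only numbers, read.csv will by default read "time" as a "factor" (in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE) and all other columns as "numeric". Therefore, setting stringsAsFactors = FALSE gives the same result as setting colClasses manually, i.e.,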

data <- read.csv('test.csv', stringsAsFactors=FALSE)
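The two approaches are close but not always identical: without colClasses, read.csv runs type.convert on every column, so a column holding only whole numbers comes back as integer rather than numeric. A small sketch with made-up inline data (the column names and values are assumptions, not from the question):

```r
csv_text <- "time,price,volume
09:30,1.5,100
09:31,1.6,120"

# Let read.csv guess every column's type, keeping strings as character
guessed <- read.csv(text = csv_text, stringsAsFactors = FALSE)
sapply(guessed, class)  # time: character, price: numeric, volume: integer

# Forcing the classes makes volume numeric (double) instead of integer
forced <- read.csv(text = csv_text, colClasses = c("character", "numeric", "numeric"))
sapply(forced, class)   # time: character, price: numeric, volume: numeric
```

For most analyses integer versus double makes no difference, but it can matter for identical() comparisons or when binding with data read the other way.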

You can also specify colClasses for just one column; the columns you don't name keep their default type detection.

So in your example, you should use:

data <- read.csv('test.csv', colClasses=c("time"="character"))
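A quick check of why the named form works while the question's call warns, using made-up inline data. In c(time = "character", "numeric") the second element has an empty name; as far as the warning text shows, read.csv matches those names against the column names, the empty name matches no column (first warning), and the two-element vector is then assigned into the single matched slot (second warning). Naming only time avoids both. The leading zeros in this hypothetical data also show a practical reason to force character: auto-detection would turn "0930" into the integer 930.

```r
csv_text <- "time,price,volume
0930,1.5,2.5
0931,1.6,3.5"

# The question's vector: the second element has an empty name
names(c(time = "character", "numeric"))  # "time" ""

# Name only the column to override; price and volume are auto-detected
data <- read.csv(text = csv_text, colClasses = c(time = "character"))
str(data)
```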

If you want to refer to the names in the header rather than to column numbers, you can do this:

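fname <- "test.csv"
# Read the first 10 rows so R can guess each column's class
headset <- read.csv(fname, header = TRUE, nrows = 10)
classes <- sapply(headset, class)
# Override the guessed class for the columns listed by name
classes[names(classes) %in% c("time")] <- "character"
dataset <- read.csv(fname, header = TRUE, colClasses = classes)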
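The same two-pass idea can be wrapped in a small function. A minimal sketch; read_csv_overriding and its arguments are hypothetical names, not part of base R, and the temporary file stands in for test.csv. One caveat: a column that is entirely empty in the sample rows is guessed as logical, and forcing that class on the full read fails if later rows contain other values, so a larger sample_rows is safer for sparse data.

```r
# Hypothetical helper (not base R): guess classes from the first rows,
# then override selected columns by name before the full read
read_csv_overriding <- function(file, overrides, sample_rows = 10) {
  headset <- read.csv(file, nrows = sample_rows)
  # class() can return more than one class, so keep only the first
  classes <- vapply(headset, function(col) class(col)[1], character(1))
  classes[names(overrides)] <- overrides
  read.csv(file, colClasses = classes)
}

# Usage with a throwaway file standing in for test.csv
fname <- tempfile(fileext = ".csv")
writeLines(c("time,price,volume", "0930,1.5,100", "0931,1.6,120"), fname)
dataset <- read_csv_overriding(fname, overrides = c(time = "character"))
str(dataset)
```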

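For a file with no header, many columns, and several datetime columns: say my datetime fields are in columns 36 and 38 and I want them read in as character fields: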

data <- read.csv("test.csv", header=FALSE, colClasses=c("V36"="character","V38"="character"))
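With many such columns, the names can be built from the column positions instead of typed out, relying on read.csv naming headerless columns V1, V2 and so on. A sketch with a tiny made-up headerless file in which positions 1 and 3 play the role of 36 and 38 (datetime_cols is a hypothetical variable name):

```r
# Made-up headerless data; columns 1 and 3 stand in for 36 and 38
csv_text <- "0930,1.5,0931
1015,2.5,1100"

datetime_cols <- c(1, 3)
# Builds c(V1 = "character", V3 = "character")
classes <- setNames(rep("character", length(datetime_cols)),
                    paste0("V", datetime_cols))

data <- read.csv(text = csv_text, header = FALSE, colClasses = classes)
str(data)
```

For the real file, datetime_cols <- c(36, 38) would produce exactly the names used above.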

I know the OP asked about the utils::read.csv function, but let me provide an answer for those who come here searching for how to do it with readr::read_csv from the tidyverse.

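library(readr)
read_csv("test.csv", col_types = cols(.default = "c", time = "i"))

This should set the default type of all columns to character, while parsing time as integer. Note that the column name has to come from the file's header row: with col_names = FALSE, readr names the columns X1, X2, and so on, and the time entry in cols() would not match any column.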

If we combine what Hendy and Odysseus Ithaca contributed, we get a cleaner and more general (i.e., more adaptable?) piece of code.

    data <- read.csv("test.csv", header = FALSE, colClasses = c(V36 = "character", V38 = "character"))