I asked a question about this a few months back, and I thought the answer had solved my problem, but I ran into the problem again and the solution didn't work for me.
I'm importing a CSV:
orders <- read.csv("<file_location>", sep=",", header=T, check.names = FALSE)
Here's the structure of the dataframe:
str(orders)
'data.frame': 3331575 obs. of 2 variables:
$ OrderID : num -2034590217 -2034590216 -2031892773 -2031892767 -2021008573 ...
$ OrderDate: Factor w/ 402 levels "2010-10-01","2010-10-04",..: 263 263 269 268 301 300 300 300 300 300 ...
If I run the length command on the first column, OrderID, I get this:
length(orders$OrderID)
[1] 0
If I run length on OrderDate, it returns the correct count:
length(orders$OrderDate)
[1] 3331575
This is a copy/paste of the head of the CSV:
OrderID,OrderDate
-2034590217,2011-10-14
-2034590216,2011-10-14
-2031892773,2011-10-24
-2031892767,2011-10-21
-2021008573,2011-12-08
-2021008572,2011-12-07
-2021008571,2011-12-07
-2021008570,2011-12-07
-2021008569,2011-12-07
Now, if I re-run read.csv but leave out the check.names option, the first column of the data frame has X. at the start of its name:
orders2 <- read.csv("<file_location>", sep=",", header=T)
str(orders2)
'data.frame': 3331575 obs. of 2 variables:
$ X.OrderID: num -2034590217 -2034590216 -2031892773 -2031892767 -2021008573 ...
$ OrderDate: Factor w/ 402 levels "2010-10-01","2010-10-04",..: 263 263 269 268 301 300 300 300 300 300 ...
length(orders2$X.OrderID)
[1] 3331575
This works correctly.
My question is: why does R add X. to the beginning of the first column name? As you can see from the CSV file, there are no special characters, so it should be a simple load. Setting check.names = FALSE does import the name as it appears in the CSV, but then I can't access the column by that name (length() returns 0, as shown above), so I can't use the data for analysis.
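Whether the header really contains no special characters can be checked at the byte level, since a text editor or a copy/paste will not show invisible characters. The sketch below is a self-contained reproduction under an assumption the post does not confirm: that the file begins with a UTF-8 byte-order mark (BOM, bytes EF BB BF), a common invisible prefix in CSVs exported by Windows tools. It writes a small temporary file using two rows from the sample above, dumps its first three bytes, and reads it twice. Depending on the R version and locale, the default read may show a mangled first name (such as X.OrderID) or R may already drop the BOM itself, so the first printed result can vary. The second read uses fileEncoding = "UTF-8-BOM", which tells the file connection to strip a leading BOM.

```r
# Hypothetical reproduction: ASSUMES the real file starts with a UTF-8 BOM.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
# Write the BOM bytes, then the header and two sample rows from the post
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("OrderID,OrderDate\n-2034590217,2011-10-14\n-2034590216,2011-10-14\n"), con)
close(con)

# Dump the first three bytes; a BOM appears as ef bb bf
print(readBin(tmp, what = "raw", n = 3))

# Default read: check.names = TRUE passes the header through make.names(),
# so any invisible leading character may turn into an "X." prefix
a <- read.csv(tmp)
print(names(a))

# Declaring the encoding as "UTF-8-BOM" strips the BOM before parsing
b <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
print(names(b))
```

To run the same byte check on the real file, replace tmp in the readBin call with the path used in read.csv above.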
What can I do to fix this?
Side note: I realize this is a minor issue; I'm more frustrated that I believe I'm loading the file correctly, yet I'm not getting the result I expected. I could rename the column with colnames(orders)[1] <- "OrderID", but I still want to know why it doesn't load correctly.