Add an index (numeric ID) column to a large data frame

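I have a large CSV file that I read into a data frame. The data in the CSV file comes from multiple websites and represents user information. For example, here is the structure of the data frame.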

user_id, number_of_logins, number_of_images, web
001, 34, 3, aa.com
002, 4, 4, aa.com
034, 3, 3, aa.com
001, 12, 4, bb.com
002, 1, 3, bb.com
034, 2, 2, cc.com

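As you can see, once I put the data into a data frame, user_id is no longer a unique ID, and this is causing problems throughout my analysis. I am trying to add another column before user_id, something like "generated_uid", and basically fill that column with the index of the data.frame. What is the best way to do this?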


You can add a sequence of numbers very easily with

data$ID <- seq.int(nrow(data))

If you are already using library(tidyverse), you can use

data <- tibble::rowid_to_column(data, "ID")
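As a hedged side note on the base R approach above: seq_len(nrow(data)) produces the same IDs and also behaves sensibly when the data frame has zero rows, where 1:nrow(data) would yield c(1, 0). The sketch below uses a small made-up data frame; the values and the name `empty` are assumptions for illustration only.

```r
# Hypothetical stand-in for the question's data (values are illustrative)
data <- data.frame(
  user_id = c("001", "002", "034"),
  web     = c("aa.com", "bb.com", "cc.com"),
  stringsAsFactors = FALSE
)

# Zero-row copy with the same columns, to show the edge case
empty <- data[0, ]

# seq_len() counts from 1 to n and returns an empty vector when n is 0
data$ID  <- seq_len(nrow(data))
empty$ID <- seq_len(nrow(empty))

print(data)

# For comparison: 1:0 counts downwards, so it has length 2, not 0
length(1:nrow(empty))
```

For a data frame that is known to be non-empty, either form gives the same result.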

If I understand you correctly, you can do something like the following.

To show it, I first create a data.frame with your example

# strip.white = TRUE drops the space that follows each comma in the text
df <-
  scan(what = character(), sep = ",", strip.white = TRUE, text =
"001, 34, 3, aa.com
002, 4, 4, aa.com
034, 3, 3, aa.com
001, 12, 4, bb.com
002, 1, 3, bb.com
034, 2, 2, cc.com")


df <- as.data.frame(matrix(df, 6, 4, byrow = TRUE))
colnames(df) <- c("user_id", "number_of_logins", "number_of_images", "web")

You can then run one of the following lines to add a column (at the end of the data.frame) with the row number as the generated user id. The second line simply adds leading zeros.

df$generated_uid  <- 1:nrow(df)
df$generated_uid2 <- sprintf("%03d", 1:nrow(df))
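Since the question mentions a large data frame, note that a fixed width of 3 stops padding evenly past 999 rows. A minimal sketch, assuming you want every ID to have the same number of characters, is to derive the width from the row count; the row count `n` below is a made-up example value.

```r
# Hypothetical row count; in practice this would be nrow(df)
n <- 1200L

ids <- seq_len(n)

# Number of digits in the largest ID
width <- nchar(as.character(n))

# Pad every ID with leading zeros up to that width
padded <- formatC(ids, width = width, flag = "0")

head(padded, 3)
tail(padded, 1)
```

The result is a character vector, so it sorts as text rather than as numbers.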

If you absolutely want the generated user id to be the first column, you can add the column like so:

df <- cbind("generated_uid3" = sprintf("%03d", 1:nrow(df)), df)

or simply rearrange the columns.
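One possible way to rearrange the columns in base R is to index by column name, putting the new column first and keeping the rest in their existing order. The data frame below is a small made-up stand-in for the example data.

```r
# Hypothetical stand-in data frame (values are illustrative)
df <- data.frame(
  user_id          = c("001", "002", "034"),
  number_of_logins = c(34, 4, 3),
  web              = c("aa.com", "aa.com", "aa.com"),
  stringsAsFactors = FALSE
)

# Add the generated id at the end, as in the lines above
df$generated_uid <- seq_len(nrow(df))

# Move generated_uid to the front; setdiff() keeps the other columns in order
df <- df[, c("generated_uid", setdiff(names(df), "generated_uid"))]

print(names(df))
```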

Alternatively, using the dplyr package:

library("dplyr")


df <- df %>% mutate(id = row_number())

If your data.frame is a data.table, you can use the special symbol .I:

data[, ID := .I]