How do I add a row to a data frame in R?


Not very elegant, but:

data.frame(rbind(as.matrix(df), as.matrix(de)))

From the documentation of the rbind function:

For rbind column names are taken from the first argument with appropriate names: colnames for a matrix...
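One caveat with the matrix route, shown here as a small sketch: as.matrix() on a data frame with mixed column types gives a character matrix, so numeric columns come back as text. The volume and vol columns are assumptions added for illustration; they are not part of the original df and de.

```r
# Two data frames with different column names; `volume`/`vol` are made-up numeric columns
df <- data.frame(hello = "hi", volume = 3, stringsAsFactors = FALSE)
de <- data.frame(hola = "hola", vol = 13.1, stringsAsFactors = FALSE)

# rbind() on matrices takes the column names from the first argument
out <- data.frame(rbind(as.matrix(df), as.matrix(de)))

names(out)              # "hello" "volume"
is.numeric(out$volume)  # FALSE: the numbers were converted to text
```

So this works as a quick fix when every column is text, but it is probably not what you want for numeric data.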


Best answer

As @Khashaa and @Richard Scriven point out in the comments, you have to set consistent column names for all the data frames you want to append.

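So you need to explicitly declare the column names for the second data frame, de, and then use rbind(). In the question, you only set column names for the first data frame, df: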

df<-data.frame("hi","bye")
names(df)<-c("hello","goodbye")


de<-data.frame("hola","ciao")
names(de)<-c("hello","goodbye")


newdf <- rbind(df, de)
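To see why the names matter, here is a minimal sketch of the failing case next to the working one. It reuses df and de from above; the tryCatch() wrapper is only there so the script keeps running after the error.

```r
df <- data.frame("hi", "bye")
names(df) <- c("hello", "goodbye")
de <- data.frame("hola", "ciao")   # names left at their auto-generated defaults

# rbind() on data frames matches columns by name, so this call fails
result <- tryCatch(rbind(df, de), error = function(e) conditionMessage(e))
result   # the error message from rbind()

# Copying the names over makes the two data frames compatible
names(de) <- names(df)
newdf <- rbind(df, de)
nrow(newdf)   # 2
```

Using names(de) <- names(df) instead of retyping the vector keeps the two sets of names from drifting apart.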


Let's make it simple:

df[nrow(df) + 1,] = c("v1","v2")


Or, inspired by @Matheusaraujo:

df[nrow(df) + 1,] = list("v1","v2")

This allows for mixed data types.
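A quick sketch of the difference, using a made-up two-column data frame (name and n are assumptions for illustration): c() coerces everything to one type before the assignment, while list() keeps each element's type.

```r
# Start from a data frame with one character and one numeric column
base <- data.frame(name = "a", n = 1, stringsAsFactors = FALSE)

with_c <- base
with_c[nrow(with_c) + 1, ] <- c("b", 2)           # c() turns 2 into "2"

with_list <- base
with_list[nrow(with_list) + 1, ] <- list("b", 2)  # each element keeps its type

class(with_c$n)      # "character"
class(with_list$n)   # "numeric"
```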


I like list instead of c because it handles mixed data types better. Adding an additional column to the original poster's question:

#Create an empty data frame
df <- data.frame(hello=character(), goodbye=character(), volume=double())
de <- list(hello="hi", goodbye="bye", volume=3.0)
df = rbind(df,de, stringsAsFactors=FALSE)
de <- list(hello="hola", goodbye="ciao", volume=13.1)
df = rbind(df,de, stringsAsFactors=FALSE)

Note that some additional control is required if the string/factor conversion is important.

Or, using the original variables with the solution from Matheusaraujo/Ytsen de Boer:

df[nrow(df) + 1,] = list(hello="hallo",goodbye="auf wiedersehen", volume=20.2)

Note that this solution doesn't work well with strings unless there is existing data in the data frame.


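If you know that the two data frames share the same columns and types, there is a simpler way to append a record from one data frame to the other. To append one row from xx to yy, just do the following, where i is the i-th row in xx.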

yy[nrow(yy)+1,] <- xx[i,]

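Simple as that. No messy binds. If you need to append all of xx to yy, then either call a loop or take advantage of R's sequence abilities and do this: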

yy[(nrow(yy)+1):(nrow(yy)+nrow(xx)),] <- xx[1:nrow(xx),]
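Here is a small runnable sketch of the index-based append, with rbind() alongside for comparison. The contents of xx and yy (columns id and label) are made up for illustration.

```r
xx <- data.frame(id = 1:2, label = c("a", "b"), stringsAsFactors = FALSE)
yy <- data.frame(id = 3:5, label = c("c", "d", "e"), stringsAsFactors = FALSE)

# Append every row of xx to yy by assigning into the next free row indices
appended <- yy
appended[nrow(yy) + seq_len(nrow(xx)), ] <- xx

# The same rows via rbind(), for comparison
bound <- rbind(yy, xx)

appended$id   # 3 4 5 1 2
```

Both routes end up with the same column values here, so which one you pick is mostly a matter of taste.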


I needed to add stringsAsFactors=FALSE when creating the data frame.

> df <- data.frame("hello"= character(0), "goodbye"=character(0))
> df
[1] hello   goodbye
<0 rows> (or 0-length row.names)
> df[nrow(df) + 1,] = list("hi","bye")
Warning messages:
1: In `[<-.factor`(`*tmp*`, iseq, value = "hi") :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, iseq, value = "bye") :
invalid factor level, NA generated
> df
hello goodbye
1  <NA>    <NA>
>

.

> df <- data.frame("hello"= character(0), "goodbye"=character(0), stringsAsFactors=FALSE)
> df
[1] hello   goodbye
<0 rows> (or 0-length row.names)
> df[nrow(df) + 1,] = list("hi","bye")
> df[nrow(df) + 1,] = list("hola","ciao")
> df[nrow(df) + 1,] = list(hello="hallo",goodbye="auf wiedersehen")
> df
hello         goodbye
1    hi             bye
2  hola            ciao
3 hallo auf wiedersehen
>
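For context, stringsAsFactors defaults to FALSE since R 4.0.0, so the NA problem mostly shows up on older versions or when factors are requested explicitly. The sketch below forces both behaviours so the difference is visible on any version; the names as_factor and as_char are made up for illustration.

```r
# Force factor columns, which was the default before R 4.0.0
as_factor <- data.frame(hello = character(0), goodbye = character(0),
                        stringsAsFactors = TRUE)
# The new values are not existing factor levels, so they become NA (with warnings)
suppressWarnings(as_factor[nrow(as_factor) + 1, ] <- list("hi", "bye"))

# Plain character columns accept any string
as_char <- data.frame(hello = character(0), goodbye = character(0),
                      stringsAsFactors = FALSE)
as_char[nrow(as_char) + 1, ] <- list("hi", "bye")

is.na(as_factor$hello)   # TRUE
as_char$hello            # "hi"
```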


Be sure to specify stringsAsFactors=FALSE when creating the data frame:

> rm(list=ls())
> trigonometry <- data.frame(character(0), numeric(0), stringsAsFactors=FALSE)
> colnames(trigonometry) <- c("theta", "sin.theta")
> trigonometry
[1] theta     sin.theta
<0 rows> (or 0-length row.names)
> trigonometry[nrow(trigonometry) + 1, ] <- c("0", sin(0))
> trigonometry[nrow(trigonometry) + 1, ] <- c("pi/2", sin(pi/2))
> trigonometry
theta sin.theta
1     0         0
2  pi/2         1
> typeof(trigonometry)
[1] "list"
> class(trigonometry)
[1] "data.frame"

Failing to use stringsAsFactors=FALSE when creating the data frame results in the following warning when you try to add the new row:

> trigonometry[nrow(trigonometry) + 1, ] <- c("0", sin(0))
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "0") :
invalid factor level, NA generated


There's now add_row() from the tibble package (also available through tidyverse).

library(tidyverse)
df %>% add_row(hello = "hola", goodbye = "ciao")

Unspecified columns get an NA.
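A short sketch of both points, assuming the tibble package is installed. The .before argument is part of add_row(); the example data is made up.

```r
library(tibble)

df <- data.frame(hello = "hi", goodbye = "bye", stringsAsFactors = FALSE)

# Only `hello` is supplied, so `goodbye` is filled with NA
out <- add_row(df, hello = "hola")

# .before places the new row at a given position instead of at the end
first <- add_row(df, hello = "hola", goodbye = "ciao", .before = 1)

is.na(out$goodbye[2])   # TRUE
first$hello             # "hola" "hi"
```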


If you want to create an empty data frame and add content in a loop, the following may help:

# Number of students in class
student.count <- 36


# Gather data about the students
student.age <- sample(14:17, size = student.count, replace = TRUE)
student.gender <- sample(c('male', 'female'), size = student.count, replace = TRUE)
student.marks <- sample(46:97, size = student.count, replace = TRUE)


# Create empty data frame
student.data <- data.frame()


# Populate the data frame using a for loop
for (i in 1 : student.count) {
# Get the row data
age <- student.age[i]
gender <- student.gender[i]
marks <- student.marks[i]


# Populate the row
new.row <- data.frame(age = age, gender = gender, marks = marks)


# Add the row
student.data <- rbind(student.data, new.row)
}


# Print the data frame
student.data

Hope it helps :)
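When all the values already sit in vectors, as they do here, the loop can be skipped entirely. A minimal sketch with the same variable names; set.seed() is an addition so the sampled values are reproducible.

```r
set.seed(1)  # only so the sampled values are reproducible
student.count <- 36
student.age <- sample(14:17, size = student.count, replace = TRUE)
student.gender <- sample(c("male", "female"), size = student.count, replace = TRUE)
student.marks <- sample(46:97, size = student.count, replace = TRUE)

# Each vector becomes one column; no loop or repeated rbind() needed
student.data <- data.frame(age = student.age,
                           gender = student.gender,
                           marks = student.marks)

dim(student.data)   # 36 3
```

The loop version is still useful when each row really is computed one at a time.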


To formalize what someone else used setNames for:

add_row <- function(original_data, new_vals_list){
# appends row to dataset while assuming new vals are ordered and classed appropriately.
# new_vals must be a list not a single vector.
rbind(
original_data,
setNames(data.frame(new_vals_list), colnames(original_data))
)
}

It preserves class when legitimate and passes errors elsewhere.

m <- mtcars[ ,1:3]
m$cyl <- as.factor(m$cyl)
str(m)


#'data.frame':  32 obs. of  3 variables:
# $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: num  160 160 108 258 360 ...

The factor is preserved when adding 4, even though it was passed as a numeric.

str(add_row(m, list(20,4,160)))
#'data.frame':  33 obs. of  3 variables:
# $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: num  160 160 108 258 360 ...

Trying to pass a value other than 4, 6, or 8 produces an invalid factor level warning, and an NA is inserted.

str(add_row(m, list(20,3,160)))
# 'data.frame': 33 obs. of  3 variables:
# $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: num  160 160 108 258 360 ...
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 3) :
invalid factor level, NA generated
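Since the unknown level only produces a warning and an NA, you might prefer a hard stop. Below is a sketch of a stricter variant; add_row_strict is a hypothetical name, and the level check is an addition that is not part of the function above.

```r
# Hypothetical stricter variant: stop instead of silently inserting NA
add_row_strict <- function(original_data, new_vals_list) {
  for (j in seq_along(original_data)) {
    col <- original_data[[j]]
    val <- new_vals_list[[j]]
    # Reject values that are not an existing level of a factor column
    if (is.factor(col) && !(as.character(val) %in% levels(col))) {
      stop("value '", val, "' is not a level of column '",
           names(original_data)[j], "'")
    }
  }
  rbind(original_data,
        setNames(data.frame(new_vals_list), colnames(original_data)))
}

m <- mtcars[, 1:3]
m$cyl <- as.factor(m$cyl)

ok <- add_row_strict(m, list(20, 4, 160))     # 4 is an existing level
bad <- tryCatch(add_row_strict(m, list(20, 3, 160)),
                error = function(e) "rejected")
nrow(ok)   # 33
bad        # "rejected"
```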


To build a data.frame in a loop:

df <- data.frame()
for(i in 1:10){
df <- rbind(df, data.frame(str="hello", x=i, y=i*10))
}
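Calling rbind() inside the loop copies the growing data frame on every iteration, which can get slow for many rows. One common alternative, sketched here with the same columns, is to collect the rows in a list and bind them once at the end.

```r
# Build each row separately, then bind them all in a single call
rows <- lapply(1:10, function(i) {
  data.frame(str = "hello", x = i, y = i * 10)
})
df <- do.call(rbind, rows)

dim(df)   # 10 3
```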


I will add to the other suggestions. I use base R code to create a data frame:

data_set_name <- data.frame(data_set)

Now, I always suggest making a copy of the original data frame in case you need to go back or test something. That looks like this:

data_set_name_copy <- data_set_name

Now, if you want to add a new column, the code looks like this:

data_set_name_copy$Name_of_New_Column <- Data_for_New_Column

The $ indicates that you are adding a new column; the name of the new column goes right after it.
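A concrete version of the steps above, as a sketch. The contents of data_set and the volume column are made up for illustration.

```r
# data_set here is a made-up stand-in for your own data
data_set <- list(hello = c("hi", "hola"), goodbye = c("bye", "ciao"))
data_set_name <- data.frame(data_set, stringsAsFactors = FALSE)
data_set_name_copy <- data_set_name

# The new column needs one value per existing row (or a single value to recycle)
data_set_name_copy$volume <- c(3.0, 13.1)

names(data_set_name_copy)   # "hello" "goodbye" "volume"
ncol(data_set_name)         # 2: the original is unchanged
```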


I think

rbind.data.frame(df, de)

should do the trick.


In dplyr >= 1.0.0 you can use rows_insert:

df1 <- data.frame(hello = "hi", goodbye = "bye")
df2 <- data.frame(hello = "hola", goodbye = "ciao")


library(dplyr)


df1 %>%
rows_insert(df2)
Matching, by = "hello"
hello goodbye
1    hi     bye
2  hola    ciao

Note: all columns in df2 must exist in df1, but not all columns in df1 have to be in df2.

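There are other rows_* functions for other behaviors. For example, you can use rows_upsert, which overwrites the values if they already exist and inserts them otherwise: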

df2 <- data.frame(hello = c("hi", "hola"), goodbye = c("goodbye", "ciao"))


library(dplyr)


df1 %>%
rows_upsert(df2)
Matching, by = "hello"
hello goodbye
1    hi goodbye # bye updated to goodbye since "hi" was already in data frame
2  hola    ciao # inserted because "hola" was not in the data frame

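These functions work by matching on key columns. If no by argument is specified, the default is to use the first column of the second data frame (df2 in this example) as the key and match it against the first data frame (df1 in this example).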
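A short sketch with the key named explicitly, assuming dplyr >= 1.0.0 is installed. The tryCatch() wrapper is only there so the duplicate-key error can be shown without stopping the script.

```r
library(dplyr)

df1 <- data.frame(hello = "hi", goodbye = "bye")
df2 <- data.frame(hello = "hola", goodbye = "ciao")

# Naming the key column explicitly avoids the "Matching, by = ..." message
out <- rows_insert(df1, df2, by = "hello")

# Inserting a key that already exists is an error by default
dup <- tryCatch(rows_insert(df1, df1, by = "hello"),
                error = function(e) "duplicate key")
nrow(out)   # 2
dup         # "duplicate key"
```

If the row might already be present and should be overwritten, rows_upsert is the better fit.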