Applying the apply() function to specific data frame columns

I want to use the apply function on a data frame, but apply the function only to the last 5 columns.

B<- by(wifi,(wifi$Room),FUN=function(y){apply(y, 2, A)})

This applies A to all the columns of y.

B<- by(wifi,(wifi$Room),FUN=function(y){apply(y[4:9], 2, A)})

This applies A only to columns 4-9 of y, but the value returned in B drops the first 3 columns... I still want those; I just don't want A applied to them.

wifi[,1:3]+B

That is not what I want either.


Using an example data.frame and an example function (which just adds 1 to every value):

A <- function(x) x + 1
wifi <- data.frame(replicate(9,1:4))
wifi


#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  1  1  1  1  1  1
#2  2  2  2  2  2  2  2  2  2
#3  3  3  3  3  3  3  3  3  3
#4  4  4  4  4  4  4  4  4  4


data.frame(wifi[1:3], apply(wifi[4:9],2, A) )
#or
cbind(wifi[1:3], apply(wifi[4:9],2, A) )


#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5

Or even:

data.frame(wifi[1:3], lapply(wifi[4:9], A) )
#or
cbind(wifi[1:3], lapply(wifi[4:9], A) )


#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5

lapply is probably a better choice than apply here, because apply first coerces your data.frame to a matrix (via as.matrix), which forces all the columns to a single common type. Depending on your context, this could have unintended consequences.
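To see the coercion concretely, here is a minimal sketch. The data frame `df` and its columns are made up for illustration and are not part of the question's data.

```r
# Hypothetical mixed-type data frame: one character and one numeric column
df <- data.frame(id = c("a", "b"), x = c(1, 2), stringsAsFactors = FALSE)

# apply() converts df to a character matrix first,
# so every column reaches the function as character
via_apply <- apply(df, 2, class)

# vapply() (like lapply()) visits each column as-is,
# so x is still numeric when the function sees it
via_lapply <- vapply(df, class, character(1))

via_apply
via_lapply
```

With apply, any arithmetic inside the function would fail or misbehave on `x`, because it has already been turned into strings.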

The pattern is:

df[cols] <- lapply(df[cols], FUN)

The 'cols' vector can be variable names or indices. I prefer to use names whenever possible (it's robust to column reordering). So in your case this might be:

wifi[4:9] <- lapply(wifi[4:9], A)

An example of using column names:

wifi <- data.frame(A=1:4, B=runif(4), C=5:8)
wifi[c("B", "C")] <- lapply(wifi[c("B", "C")], function(x) -1 * x)
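The question applies A within each Room, so the same pattern may need a grouped variant. The sketch below is one possible way to do that in base R with ave(); the column names `S1` and `S2`, the sample values, and the mean-centering version of `A` are all assumptions chosen so that the per-group effect is visible.

```r
# Hypothetical data: a Room grouping column plus two signal columns
wifi <- data.frame(
  Room = c("a", "a", "b", "b"),
  S1   = c(1, 3, 10, 20),
  S2   = c(2, 4, 30, 50)
)

# Example function whose result depends on the group: centre on the group mean
A <- function(x) x - mean(x)

cols <- c("S1", "S2")  # columns to transform, chosen by name

# ave() applies A within each Room and returns values in the original row order;
# Room and any other columns outside `cols` are left untouched
wifi[cols] <- lapply(wifi[cols], function(x) ave(x, wifi$Room, FUN = A))
wifi
```

Because the result is assigned back into `wifi[cols]`, the untouched columns stay in the data frame, which is what the by() attempt in the question was losing.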

I think what you want is mapply. You could apply the function to all columns, and then just drop the columns you don't want. However, if you are applying different functions to different columns, it seems likely what you want is mutate, from the dplyr package.
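For the "different functions for different columns" case, a base R sketch with mapply might look like the following. The `funs` list and the two functions in it are hypothetical; only `wifi` follows the example data used elsewhere in this thread.

```r
wifi <- data.frame(replicate(9, 1:4))

# Hypothetical per-column functions, named after the columns they apply to
funs <- list(
  X8 = function(x) x + 1,
  X9 = function(x) x * 10
)

# mapply() pairs each function with its matching column;
# SIMPLIFY = FALSE keeps the result as a list of columns
wifi[names(funs)] <- mapply(
  function(f, x) f(x),
  funs,
  wifi[names(funs)],
  SIMPLIFY = FALSE
)
wifi
```

Assigning into `wifi[names(funs)]` means there is no need to apply anything to the other columns and drop them afterwards.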

As mentioned, you simply want the standard R apply function applied to columns (MARGIN=2):

wifi[,4:9] <- apply(wifi[,4:9], MARGIN=2, FUN=A)

Or, for short:

wifi[,4:9] <- apply(wifi[,4:9], 2, A)

This updates columns 4:9 in place using the A() function. Now, let's assume that na.rm is an argument to A(), which it probably should be. We can pass na.rm=TRUE to remove NA values from the computation like so:

wifi[,4:9] <- apply(wifi[,4:9], MARGIN=2, FUN=A, na.rm=TRUE)

The same is true for any other arguments you want to pass to your custom function.
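As a small runnable illustration of forwarding an argument, the sketch below assumes a version of A() that accepts na.rm. This definition of `A` and the sample data containing an NA are invented for the example; the original A() is not shown in the question.

```r
# Hypothetical A(): centre a column on its mean, optionally ignoring NAs
A <- function(x, na.rm = FALSE) x - mean(x, na.rm = na.rm)

# Example data with a missing value in every column
wifi <- data.frame(replicate(9, c(1, 2, NA, 3)))

# Named arguments given after FUN are passed on to A() for each column
wifi[, 4:9] <- apply(wifi[, 4:9], MARGIN = 2, FUN = A, na.rm = TRUE)
wifi
```

lapply forwards extra arguments in the same way, so `lapply(wifi[4:9], A, na.rm = TRUE)` should behave equivalently here.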

The easiest way is to use the mutate function:

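library(dplyr)  # provides mutate() and %>%
# Template only: someFunction is a placeholder for the function to apply
dataFunctionUsed <- data %>%
  mutate(columnToUseFunctionOn = someFunction(oldColumn, ...))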
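A concrete instance of the mutate pattern above could look like this. It reuses the example `A` and `wifi` from the earlier answer; the result name `wifi_new` is just an illustrative choice.

```r
library(dplyr)

A <- function(x) x + 1
wifi <- data.frame(replicate(9, 1:4))

# Overwrite X9 with A(X9); all other columns pass through unchanged
wifi_new <- wifi %>%
  mutate(X9 = A(X9))
wifi_new
```

This works well for one or two columns, but each column has to be written out by hand.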

This task is easily achieved with the dplyr package's across functionality.

Borrowing the data structure suggested by thelatemail:

A <- function(x) x + 1
wifi <- data.frame(replicate(9,1:4))

We can indicate the columns we wish to apply the function to either by index like this:

library(dplyr)
wifi %>%
  mutate(across(4:9, A))
#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5

Or by name:

wifi %>%
  mutate(across(X4:X9, A))
#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5
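Since the question asks for "the last 5 columns", it may be handy to compute that selection rather than hard-code it. The sketch below is one way to do so, assuming the same example data; the names `last_five` and `result` are hypothetical.

```r
library(dplyr)

A <- function(x) x + 1
wifi <- data.frame(replicate(9, 1:4))

# Names of the last five columns, whatever they happen to be called
last_five <- tail(names(wifi), 5)

# all_of() tells across() to treat the character vector as column names
result <- wifi %>%
  mutate(across(all_of(last_five), A))
result
```

Here `last_five` resolves to X5 through X9, so X1 to X4 are returned unchanged.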