使用 merge()函数只连接 R 中选定的列

我试图左连接2个数据帧,但我不想连接所有的变量从第二个数据集:

例如,我有数据集1(DF1) :

  Cl    Q   Sales  Date
A    2   30     01/01/2014
A    3   24     02/01/2014
A    1   10     03/01/2014
B    4   10     01/01/2014
B    1   20     02/01/2014
B    3   30     03/01/2014

我想离开连接数据集2(DF2) :

Client  LO  CON
A    12  CA
B    11  US
C    12  UK
D    10  CA
E    15  AUS
F    91  DD

我可以用以下代码左连接:

Merge (x = DF1,y = DF2,by = “ Client”,all.x = TRUE) :

   Client Q    Sales   Date             LO      CON
A      2    30      01/01/2014       12      CA
A      3    24      02/01/2014       12      CA
A      1    10      03/01/2014       12      CA
B      4    10      01/01/2014       11      US
B      1    20      02/01/2014       11      US
B      3    30      03/01/2014       11      US

但是,它合并了 LO 和 CON 列。我只想合并 LO 列。

   Client Q    Sales   Date             LO
A      2    30      01/01/2014       12
A      3    24      02/01/2014       12
A      1    10      03/01/2014       12
B      4    10      01/01/2014       11
B      1    20      02/01/2014       11
B      3    30      03/01/2014       11
183963 次浏览

You can do this by subsetting the data you pass into your merge:

merge(x = DF1, y = DF2[ , c("Client", "LO")], by = "Client", all.x=TRUE)

Or you can simply delete the column after your current merge :)

I think it's a little simpler to use the dplyr functions select and left_join ; at least it's easier for me to understand. The join function from dplyr are made to mimic sql arguments.

 library(tidyverse)


DF2 <- DF2 %>%
select(client, LO)


joined_data <- left_join(DF1, DF2, by = "Client")

You don't actually need to use the "by" argument in this case because the columns have the same name.

Nothing elegant but this could be another satisfactory answer.

merge(x = DF1, y = DF2, by = "Client", all.x=TRUE)[,c("Client","LO","CON")]

This will be useful especially when you don't need the keys that were used to join the tables in your results.

Alternative solution using left_join() and select() from the dplyr package, without intermediate steps:

DF1 <- DF1 %>%
left_join(DF2, by = "Client") %>%
select(-CON)

One-liner with dplyr

DF_joined <- left_join(DF1, select(DF2, -CON), by = "Client")

For Client column in both tables:

DF_joined <- DF1 %>% left_join(DF2 %>% select(Client,CON))