
Merge two data frames and replace duplicated values with NA using R


I have two data frames that I want to combine, but wherever a value is duplicated it must be replaced with NA. For example:

    df1 = data.frame(x1 = c(1,1,1,2,2,2,2), x2 = c("a","a","b","b","c","c","c"), x3 = c("t","u","v","w","x","y","z"), x4 = c("apple","apple","mango","mango","mango","mango","mango"))
    df2 = data.frame(x1 = c(1,1,1,1,2,2,2,2), x2 = c("a","a","a","b","b","b","c","c"), x3 = c("t","u","u","v","w","x","y","z"), x5 = c("apple A","apple A","apple B","mango A","mango B","mango A","mango A","mango A"), x6 = c(10,10,20,10,10,30,30,30))

My expected result:

    df3 = data.frame(x1 = c(1,1,1,1,2,2,2,2), x2 = c("a","a","a","b","b","b","c","c"), x3 = c("t","u","u","v","w","x","y","z"), x4 = c("apple","apple","apple","mango","mango","mango","mango","mango"), x5 = c("apple A","apple A","apple B","mango A","mango B","mango A","mango A","mango A"), x6 = c(10,NA,20,10,10,30,30,NA))

In other words: merge df1 and df2, but wherever a row of df2 repeats an earlier combination of x1, x2 and x5, its x6 should be NA.
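Before doing any merge, the rule can be checked on df2 alone. Below is a minimal base-R sketch (the names dup and x6_rule are illustrative, not from the question): duplicated() marks every row whose (x1, x2, x5) combination already appeared in an earlier row, and replacing x6 at those rows gives the x6 column of the expected df3.

```r
# df2 copied from the question
df2 <- data.frame(
  x1 = c(1, 1, 1, 1, 2, 2, 2, 2),
  x2 = c("a", "a", "a", "b", "b", "b", "c", "c"),
  x3 = c("t", "u", "u", "v", "w", "x", "y", "z"),
  x5 = c("apple A", "apple A", "apple B", "mango A",
         "mango B", "mango A", "mango A", "mango A"),
  x6 = c(10, 10, 20, 10, 10, 30, 30, 30)
)

# TRUE for every row whose (x1, x2, x5) combination appeared earlier;
# the first occurrence of each combination stays FALSE
dup <- duplicated(df2[c("x1", "x2", "x5")])
which(dup)
#> [1] 2 8

# Blank x6 only at the repeated rows (df2 itself is left unchanged)
x6_rule <- replace(df2$x6, dup, NA)
x6_rule
#> [1] 10 NA 20 10 10 30 30 NA
```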


Solution

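  • Consider doing the following: set x6 to NA in df2 wherever the combination of x1, x2, x5 and x6 has already appeared, then left-join df1 with df2 on their shared columns (x1, x2, x3):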

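    # Set x6 to NA in every row that repeats an earlier (x1, x2, x5, x6) combination
    is.na(df2$x6) <- duplicated(df2[c('x1', 'x2', 'x5', 'x6')])
    # Left join: keep all rows of df1, matching on the shared columns x1, x2, x3
    merge(df1, df2, all.x = TRUE)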
    
      x1 x2 x3    x4      x5 x6
    1  1  a  t apple apple A 10
    2  1  a  u apple apple A NA
    3  1  a  u apple apple B 20
    4  1  b  v mango mango A 10
    5  2  b  w mango mango B 10
    6  2  c  x mango    <NA> NA
    7  2  c  y mango mango A 30
    8  2  c  z mango mango A NA
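Row 6 of this output differs from the expected df3 because df1 has x2 = "c" in the row with x3 = "x" while df2 has x2 = "b" there, so that row of df1 finds no match in df2 and gets NA for x5 and x6. If you prefer dplyr, roughly the same steps can be written as below. This is a sketch assuming dplyr is installed; it keys the duplicate check on x1, x2 and x5 as described in the question (which gives the same x6 values for this data) and leaves df2 unmodified. The name df2_marked is illustrative.

```r
library(dplyr)

# Data frames copied from the question
df1 <- data.frame(
  x1 = c(1, 1, 1, 2, 2, 2, 2),
  x2 = c("a", "a", "b", "b", "c", "c", "c"),
  x3 = c("t", "u", "v", "w", "x", "y", "z"),
  x4 = c("apple", "apple", "mango", "mango", "mango", "mango", "mango")
)
df2 <- data.frame(
  x1 = c(1, 1, 1, 1, 2, 2, 2, 2),
  x2 = c("a", "a", "a", "b", "b", "b", "c", "c"),
  x3 = c("t", "u", "u", "v", "w", "x", "y", "z"),
  x5 = c("apple A", "apple A", "apple B", "mango A",
         "mango B", "mango A", "mango A", "mango A"),
  x6 = c(10, 10, 20, 10, 10, 30, 30, 30)
)

# Set x6 to NA on every repeat of an (x1, x2, x5) combination
df2_marked <- df2 %>%
  mutate(x6 = replace(x6, duplicated(data.frame(x1, x2, x5)), NA))

# Keep every row of df1 and attach the matching rows of df2_marked
result <- left_join(df1, df2_marked, by = c("x1", "x2", "x3"))
result
```

Unlike merge(), left_join() keeps the row order of df1 instead of sorting by the key columns; df1 is already sorted here, so the rows come out in the same order as the merge() output above.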