rdataframeconditional-statementssqldf

Fill a data.frame column with values from other data.frame column with a condition R


I have this two data.frames:

  df1 = data.frame(
    "scientific_name"=c("Anchietea ballardii","Anemia patens","Cololobus argenteus"),
    "threat_status"=c("VU","EN","EN")
  )

> df1
      scientific_name threat_status
1 Anchietea ballardii            VU
2       Anemia patens            EN
3 Cololobus argenteus            EN
> 


  df2 = data.frame(
    "scientific_name"=c("Anchietea ballardii","Anemia patens","Cololobus argenteus","Coleocephalocereus pluricostatus", "Dyckia bracteata", "Merianthera parvifolia", "Kielmeyera rupestris"),
    "threat_status"=c(NA)
  )

> df2
                     scientific_name threat_status
1                Anchietea ballardii            NA
2                      Anemia patens            NA
3                Cololobus argenteus            NA
4   Coleocephalocereus pluricostatus            NA
5                   Dyckia bracteata            NA
6             Merianthera parvifolia            NA
7               Kielmeyera rupestris            NA
>

I need to fill the df2$threat_status with the correspondent df1$threat_status, but when there is not any value for the column df2$threat_status it can be filled with "LC". I'm trying with sqldf INSERT TO, but it's not working. I's like to have a solution in plain R, something tells me it would be more elegant. Somebody could help? Thanks a lot!


Solution

  • In base R, you can do:

    df2 = merge(df2[,1,drop=F],df1, by="scientific_name", all.x=T)
    df2[is.na(df2$threat_status), 2] <- "LC"
    

    Output:

                       scientific_name threat_status
    1              Anchietea ballardii            VU
    2                    Anemia patens            EN
    3 Coleocephalocereus pluricostatus            LC
    4              Cololobus argenteus            EN
    5                 Dyckia bracteata            LC
    6             Kielmeyera rupestris            LC
    7           Merianthera parvifolia            LC
    

    Another option in your example is to simply set all the threat_status values in df2 to "LC", and then just row bind df1 to the rows in df2 where df1$scientfic_name does not appear:

    df2$threat_status="LC"
    rbind(df1,df2[!df2$scientific_name %in% df1$scientific_name,])
    

    Output:

                       scientific_name threat_status
    1              Anchietea ballardii            VU
    2                    Anemia patens            EN
    3              Cololobus argenteus            EN
    4 Coleocephalocereus pluricostatus            LC
    5                 Dyckia bracteata            LC
    6           Merianthera parvifolia            LC
    7             Kielmeyera rupestris            LC