I have this dataset, here is an example:
df1
State Year ColA ColB ColC
OH 2000 200 190 180
OH 2002 218 200 176
OH 2010 218 205 185
KY 2000 200 190 NA
KY 2002 218 200 NA
KY 2010 218 205 NA
PA 2000 200 190 180
PA 2002 214 200 178
PA 2010 219 205 185
MI 2000 200 190 NA
MI 2002 218 200 NA
MI 2010 218 205 NA
As you see, I have a few missing values in ColC. I have these values in a different dataset:
df2
State Year ColB ColC_Fill
KY 2000 180 200
KY 2002 176 210
KY 2010 185 211
MI 2000 200 211
MI 2002 200 206
MI 2010 200 205
I want to get this:
df1
State Year ColA ColB ColC
OH 2000 200 190 180
OH 2002 218 200 176
OH 2010 218 205 185
KY 2000 200 190 200
KY 2002 218 200 210
KY 2010 218 205 211
PA 2000 200 190 180
PA 2002 214 200 178
PA 2010 219 205 185
MI 2000 200 190 211
MI 2002 218 200 206
MI 2010 218 205 205
I want to fill in the missing values of df1 with those in df2. I have tried this code:
df$ColC[is.na(df$ColC)]<-left_join(df1[,c(1:2,5)], df2[,c(1:2,4)])
This code seemed to work a few days ago, but when I tried it again yesterday, it is replacing ColC with a list of all the variables. I have tried several iterations, but have had no luck. Any help would be appreciated! Thanks in advance.
Many roads lead to Rome. Here is a potential solution:
dataset1 <- dataset1 %>%
left_join(dataset2 %>% select(State, Year, ColC_Fill), by = c("State", "Year")) %>%
mutate(ColC = ifelse(is.na(ColC), ColC_Fill, ColC) ) %>%
select(-ColC_Fill)
First, I select the relevant columns by name from dataset2.
I then do a left_join()
on "State", "Year"
.
Third, I replace ColC wil ColC_fill wehere ColC is NA
.
Finally, I drop ColC_Fill as it is not needed anymore.