I need some help with this problem, which I am unable to resolve.
I have a dataframe called Base, which contains columns such as ID_1
, organization_1
, and year_beginning1
. In the organization_1
column, I will have names like School X, School Y, and School Z, which correspond to dataframes of these organizations. In these dataframes, such as the one for School X, I also have columns ID
and Start Year of the Program
.
I want R to look up these dataframes based on the organization_1
column and, using the ID
column, retrieve the number in the Start Year of the Program
column and insert it into the Year_beginning1
column.
The code I tred:
map <- list(
"School X" = SchoolX,
"School Y" = SchoolY,
"School Z" = SchoolZ
)
for (i in 1:nrow(Base)) {
nome_org <- Base$Organization1[i]
df_org <- map[[name_org]]
if (!is.null(df_org)) {
id_base <- Base$ID_1[i]
Year_beginning <- df_org$`Start year of the program`[df_org$ID == id_base]
if (length(Year_beginning) > 0) {
Base$Year_beginning1[i] <- Year_beginning
}
}
}
From reading your description, this is what I suspect your data may look like:
Base <- data.frame(ID_1=1:3, org_1=c("School X", "School Y", "School Z"), yearbeg_1=2000+1:3)
Base
# ID_1 org_1 yearbeg_1
# 1 1 School X 2001
# 2 2 School Y 2002
# 3 3 School Z 2003
School_X <- data.frame(ID=c(1L, 5L), startyear=1998+c(1,5))
School_Y <- data.frame(ID=c(2L, 9L), startyear=1998+c(2,9))
School_Z <- data.frame(ID=c(3L, 10L), startyear=1998+c(3,10))
I think this can easily be a "JOIN" operation, but to do it more easily we need to have the schools combined into one frame, with its org as a column.
I'll use dplyr
here, it is easy to read/understand, is well documented, and very heavily used. However, this can mostly be done in data.table
(about as easily, though it has a steeper learning curve), or in base R with a little more work and care.
Also, I'm going to assume for this example that your Base
references "School X"
(space) and the other frame is named something like School_X
(underscore); in the call to bind_rows
, the name of the argument "School X"=
is intended to match what is found in Base
.
library(dplyr) # bind_rows, mutate, left_join
Schools <- bind_rows(
"School X" = School_X,
"School Y" = School_Y,
"School Z" = School_Z,
.id="org")
Schools
# org ID startyear
# 1 School X 1 1999
# 2 School X 5 2003
# 3 School Y 2 2000
# 4 School Y 9 2007
# 5 School Z 3 2001
# 6 School Z 10 2008
From here, it's a simple "JOIN" operation. The "left" join ensures that all rows on the LHS (Base
here) are preserved, and any non-matching rows in Schools
will not be included in the output (notice that each of my schools have IDs that are not desired, and those IDs are not included in this output).
left_join(Base, Schools, join_by(org_1 == org, ID_1 == ID))
# ID_1 org_1 yearbeg_1 startyear
# 1 1 School X 2001 1999
# 2 2 School Y 2002 2000
# 3 3 School Z 2003 2001
For more information on JOINs in general, see