I am trying to subset multiple data frames that have different numbers of columns. They all contain the same columns of interest, so I want to reduce each one to those columns and then append them into one data frame. I am trying to be as elegant and efficient as possible, but my code does not seem to be working. This is what I tried:
Suppose I have the following data frames:
df1 <- matrix(1:12, 3,4, dimnames = list(NULL, LETTERS[1:4]))
df2 <- matrix(12:26, 3, 5, dimnames = list(NULL, LETTERS[1:5]))
df1 <- as.data.frame(df1)
df2 <- as.data.frame(df2)
I tried to subset both data frames by writing a function and then using lapply. Suppose I only want to keep columns A, C, and D:
select_function <- function(x){
dplyr::select(`A`,`C`,`D`)
}
list <- list(df1, df2)
df.list <- lapply(list, function(x) select_function)
I then tried to append the list into one data frame:
new.df <- do.call(rbind, df.list)
The code is not working. I think the lapply line is incorrect, and I am not sure what is being generated in df.list. I hope I have communicated what I am trying to do. Please let me know of alternative ways to achieve this.
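You are not passing your data to your function. select_function never uses its argument x, and the anonymous function in lapply returns select_function itself instead of calling it, so df.list ends up as a list of functions rather than data frames. It should look like: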
select_cols <- function(df) {
  df |>
    dplyr::select(A, C, D)
}
Then you can just do:
l <- list(df1, df2)  # a name other than `list`, to avoid confusion with the base function
lapply(l, select_cols)
# [[1]]
#   A C  D
# 1 1 7 10
# 2 2 8 11
# 3 3 9 12
#
# [[2]]
#    A  C  D
# 1 12 18 21
# 2 13 19 22
# 3 14 20 23
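A quick way to see what went wrong in the original lapply call is to check what it returned. The sketch below is only illustrative: it uses a base R stand-in for select_function (an assumption, so that it runs without dplyr), and the names wrong and right are hypothetical.

```r
df1 <- as.data.frame(matrix(1:12, 3, 4, dimnames = list(NULL, LETTERS[1:4])))
df2 <- as.data.frame(matrix(12:26, 3, 5, dimnames = list(NULL, LETTERS[1:5])))

# Stand-in for select_function: base R subsetting instead of dplyr (assumption)
select_function <- function(x) x[c("A", "C", "D")]

# Pattern from the question: the anonymous function returns the function object
# itself and never calls it on x
wrong <- lapply(list(df1, df2), function(x) select_function)
sapply(wrong, is.function)    # TRUE TRUE

# Calling the function on x returns the subsetted data frames
right <- lapply(list(df1, df2), function(x) select_function(x))
sapply(right, is.data.frame)  # TRUE TRUE
```

Passing the function directly, as in lapply(l, select_cols), is equivalent to the second form and a little shorter.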
Or alternatively, in base R:
cols <- c("A", "C", "D")
lapply(l, \(df) df[cols])
# [[1]]
#   A C  D
# 1 1 7 10
# 2 2 8 11
# 3 3 9 12
#
# [[2]]
#    A  C  D
# 1 12 18 21
# 2 13 19 22
# 3 14 20 23
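To finish the task from the question, the subsetted list can then be appended into one data frame. This is a minimal base R sketch that rebuilds the example data so it runs on its own; df_list and new_df are illustrative names, not anything required by R.

```r
df1 <- as.data.frame(matrix(1:12, 3, 4, dimnames = list(NULL, LETTERS[1:4])))
df2 <- as.data.frame(matrix(12:26, 3, 5, dimnames = list(NULL, LETTERS[1:5])))

cols <- c("A", "C", "D")

# Keep only the columns of interest in each data frame
df_list <- lapply(list(df1, df2), \(df) df[cols])

# Stack the subsetted data frames row-wise; this works because they now
# share the same column names
new_df <- do.call(rbind, df_list)
new_df
```

The result should have six rows and the columns A, C, and D. If dplyr is already loaded, dplyr::bind_rows(df_list) is a common alternative to do.call(rbind, ...).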