I often use do.call("cbind.data.frame", my_list) after a lapply() call and usually face no problems. For some reason, with the following list the column names are different after binding: the name of each list element (here a number) precedes the column names.
My list is something like this:
dput(my_list)
list(`1` = structure(list(`Pulmonary_embolism~f.20002.0` = NA,
`Pulmonary_embolism~f.20002.1` = NA, `Pulmonary_embolism~f.20002.2` = NA,
`Pulmonary_embolism~f.20002.3` = NA, `Pulmonary_embolism~f.20002.all` = NA), row.names = "1", class = "data.frame"),
`2` = structure(list(`Pulmonary_embolism~f.6152.0` = NA,
`Pulmonary_embolism~f.6152.1` = NA, `Pulmonary_embolism~f.6152.2` = NA,
`Pulmonary_embolism~f.6152.3` = NA, `Pulmonary_embolism~f.6152.all` = NA), row.names = "1", class = "data.frame"))
But after do.call("cbind.data.frame", my_list) the column names change:
names(do.call("cbind.data.frame", my_list))
[1] "1.Pulmonary_embolism~f.20002.0" "1.Pulmonary_embolism~f.20002.1" "1.Pulmonary_embolism~f.20002.2" "1.Pulmonary_embolism~f.20002.3" "1.Pulmonary_embolism~f.20002.all"
[6] "2.Pulmonary_embolism~f.6152.0" "2.Pulmonary_embolism~f.6152.1" "2.Pulmonary_embolism~f.6152.2" "2.Pulmonary_embolism~f.6152.3" "2.Pulmonary_embolism~f.6152.all"
How can I prevent the list number from being part of the column name?
First, you could extract the column names using lapply with unlist, making sure you set use.names = FALSE so that the names of the list elements are not carried into the column names. After that, you can assign these names to your do.call output like this:
your_names = unlist(lapply(my_list, \(x) colnames(x)), use.names = FALSE)
df = do.call("cbind.data.frame", my_list)
names(df) = your_names
df
#> Pulmonary_embolism~f.20002.0 Pulmonary_embolism~f.20002.1
#> 1 NA NA
#> Pulmonary_embolism~f.20002.2 Pulmonary_embolism~f.20002.3
#> 1 NA NA
#> Pulmonary_embolism~f.20002.all Pulmonary_embolism~f.6152.0
#> 1 NA NA
#> Pulmonary_embolism~f.6152.1 Pulmonary_embolism~f.6152.2
#> 1 NA NA
#> Pulmonary_embolism~f.6152.3 Pulmonary_embolism~f.6152.all
#> 1 NA NA
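A shorter variant of the same idea, as a minimal sketch: the prefix appears because the list names are passed to data.frame() as argument names, and data.frame() pastes an argument name in front of the column names of any multi-column component. Dropping the list names with unname() before binding should therefore avoid the prefix. The small my_list below is a hypothetical stand-in for the list in the question, not the original data.

```r
# Hypothetical stand-in for my_list: a named list of one-row data frames
my_list <- list(
  `1` = data.frame(a.0 = NA, a.1 = NA),
  `2` = data.frame(b.0 = NA, b.1 = NA)
)

# Named list elements become named arguments, so the names get prefixed
prefixed <- names(do.call("cbind.data.frame", my_list))
prefixed
#> [1] "1.a.0" "1.a.1" "2.b.0" "2.b.1"

# Removing the list names first leaves the column names untouched
df <- do.call("cbind.data.frame", unname(my_list))
names(df)
#> [1] "a.0" "a.1" "b.0" "b.1"
```

This keeps everything in base R and avoids renaming after the fact; the renaming approach above remains useful if you need the list names for something else.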
Another option is bind_cols from dplyr, which does not add the list names to the column names:
library(dplyr)
bind_cols(my_list)
#> Pulmonary_embolism~f.20002.0 Pulmonary_embolism~f.20002.1
#> 1 NA NA
#> Pulmonary_embolism~f.20002.2 Pulmonary_embolism~f.20002.3
#> 1 NA NA
#> Pulmonary_embolism~f.20002.all Pulmonary_embolism~f.6152.0
#> 1 NA NA
#> Pulmonary_embolism~f.6152.1 Pulmonary_embolism~f.6152.2
#> 1 NA NA
#> Pulmonary_embolism~f.6152.3 Pulmonary_embolism~f.6152.all
#> 1 NA NA
Created on 2023-06-29 with reprex v2.0.2