Please consider the snippet at the end of the post. I would like to be able to save (possibly as an RDS) the results of the computations while they progress (e.g. every time a new 10% of the list is processed). How can I do that?
library(tidyverse)
ll <- 1:1000
res <- map(ll, \(x) cos(x))
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Debian GNU/Linux 12 (bookworm)
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
#>
#> locale:
#> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
#> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
#> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Europe/Brussels
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
#> [5] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
#> [9] ggplot2_3.5.1 tidyverse_2.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.5 compiler_4.4.1 reprex_2.1.0 tidyselect_1.2.1
#> [5] scales_1.3.0 yaml_2.3.8 fastmap_1.1.1 R6_2.5.1
#> [9] generics_0.1.3 knitr_1.46 munsell_0.5.1 R.cache_0.16.0
#> [13] tzdb_0.4.0 pillar_1.9.0 R.utils_2.12.3 rlang_1.1.3
#> [17] utf8_1.2.4 stringi_1.8.4 xfun_0.43 fs_1.6.4
#> [21] timechange_0.3.0 cli_3.6.2 withr_3.0.0 magrittr_2.0.3
#> [25] digest_0.6.35 grid_4.4.1 hms_1.1.3 lifecycle_1.0.4
#> [29] R.methodsS3_1.8.2 R.oo_1.26.0 vctrs_0.6.5 evaluate_0.23
#> [33] glue_1.7.0 styler_1.10.3 fansi_1.0.6 colorspace_2.1-0
#> [37] rmarkdown_2.26 tools_4.4.1 pkgconfig_2.0.3 htmltools_0.5.8.1
Created on 2024-06-27 with reprex v2.1.0
Turns out there's a package for that, currr ("checkpoint" + purrr). It doesn't save precisely in the form you specified (but see below for how to access intermediate results), but these functions (cp_map() for example) create a secret folder in your current working directory and save the results if they reach a given checkpoint. This way if you rerun the code, it reads the result from the cache folder and starts to evaluate where you finished. [slightly edited from original]
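To make the checkpoint-and-resume idea concrete, here is a rough base-R sketch of the same mechanism done by hand. It is not how currr is implemented; checkpointed_map(), the chunk_*.rds file names and the temporary folder are all made up for illustration.

```r
# Hypothetical helper (not part of currr): process `x` in `n_chunks` pieces,
# saving each piece to its own RDS file and skipping pieces already on disk.
checkpointed_map <- function(x, f, dir, n_chunks = 10) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  # Assign each element to one of n_chunks roughly equal, contiguous chunks
  chunk_id <- ceiling(seq_along(x) * n_chunks / length(x))
  chunks <- split(seq_along(x), chunk_id)
  pieces <- lapply(names(chunks), function(id) {
    path <- file.path(dir, sprintf("chunk_%s.rds", id))
    if (file.exists(path)) {
      return(readRDS(path))  # resume: reuse the chunk saved by an earlier run
    }
    out <- lapply(x[chunks[[id]]], f)
    saveRDS(out, path)       # checkpoint as soon as this chunk is finished
    out
  })
  # Concatenate the per-chunk lists into one list, like map() would return
  do.call(c, pieces)
}

# Assumption: a throwaway folder under tempdir(); use any path you like
ckpt_dir <- file.path(tempdir(), "manual_checkpoints")
res <- checkpointed_map(1:1000, cos, dir = ckpt_dir)
```

Note that this sketch trusts whatever is already in the folder, so if you change the input or the function you need to delete the old chunk files yourself.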
cp_map() has a cp_options= argument that allows you to specify how often to checkpoint (i.e., how many checkpoints per job) and where to store the results.
library(currr)
# 10 checkpoints per job, stored under ./checkpoints
options(currr.n_checkpoint = 10, currr.folder = "checkpoints")
cc <- cp_map(1:1000, cos, name = "cos_results")
# list the files written for this job
list.files("checkpoints/cos_results")
If you want to look at these intermediate outputs directly (rather than using them via the package as an automated checkpointing system) you'll have to figure out what these files are: it looks like the out* files are storing chunks of output (e.g. out_301.rds has the results for cos(301:400)).
[1] "et_1.rds" "et_101.rds" "et_201.rds" "et_301.rds" "et_401.rds"
[6] "et_501.rds" "et_601.rds" "et_701.rds" "et_801.rds" "et_901.rds"
[11] "f.rds" "id_1.rds" "id_101.rds" "id_201.rds" "id_301.rds"
[16] "id_401.rds" "id_501.rds" "id_601.rds" "id_701.rds" "id_801.rds"
[21] "id_901.rds" "meta.rds" "out_1.rds" "out_101.rds" "out_201.rds"
[26] "out_301.rds" "out_401.rds" "out_501.rds" "out_601.rds" "out_701.rds"
[31] "out_801.rds" "out_901.rds" "st_1.rds" "st_101.rds" "st_201.rds"
[36] "st_301.rds" "st_401.rds" "st_501.rds" "st_601.rds" "st_701.rds"
[41] "st_801.rds" "st_901.rds" "x.rds"
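If that guess about the out* files is right, something like the following could stitch the chunks back together. This is only a sketch: read_out_chunks() is a made-up helper, it assumes each out_<start>.rds file holds one chunk of results that can be combined with c(), and the demo writes stand-in files to a temporary folder (not real currr output) so it runs without the package.

```r
# Hypothetical helper: read every out_<start>.rds file in `dir` and combine
# the chunks in order of their starting index.
read_out_chunks <- function(dir) {
  files <- list.files(dir, pattern = "^out_[0-9]+\\.rds$", full.names = TRUE)
  # Order by the numeric start index; a character sort is not numeric in
  # general (e.g. out_1001 would come before out_201)
  start <- as.integer(sub("^out_([0-9]+)\\.rds$", "\\1", basename(files)))
  files <- files[order(start)]
  do.call(c, lapply(files, readRDS))
}

# Stand-in files that mimic the layout in the listing above (assumed format)
demo_dir <- file.path(tempdir(), "cos_results_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (s in seq.int(1L, 901L, by = 100L)) {
  saveRDS(as.list(cos(s:(s + 99))),
          file.path(demo_dir, sprintf("out_%d.rds", s)))
}

partial <- read_out_chunks(demo_dir)
length(partial)
```

On a real run you would point it at the job folder instead, e.g. read_out_chunks("checkpoints/cos_results"), and check the result against a chunk you can verify by hand before relying on it.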