
Opening a new instance of R and sourcing a script within that instance


Background/Motivation: I am running a bioinformatics pipeline that, if executed linearly from beginning to end, takes several days to finish. Fortunately, some of the tasks don't depend on each other, so they can be run independently. For example, Tasks 2, 3, and 4 all depend on the output of Task 1 but do not need information from each other. Task 5 uses the output of Tasks 2, 3, and 4 as input.

I'm trying to write a script that will open new instances of R for each of the three tasks and run them simultaneously. Once all three are complete I can continue with the remaining pipeline.

What I've done in the past, for more linear workflows, is have one "master" script that sources (source()) each task's subscript in turn.

I've scoured SO and Google and haven't been able to find a solution for this particular problem. Hopefully you guys can help.

From within R, you can use system() to run shell commands, and on macOS the open command opens a file or application. For example, the following will open a new terminal instance:

system("open -a Terminal .",wait=FALSE)

Similarly, I can start a new R session using

system("open -a r .")

What I can't figure out for the life of me is how to set the "input" argument so that it sources one of my scripts. For example, I would expect the following to open a new terminal instance, start R within that instance, and then source the script.

system("open -a Terminal .",wait=FALSE,input=paste0("r; source(\"/path/to/script/M_01-A.R\",verbose=TRUE,max.deparse.length=Inf)"))

Solution

  • Answering my own question in the event someone else is interested down the road.

    After a couple of days of working on this, I think the best way to carry out this workflow is to not limit myself to working just in R. Writing a bash script offers more flexibility and is probably a more direct solution. The following example was suggested to me on another website.

    #!/bin/bash
    
    # Run task 1
    Rscript Task1.R
    
    # now run the three jobs that use Task1's output
    # we can fork these using '&' to run in the background in parallel
    Rscript Task2.R &
    Rscript Task3.R &
    Rscript Task4.R &
    
    # wait until all background processes have finished
    # (a bare 'wait' blocks until every background child has exited)
    wait
    
    Rscript Task5.R
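
One thing the script above does not do is check whether any task failed, so Task 5 would still start even if Task 3 crashed. A possible refinement is to record the process ID of each background job and wait on each one in turn, since waiting on a specific PID returns that job's exit status. The following is only a sketch: the helper names run_parallel and run_pipeline are made up for this illustration, and it assumes each task script exits with a non-zero status on failure (Rscript does this when a script stops with an uncaught error).

```shell
#!/bin/bash

# Hypothetical helper: run each argument as its own background command,
# then wait for all of them. Returns 0 only if every command exited with 0.
run_parallel() {
    local pids=""
    local failed=0
    local cmd pid
    for cmd in "$@"; do
        # Each argument is a complete command line, e.g. "Rscript Task2.R"
        bash -c "$cmd" &
        pids="$pids $!"      # $! is the PID of the job just started
    done
    for pid in $pids; do
        # 'wait PID' returns the exit status of that specific process
        wait "$pid" || failed=1
    done
    return "$failed"
}

# Hypothetical wrapper around the five tasks from the script above.
run_pipeline() {
    # Task 1 must succeed before anything else starts
    Rscript Task1.R || return 1

    # Tasks 2, 3, and 4 only need Task 1's output, so run them side by side
    run_parallel "Rscript Task2.R" "Rscript Task3.R" "Rscript Task4.R" || return 1

    # Task 5 runs only if all three parallel tasks succeeded
    Rscript Task5.R
}
```

Adding a call to run_pipeline as the last line of the script starts the run. The script's exit status then tells you whether the whole pipeline completed.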