python r conda external reticulate

Run external program inside a conda environment in R


I am trying to run stitchr in R. For programs that run in Python, I use reticulate. I create a conda environment named r-reticulate, where I want to install stitchr and run it.

I try the following:

if (!('r-reticulate' %in% reticulate::conda_list()[,1])){
  reticulate::conda_create(envname = 'r-reticulate', packages = 'python=3.10')
}
reticulate::use_condaenv('r-reticulate')
reticulate::py_install("stitchr", pip = TRUE)

system("stitchr -h") # this does not work

Unsurprisingly, the system() call fails with the message "error in running command".

What would be the right way to do this?

I have had success in the past with anndata, for example. But that one has an R wrapper package, so I can simply do:

reticulate::use_condaenv('r-reticulate')
reticulate::py_install("anndata", pip = TRUE)

data_h5ad <- anndata::read_h5ad("file.h5ad")

How can I approach the stitchr case?

EDIT:

I found the location of stitchr.py in the output of the package installation: /usr/local/Caskroom/miniconda/base/envs/r-reticulate/lib/python3.10/site-packages/Stitchr/stitchr.py

I tried all of the following, but none of them works (the error message follows each attempt):

pyloc="/usr/local/Caskroom/miniconda/base/envs/r-reticulate/lib/python3.10/site-packages/Stitchr/stitchr.py"
reticulate::source_python(pyloc)

Error in py_run_file_impl(file, local, convert) :
  ImportError: attempted relative import with no known parent package
Run reticulate::py_last_error() for details.

reticulate::py_run_file(pyloc)

Error in py_run_file_impl(file, local, convert) :
  ImportError: attempted relative import with no known parent package
Run reticulate::py_last_error() for details.

reticulate::py_run_string(paste(pyloc, "-h"))

Error in py_run_string_impl(code, local, convert) :
  File "", line 1
    /usr/local/Caskroom/miniconda/base/envs/r-reticulate/lib/python3.10/site-packages/Stitchr/stitchr.py -h
SyntaxError: invalid syntax
Run reticulate::py_last_error() for details.

I am absolutely clueless on how to proceed here.


Solution

  • Here is an approach that may do what you want.

    shell:

    conda create --name=testenv python
    # or conda create --name=testenv python==3.10.13 if you want a specific version for jupyter for example
    conda activate testenv
    # check which pip is active:
    which pip
    

    ~/anaconda3/envs/testenv/bin/pip

    In the shell, install and test stitchr (commands taken from the stitchr documentation):

    pip install stitchr IMGTgeneDL
    
    stitchrdl
    stitchr -v TRBV7-3*01 -j TRBJ1-1*01 -cdr3 CASSYLQAQYTEAFF
    

    This works from the command line.

    shell

    cd ~
    cp /home/extraits/anaconda3/envs/testenv/bin/stitchr ~/teststitchr.py
    ./teststitchr.py -v TRBV7-3*01 -j TRBJ1-1*01 -cdr3 CASSYLQAQYTEAFF
    

    The copied console script also works from the command line.
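
One plausible reason the original system("stitchr -h") call fails from R is that the environment's bin directory is not on the PATH of the process that system() starts; this is an assumption, not something confirmed above. A minimal Python sketch of locating a console script next to the running interpreter and calling it by full path follows. The helper names find_env_script and run_env_script are hypothetical, not part of stitchr or reticulate.

```python
import shutil
import subprocess
import sys
import sysconfig


def find_env_script(name, scripts_dir=None):
    """Return the full path of a console script, or None if it is not found.

    By default the search is limited to the scripts directory of the running
    interpreter (the environment's bin/ folder on Linux and macOS).
    """
    if scripts_dir is None:
        scripts_dir = sysconfig.get_path("scripts")
    return shutil.which(name, path=scripts_dir)


def run_env_script(name, args, scripts_dir=None):
    """Run a console script by full path, so PATH does not matter."""
    exe = find_env_script(name, scripts_dir)
    if exe is None:
        raise FileNotFoundError(f"{name!r} was not found in the scripts directory")
    return subprocess.run([exe, *args], capture_output=True, text=True)


if __name__ == "__main__":
    # Only report where stitchr would be picked up; nothing is executed here.
    exe = find_env_script("stitchr")
    print(exe if exe else f"stitchr not found next to {sys.executable}")
```

If Python is running inside the conda environment (as it would be after reticulate::use_condaenv()), sysconfig.get_path("scripts") should point at that environment's bin directory, so the script can be found without activating the environment in a shell.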

    Create ~/teststitchr2.py with the example code from https://jamieheather.github.io/stitchr/importing.html

    ~/teststitchr2.py:

    # import stitchr
    from Stitchr import stitchrfunctions as fxn
    from Stitchr import stitchr as st
    
    # specify details about the locus to be stitched
    chain = 'TRB'
    species = 'HUMAN'
    
    # initialise the necessary data
    tcr_dat, functionality, partial = fxn.get_imgt_data(chain, st.gene_types, species)
    codons = fxn.get_optimal_codons('', species)
    
    # provide details of the rearrangement to be stitched
    tcr_bits = {'v': 'TRBV7-3*01', 'j': 'TRBJ1-1*01', 'cdr3': 'CASSYLQAQYTEAFF',
                'l': 'TRBV7-3*01', 'c': 'TRBC1*01',
                'skip_c_checks': False, 'species': species, 'seamless': False,
                '5_prime_seq': '', '3_prime_seq': '', 'name': 'TCR'}
    
    # then run stitchr on that rearrangement
    stitched = st.stitch(tcr_bits, tcr_dat, functionality, partial, codons, 3, '')
    
    print(stitched)
    # Which produces
    (['TCR', 'TRBV7-3*01', 'TRBJ1-1*01', 'TRBC1*01', 'CASSYLQAQYTEAFF', 'TRBV7-3*01(L)'],
     'ATGGG snip snip snip snip snip snip TTC',
     0)
    

    Run it with Python in the shell:

    python ./teststitchr2.py
    

    (['TCR', 'TRBV7-3*01', 'TRBJ1-1*01', 'TRBC1*01', 'CASSYLQAQYTEAFF', 'TRBV7-3*01(L)'], 'ATG snip snip snip snip TTC', 0)

    In R:

    library(reticulate)
    reticulate::use_condaenv('testenv')
    py_run_file(file.path(path.expand('~'),'teststitchr2.py'))
    names(py)
    

    reticulate::py_run_file() populates the variable py: https://rstudio.github.io/reticulate/articles/calling_python.html#executing-code

    names(py) lists every function and variable now available in the Python session; each one is accessible from R with the py$ prefix:

    c("chain", "codons", "functionality", "fxn", "partial", "r", "species", "st", "stitched", "tcr_bits", "tcr_dat")
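
As a rough Python-side analogy (not reticulate's actual implementation), running a script file leaves its top-level names in a namespace, which is why names(py) shows the variables defined in teststitchr2.py. The sketch below uses the standard-library runpy module; run_and_list_names and demo_script.py are hypothetical names used only for illustration.

```python
import runpy
import tempfile
from pathlib import Path


def run_and_list_names(script_path):
    """Run a script and return its public top-level names, sorted."""
    namespace = runpy.run_path(str(script_path))
    # Skip dunder entries such as __name__ and __builtins__.
    return sorted(name for name in namespace if not name.startswith("_"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for teststitchr2.py.
        script = Path(tmp) / "demo_script.py"
        script.write_text(
            "chain = 'TRB'\n"
            "species = 'HUMAN'\n"
            "label = chain + '/' + species\n"
        )
        print(run_and_list_names(script))
```

Every assignment and import at the top level of the script ends up as a name in that namespace, which matches the mix of data (chain, species, stitched) and modules (fxn, st) seen in the list above.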

    In R:

    print(py$stitched)
    

    It works :)

    [[1]]
    [1] "TCR"             "TRBV7-3*01"      "TRBJ1-1*01"      "TRBC1*01"       
    [5] "CASSYLQAQYTEAFF" "TRBV7-3*01(L)"  
    
    [[2]]
    [1] "ATGGGCAC snip snip snip snip "
    
    [[3]]
    [1] 0
    

    You can assign the result to an R variable, e.g. myvar <- py$stitched, and use it later.

    You can also call the Python function directly from R:

    tcr_bits2 <- list(v = "TRBV7-3*01", j = "TRBJ1-1*01", cdr3 = "CASSYLQAQYTEAFF",
        l = "TRBV7-3*01", c = "TRBC1*01", skip_c_checks = FALSE,
        species = "HUMAN", seamless = FALSE, `5_prime_seq` = "",
        `3_prime_seq` = "", name = "TCR")
    
    py$st$stitch(tcr_bits2, py$tcr_dat, py$functionality, py$partial, py$codons, 3, '')
    
    • 'TCR' 'TRBV7-3*01' 'TRBJ1-1*01' 'TRBC1*01' 'CASSYLQAQYTEAFF' 'TRBV7-3*01(L)'
    • 'ATG snip snip snip snip ATTTC'
    • 0

    Note that this call mixes an R variable (tcr_bits2) with objects from the reticulate environment (py$...). You can assign the result, e.g. myvar2 <- py$st$stitch(...), and use it later.

    It works again :)

    Edit:

    A crude workaround on the Python side, if the import fails: add the following before the from Stitchr import lines:

    import os
    os.chdir(os.path.join(os.path.expanduser('~'), 'anaconda3/envs/testenv/lib/python3.12/site-packages'))
    

    See also the question "How can I import a module dynamically given the full path?"

    The os.chdir() trick is only for testing; avoid it in real code.
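
The "attempted relative import with no known parent package" error in the question typically appears when a module that lives inside a package is executed as a standalone file, because the package context is lost. A possible alternative to os.chdir(), sketched below under that assumption, is to make sure the directory containing the package is on sys.path and import the module by its dotted name. The package demo_pkg and the helper names are hypothetical and exist only so the example is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_package(site_dir, dotted_name):
    """Import `dotted_name` (e.g. 'Stitchr.stitchr') with `site_dir` on sys.path.

    Importing through the package name keeps the package context, so
    relative imports inside the module can be resolved.
    """
    site_dir = str(site_dir)
    if site_dir not in sys.path:
        sys.path.insert(0, site_dir)
    importlib.invalidate_caches()
    return importlib.import_module(dotted_name)


def make_demo_package(root):
    """Create a throwaway package whose module uses a relative import."""
    pkg = Path(root) / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "helpers.py").write_text("def greet():\n    return 'hello'\n")
    (pkg / "tool.py").write_text(
        "from . import helpers\n\n"
        "def run():\n"
        "    return helpers.greet()\n"
    )
    return pkg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_demo_package(root)
        tool = import_from_package(root, "demo_pkg.tool")
        print(tool.run())
```

When Python runs inside the conda environment, its site-packages directory is normally already on sys.path, so a plain from Stitchr import stitchr as st should be enough and no path manipulation is needed.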