python bash shell subprocess amos

Python subprocess.call() modifies arguments before passing them to a shell script?


I am writing a python wrapper for calling programs of the AMOS package (specifically for merging genome assemblies from different sources using good ol' minimus2 from AMOS).

The scripts should be called like this when using the shell directly:

toAmos -s myinput.fasta -o testoutput.afg
minimus2 testoutput -D REFCOUNT=400 -D OVERLAP=500

[just for clarification:

-toAmos: converts my input.fasta file to .afg format and requires an input sequence argument ("-s") and an output argument ("-o")

-minimus2: merges a sequence dataset against reference contigs and requires an argument "-D REFCOUNT=x" for stating the number of reference sequences in your input and an argument "-D OVERLAP=Y" for stating the minimum overlap between sequences]

So within my script I use subprocess.call() to call the necessary AMOS tools.

Basically I do this:

from subprocess import call
output_basename = "testoutput"
inputfile = "myinput.fasta"

call(["toAmos", "-s " + inputfile, "-o " + output_basename + ".afg"])
call(["minimus2", output_basename, "-D REFCOUNT=400", "-D OVERLAP=500"])

But in this case the AMOS tools cannot interpret the arguments anymore. The arguments seem to get modified by subprocess.call() and passed incorrectly. The error message I get is:

Unknown option: s myinput.fasta
Unknown option: o testoutput.afg
You must specify an output AMOS AFG file with option -o
/home/jov14/tools/miniconda2/bin/runAmos: unrecognized option '-D REFCOUNT=400'
Command line parsing failed, use -h option for usage info

It seems that the arguments get passed without the leading "-"? So I then tried passing the command as a single string (including arguments) like this:

call(["toAmos -s " + inputfile +" -o " + output_basename + ".afg"])

But then I get this error...

OSError: [Errno 2] No such file or directory

... presumably because subprocess.call is interpreting the whole string as the name for a single script. I guess I COULD probably try shell=True as a workaround, but the internet is FULL of instructions clearly advising against this.

What seems to be the problem here? What can I do?


Solution

  • answer

    Either do:

    call("toAmos -s " + inputfile + " -o " + output_basename + ".afg", shell=True) # single string, split by the shell
    

    or do:

    call(["toAmos", "-s", inputfile, "-o", output_basename + ".afg"]) # list of arguments
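
To make the difference visible without AMOS installed, here is a minimal sketch that swaps toAmos for a tiny Python child process which only prints the arguments it receives. The stand-in child and the helper name received_args are assumptions made for illustration; they are not part of AMOS or of the original question.

```python
import subprocess
import sys

# Stand-in for toAmos (an assumption for illustration): a child process
# that prints the list of arguments it was started with.
ECHO_ARGV = "import sys; print(sys.argv[1:])"


def received_args(args):
    """Run the stand-in child with `args` and return what it printed."""
    out = subprocess.check_output([sys.executable, "-c", ECHO_ARGV] + args)
    return out.decode().strip()


inputfile = "myinput.fasta"
output_basename = "testoutput"

# Option and value glued into one list item: the child gets ONE argument per pair.
glued = received_args(["-s " + inputfile, "-o " + output_basename + ".afg"])

# Option and value as separate list items: the child gets four arguments.
separate = received_args(["-s", inputfile, "-o", output_basename + ".afg"])

print(glued)     # ['-s myinput.fasta', '-o testoutput.afg']
print(separate)  # ['-s', 'myinput.fasta', '-o', 'testoutput.afg']
```

The second form matches what the program receives when the command is typed into a shell, which is why it parses correctly.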
    

    discussion

    In the case of your:

    call(["toAmos", "-s " + inputfile, "-o " + output_basename + ".afg"])
    

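    you should supply each option and its value as separate items ('-s', inputfile, '-o', output_basename + '.afg'). Every list item reaches the program as exactly one argument, so "-s myinput.fasta" arrives as a single argument with an embedded space, which toAmos cannot parse as the option -s followed by a filename.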

    In the case of your:

    call(["minimus2", output_basename, "-D REFCOUNT=400", "-D OVERLAP=500"])
    

    the "-D REFCOUNT=400" and "-D OVERLAP=500" should be provided as two items each ('-D', 'REFCOUNT=400', '-D', 'OVERLAP=500'), or drop the spaces ('-DREFCOUNT=400', '-DOVERLAP=500').
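
One way to avoid gluing options and values together by accident is to build the list in a small helper. The sketch below is only an illustration: the function name minimus2_command is hypothetical, and it assumes minimus2 accepts each definition as a separate "-D", "KEY=value" pair, as in the shell command from the question.

```python
def minimus2_command(basename, defines):
    """Build an argument list with one "-D", "KEY=value" pair per definition.

    `defines` is a list of (key, value) tuples so that the order is preserved.
    """
    cmd = ["minimus2", basename]
    for key, value in defines:
        cmd += ["-D", "{}={}".format(key, value)]
    return cmd


cmd = minimus2_command("testoutput", [("REFCOUNT", 400), ("OVERLAP", 500)])
print(cmd)
# ['minimus2', 'testoutput', '-D', 'REFCOUNT=400', '-D', 'OVERLAP=500']

# To actually run it (requires AMOS on the PATH):
#     from subprocess import call
#     call(cmd)
```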

    additional info

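    The root of the problem is how a shell splits a command line: it breaks the string on whitespace and hands each piece to the program as a separate argument. With a list, no splitting takes place, so every item has to be exactly one argument already. I suggest you use the single string method (which needs shell=True) unless there are spaces in filenames or you have to use shell=False; in that case, always supply a list of arguments.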
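
The splitting a shell performs can be approximated with the standard shlex module, which is also a way to turn a command string into a list without starting a shell. This is a rough sketch: the filename "my input.fasta" is a made-up example, and shlex.quote requires Python 3.3 or later.

```python
import shlex

command = "toAmos -s myinput.fasta -o testoutput.afg"

# shlex.split applies shell-like splitting rules without starting a shell.
args = shlex.split(command)
print(args)  # ['toAmos', '-s', 'myinput.fasta', '-o', 'testoutput.afg']

# A filename containing a space has to be quoted in the string form...
quoted = "toAmos -s " + shlex.quote("my input.fasta") + " -o testoutput.afg"
print(quoted)  # toAmos -s 'my input.fasta' -o testoutput.afg

# ...and the quoting is undone again when the string is split.
print(shlex.split(quoted))
# ['toAmos', '-s', 'my input.fasta', '-o', 'testoutput.afg']
```

The resulting list can be passed to call() directly, with the default shell=False.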