When you have to split a command line, for example to call Popen, the best practice seems to be
    subprocess.Popen(shlex.split(cmd), ...
but RTFM:
    The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell ...
So, what's the correct way on win32? And what about quote parsing and POSIX vs non-POSIX mode?
So far there is no valid command-line splitting function in the Python stdlib for Windows or for multi-platform use. (Mar 2016)
So in short, for subprocess.Popen, subprocess.call etc., it is best to do this:
    if sys.platform == 'win32':
        args = cmd
    else:
        args = shlex.split(cmd)
    subprocess.Popen(args, ...)
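The platform switch above can be wrapped in a small helper. This is only a sketch: the name popen_args and the explicit platform parameter are made up here for illustration (the parameter lets you try both branches on any machine), and the result is meant to be passed as the first argument of subprocess.Popen.

```python
import shlex


def popen_args(cmd, platform):
    """Hypothetical helper: choose what to hand to subprocess.Popen.

    platform is expected to be a sys.platform style string.
    """
    if platform == 'win32':
        # Windows: pass the command line through unsplit.
        return cmd
    # Elsewhere: split with POSIX shell rules.
    return shlex.split(cmd)


print(popen_args('prog -x "a b"', 'linux'))   # ['prog', '-x', 'a b']
print(popen_args('prog -x "a b"', 'win32'))   # prog -x "a b"
```

In real code you would call popen_args(cmd, sys.platform).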
On Windows the split is not necessary for either value of the shell option; internally Popen just uses subprocess.list2cmdline to re-join the split arguments again :-).
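To see what that re-joining looks like, here is a minimal sketch that calls subprocess.list2cmdline directly. The function is an undocumented helper of the subprocess module, so treat its direct use as an assumption about an implementation detail; the argument values are made up for illustration.

```python
import subprocess

# An argument containing a space gets wrapped in double quotes.
print(subprocess.list2cmdline(['prog', 'a b']))   # prog "a b"

# An embedded double quote is backslash-escaped instead.
print(subprocess.list2cmdline(['prog', 'c"d']))   # prog c\"d
```

So a list that you split yourself ends up as a single string again before the process is created on Windows.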
With the option shell=True, shlex.split is not necessary on Unix either.
Split or not, to start .bat or .cmd scripts on Windows (unlike .exe and .com files) you need to include the file extension explicitly - unless shell=True.
shlex.split(cmd, posix=0) retains backslashes in Windows paths, but it doesn't handle quoting and escaping correctly. It's not very clear what the posix=0 mode of shlex is good for at all - but it certainly seduces Windows/cross-platform programmers ...
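A quick comparison of the two shlex modes on a Windows-style command line illustrates the problem. The path and argument below are made up for the example.

```python
import shlex

cmd = r'C:\dir\prog.exe "a b"'

# POSIX mode: quotes are removed, but the backslashes are eaten as escapes.
posix_args = shlex.split(cmd, posix=True)
print(posix_args)       # ['C:dirprog.exe', 'a b']

# Non-POSIX mode: backslashes survive, but the quotes stay in the argument.
non_posix_args = shlex.split(cmd, posix=False)
print(non_posix_args)   # ['C:\\dir\\prog.exe', '"a b"']
```

Neither result is what a Windows program would see in its argv for this command line.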
The Windows API exposes ctypes.windll.shell32.CommandLineToArgvW:
Parses a Unicode command line string and returns an array of pointers to the command line arguments, along with a count of such arguments, in a way that is similar to the standard C run-time argv and argc values.
    def win_CommandLineToArgvW(cmd):
        import ctypes
        nargs = ctypes.c_int()
        # The call returns LPWSTR*, an array of wide-string pointers
        ctypes.windll.shell32.CommandLineToArgvW.restype = ctypes.POINTER(ctypes.c_wchar_p)
        # unicode() is Python 2; on Python 3 a str can be passed directly
        lpargs = ctypes.windll.shell32.CommandLineToArgvW(unicode(cmd), ctypes.byref(nargs))
        args = [lpargs[i] for i in range(nargs.value)]
        # LocalFree returns NULL on success
        if ctypes.windll.kernel32.LocalFree(lpargs):
            raise AssertionError
        return args
However, the function CommandLineToArgvW is bogus - or at best only weakly similar to the mandatory standard C argv & argc parsing:
>>> win_CommandLineToArgvW('aaa"bbb""" ccc')
[u'aaa"bbb"""', u'ccc']
>>> win_CommandLineToArgvW('"" aaa"bbb""" ccc')
[u'', u'aaabbb" ccc']
>>>
C:\scratch>python -c "import sys;print(sys.argv)" aaa"bbb""" ccc
['-c', 'aaabbb"', 'ccc']
C:\scratch>python -c "import sys;print(sys.argv)" "" aaa"bbb""" ccc
['-c', '', 'aaabbb"', 'ccc']
Watch http://bugs.python.org/issue1724822 for possible future additions to the Python lib. (The function mentioned there on the "fisheye3" server doesn't really work correctly.)
Valid Windows command-line splitting is rather crazy. E.g. try \ \\ \" \\"" \\\"aaa """"
...
My current candidate for cross-platform command-line splitting is the following function, which I am considering proposing for the Python lib. It's multi-platform; it's ~10x faster than shlex, which does single-char stepping and streaming; and it also respects pipe-related characters (unlike shlex). It already passes a list of tough real-shell tests on Windows & Linux bash, plus the legacy POSIX test patterns of test_shlex.
Interested in feedback about remaining bugs.
    import re
    import sys

    def cmdline_split(s, platform='this'):
        """Multi-platform variant of shlex.split() for command-line splitting.
        For use with subprocess, for argv injection etc. Using fast REGEX.

        platform: 'this' = auto from current platform;
                  1 = POSIX;
                  0 = Windows/CMD
                  (other values reserved)
        """
        if platform == 'this':
            platform = (sys.platform != 'win32')
        if platform == 1:
            RE_CMD_LEX = r'''"((?:\\["\\]|[^"])*)"|'([^']*)'|(\\.)|(&&?|\|\|?|\d?\>|[<])|([^\s'"\\&|<>]+)|(\s+)|(.)'''
        elif platform == 0:
            RE_CMD_LEX = r'''"((?:""|\\["\\]|[^"])*)"?()|(\\\\(?=\\*")|\\")|(&&?|\|\|?|\d?>|[<])|([^\s"&|<>]+)|(\s+)|(.)'''
        else:
            raise AssertionError('unknown platform %r' % platform)

        args = []
        accu = None   # collects pieces of one arg
        for qs, qss, esc, pipe, word, white, fail in re.findall(RE_CMD_LEX, s):
            if word:
                pass   # most frequent
            elif esc:
                word = esc[1]
            elif white or pipe:
                # whitespace or a pipe/redirect token ends the current arg
                if accu is not None:
                    args.append(accu)
                if pipe:
                    args.append(pipe)
                accu = None
                continue
            elif fail:
                raise ValueError("invalid or incomplete shell string")
            elif qs:
                word = qs.replace('\\"', '"').replace('\\\\', '\\')
                if platform == 0:
                    word = word.replace('""', '"')
            else:
                word = qss   # may be even empty; must be last

            accu = (accu or '') + word

        if accu is not None:
            args.append(accu)

        return args