Prevent shlex from splitting on colons (:)


I'm having trouble dealing with colons (:) in shlex. I need the following behaviour:

Sample input

import shlex

text = 'hello:world ("my name is Max")'
s = shlex.shlex(instream=text, punctuation_chars=True)
s.get_token()
s.get_token()
...

Desired output

hello:world
(
"my name is Max"
)

Current output

hello
:
world
(
"my name is Max"
)

shlex puts the colon in a separate token, which I don't want. The documentation says very little about the colon. I've tried adding it to the wordchars attribute, but that messes everything up and separates the words between commas. I've also tried setting punctuation_chars to a custom list containing only the parentheses, ["(", ")"], but it makes no difference. I need punctuation_chars set so that each parenthesis comes out as a separate token (or any other option that achieves this output).

Does anyone know how I could get this to work? Any help will be greatly appreciated. I'm using Python 3.6.9 and could upgrade to Python 3.7.x if necessary.


Solution

  • To make shlex treat : as a word char, you need to add : to wordchars:

    >>> import shlex
    >>> text = 'hello:world ("my name is Max")'
    >>> s = shlex.shlex(instream=text, punctuation_chars=True)
    >>> s.wordchars += ':'
    >>> while True:
    ...   tok = s.get_token()
    ...   if not tok: break
    ...   print(tok)
    ... 
    hello:world
    (
    "my name is Max"
    )
    

    I tested that with Python 3.6.9 and 3.8.0. The punctuation_chars initialization parameter was added in Python 3.6, so you need at least that version.
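
    If you want this as a reusable piece, the same setup can be wrapped in a small helper. This is only a sketch of the approach above: the name tokenize is made up for illustration and is not part of shlex, and it assumes the default non-POSIX mode, which keeps the surrounding quotes on quoted tokens.

```python
import shlex


def tokenize(text):
    """Hypothetical helper: split text with shlex, keeping ':' inside words."""
    lexer = shlex.shlex(instream=text, punctuation_chars=True)
    # ':' is not a word character by default, so add it explicitly.
    lexer.wordchars += ':'
    # Iterating over a shlex object yields tokens until the input is exhausted.
    return list(lexer)


tokens = tokenize('hello:world ("my name is Max")')
print(tokens)
# ['hello:world', '(', '"my name is Max"', ')']
```

    Iterating over the lexer replaces the explicit get_token() loop; both stop at end of input.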