Tags: python, regex, python-regex

Split every occurrence of Key=Value pairs in a string where the value includes one or more spaces


I have a situation where a user can enter commands with optional key/value pairs, and a value may contain spaces.

Here are four different forms of user input, where key and value are separated by an = sign and values can contain spaces:

"cmd=create-folder    name=SelfServe - Test ride"

"cmd=create-folder    name=SelfServe - Test ride server=prd"

"cmd=create-folder  name=cert - Test ride   server=dev site=Service"

"cmd=create-folder   name=cert - Test ride   server=dev site=Service permission=locked"

Requirement: I am trying to parse this string and split it into a dictionary based on the keys and values present in the string.

If the user enters the first form, that would produce a dictionary like:

query_dict = {
    'cmd': 'create-folder',
    'name': 'SelfServe - Test ride'
}

If the user enters the second form, that would add the additional key/value pair:

query_dict = {
    'cmd': 'create-folder',
    'name': 'SelfServe - Test ride',
    'server': 'prd'
}

If the user enters the third form, that would produce:

query_dict = {
    'cmd': 'create-folder',
    'name': 'cert - Test ride',
    'server': 'dev',
    'site': 'Service'
}

The fourth form produces a dictionary with the key/value pairs split like below:

query_dict = {
    'cmd': 'create-folder',
    'name': 'cert - Test ride',
    'server': 'dev',
    'site': 'Service',
    'permission': 'locked'
}

The idea is to parse a string where keys and values are separated by the = symbol, where values can contain one or more spaces, and extract the matching key/value pairs.

I tried multiple methods but was unable to figure out a single generic regular expression pattern that can match and extract from any string with this kind of pattern.

I appreciate your help.

I tried several patterns based on the different possible user inputs, but that is not a scalable approach. Example:

I created three patterns to match three varieties of user input, but it would be nice to have one generic pattern that can match any combination of key=value pairs in a string (I am hard-coding the keys in the patterns, which is not ideal):

    '(cmd=create-folder).*(name=.*).*',
    '(cmd=create-pfolder).*(name=.*).*(server=.*).*',
    '(cmd=create-pfolder).*(name=.*).*(server=.*).*(site=.*)'

Solution

  • I would suggest using re.split, and then zip to feed the dict constructor:

    import re

    def get_dict(s):
        # Split on each "key=" marker; the capture group keeps the keys in the list
        parts = re.split(r"\s*(\w+)=", s)
        return dict(zip(parts[1::2], parts[2::2]))
    

    Example runs:

    print(get_dict("cmd=create-folder    name=SelfServe - Test ride"))
    print(get_dict("cmd=create-folder    name=SelfServe - Test ride server=prd"))
    print(get_dict("cmd=create-folder  name=cert - Test ride   server=dev site=Service"))
    print(get_dict("cmd=create-folder   name=cert - Test ride   server=dev site=Service permission=locked"))
    

    Outputs:

    {'cmd': 'create-folder', 'name': 'SelfServe - Test ride'}
    {'cmd': 'create-folder', 'name': 'SelfServe - Test ride', 'server': 'prd'}
    {'cmd': 'create-folder', 'name': 'cert - Test ride', 'server': 'dev', 'site': 'Service'}
    {'cmd': 'create-folder', 'name': 'cert - Test ride', 'server': 'dev', 'site': 'Service', 'permission': 'locked'}
    

    Explanation

    Using this input as example:

    "cmd=create-folder    name=SelfServe - Test ride"
    

    The split regex identifies these parts:

    "cmd=create-folder    name=SelfServe - Test ride"
     ^^^^             ^^^^^^^^^
    

    The strings that are not matched by it end up as results, so we have these:

     "", "create-folder", "SelfServe - Test ride"
    

    The first string is empty, because it is what precedes the first match.

    Now, because the regex has a capture group, the string captured by that group is also returned in the result list, at odd indices. So parts ends up like this:

     ["", "cmd", "create-folder", "name", "SelfServe - Test ride"]
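
    To see the effect of the capture group in isolation, the following sketch splits the same example string twice, once without and once with the group. The variable names without_group and with_group are only illustrative.

```python
import re

s = "cmd=create-folder    name=SelfServe - Test ride"

# Without a capture group, only the text between the matches is returned.
without_group = re.split(r"\s*\w+=", s)

# With a capture group, the captured key is inserted into the result as well.
with_group = re.split(r"\s*(\w+)=", s)

print(without_group)  # ['', 'create-folder', 'SelfServe - Test ride']
print(with_group)     # ['', 'cmd', 'create-folder', 'name', 'SelfServe - Test ride']
```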
    

    The keys we are interested in, occur at odd indices. We can get those with parts[1::2], where 1 is the starting index, and 2 is the step.

    The corresponding values for those keys occur at even indices, ignoring the empty string at index 0. So we get those with parts[2::2]. With the call to zip, we pair those keys and values together as we want them.
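
    As a small illustration of the slicing and pairing steps, here they are applied to the parts list from the example, written out by hand. The names keys, values and pairs are only used for this sketch.

```python
parts = ["", "cmd", "create-folder", "name", "SelfServe - Test ride"]

keys = parts[1::2]    # start at index 1, then take every second item
values = parts[2::2]  # start at index 2, then take every second item

# zip pairs the first key with the first value, the second with the second, etc.
pairs = list(zip(keys, values))

print(keys)    # ['cmd', 'name']
print(values)  # ['create-folder', 'SelfServe - Test ride']
print(pairs)   # [('cmd', 'create-folder'), ('name', 'SelfServe - Test ride')]
```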

    Finally, the dict constructor can take an argument with key/value pairs, which is exactly what that zip call provides.
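
    One thing that may be worth checking before relying on this: the pattern treats every word followed by = as the start of a new key, and it only strips whitespace that sits in front of a key. The sketch below tries two made-up inputs (they are not from the question) to show what happens in those cases.

```python
import re

def get_dict(s):
    parts = re.split(r"\s*(\w+)=", s)
    return dict(zip(parts[1::2], parts[2::2]))

# Hypothetical input: the value "x=y" contains an = sign, so "x" is
# picked up as a key of its own and "name" is left with an empty value.
result = get_dict("cmd=run name=x=y")
print(result)  # {'cmd': 'run', 'name': '', 'x': 'y'}

# Hypothetical input: trailing spaces after the last value are kept,
# because no further key follows for the pattern to consume them.
trailing = get_dict("cmd=run name=Test ride   ")
print(trailing)  # {'cmd': 'run', 'name': 'Test ride   '}
```

    If inputs like these can occur, calling strip() on the values or restricting the set of accepted keys would be possible adjustments.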