
Regex: exclude specific text from the capturing group


I have the following ACLs from Cisco:

access-list office extended permit tcp host 1.1.1.1 host 2.2.2.2
access-list home extended permit object-group PROTOS4 host 4.4.4.4 host 5.5.5.5

I'm trying to write a parser, and I have the following code in Python:

import re

acl_general_structure = (
    r'access-list\s+(?P<policy_name>[A-Za-z0-9\-\_]+)\s+extended\s+(?P<action>permit|deny)'
    r'\s'
    r'(?P<protocol>[a-zA-Z0-9]+|(?:object-group\s[A-Za-z\d]+))'
    r'\s'
    r'host\s(?P<source>(?:[0-9]{1,3}\.){3}[0-9]{1,3})'
    r'\s'
    r'host\s(?P<destination>(?:[0-9]{1,3}\.){3}[0-9]{1,3})'
)

f_in_name = "xx.config"
f_out_name = f_in_name + ".csv"  # output file name (not used in this snippet)

with open(f_in_name, "r", encoding="utf8") as f:
    for line in f:  # iterate over the file directly instead of readlines()
        result = re.match(acl_general_structure, line)
        if result:
            print(result.groupdict())

With the current code, the output is:

{'policy_name': 'office', 'action': 'permit', 'protocol': 'tcp', 'source': '1.1.1.1', 'destination': '2.2.2.2'}
{'policy_name': 'home', 'action': 'permit', 'protocol': 'object-group PROTOS4', 'source': '4.4.4.4', 'destination': '5.5.5.5'}

What I want to achieve is:

{'policy_name': 'office', 'action': 'permit', 'protocol': 'tcp', 'source': '1.1.1.1', 'destination': '2.2.2.2'}
{'policy_name': 'home', 'action': 'permit', 'protocol': 'PROTOS4', 'source': '4.4.4.4', 'destination': '5.5.5.5'}

That is, the "object-group" prefix should not be part of the captured group. Is this actually possible, or do I need to handle it separately with Python's split() on the dictionary's 'protocol' value? I know how to process strings in Python, but I want to handle this at the regex level.
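For comparison, the split-based fallback described above would be a small post-processing step on the `protocol` value. A minimal sketch, assuming the dictionaries have already been built by `groupdict()` (the `rows` list here is hypothetical, copied from the current output):

```python
# Hypothetical rows, copied from the current output of groupdict().
rows = [
    {'policy_name': 'office', 'action': 'permit', 'protocol': 'tcp',
     'source': '1.1.1.1', 'destination': '2.2.2.2'},
    {'policy_name': 'home', 'action': 'permit', 'protocol': 'object-group PROTOS4',
     'source': '4.4.4.4', 'destination': '5.5.5.5'},
]

for row in rows:
    # Keep only the last whitespace-separated token:
    # "object-group PROTOS4" -> "PROTOS4", while "tcp" stays "tcp".
    row['protocol'] = row['protocol'].split()[-1]
    print(row)
```

This works, but it spreads the parsing logic across the regex and the Python code, which is what the question is trying to avoid.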


Solution

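  • Change

    r'(?P<protocol>[a-zA-Z0-9]+|(?:object-group\s[A-Za-z\d]+))'

    to

    r'(?:object-group\s)?(?P<protocol>[a-zA-Z0-9]+)'

    The "object-group " prefix is now matched by an optional non-capturing group (?:...)? placed outside the named group. The regex still consumes the prefix, but it is no longer part of the protocol value.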
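Below is a self-contained sketch of the corrected pattern run against the two sample ACL lines. It assumes those lines are the whole input, so they are held in a hypothetical `sample_lines` list instead of being read from "xx.config"; everything else in the pattern is unchanged from the question.

```python
import re

# Corrected pattern: "object-group " is matched by an optional
# non-capturing group placed outside the named "protocol" group,
# so only the protocol or object-group name is captured.
acl_general_structure = (
    r'access-list\s+(?P<policy_name>[A-Za-z0-9\-\_]+)\s+extended\s+(?P<action>permit|deny)'
    r'\s'
    r'(?:object-group\s)?(?P<protocol>[a-zA-Z0-9]+)'
    r'\s'
    r'host\s(?P<source>(?:[0-9]{1,3}\.){3}[0-9]{1,3})'
    r'\s'
    r'host\s(?P<destination>(?:[0-9]{1,3}\.){3}[0-9]{1,3})'
)

# Hypothetical stand-in for the config file: the two sample ACL lines.
sample_lines = [
    "access-list office extended permit tcp host 1.1.1.1 host 2.2.2.2",
    "access-list home extended permit object-group PROTOS4 host 4.4.4.4 host 5.5.5.5",
]

results = []
for line in sample_lines:
    match = re.match(acl_general_structure, line)
    if match:
        results.append(match.groupdict())

for row in results:
    print(row)
```

For the "tcp" line the optional group simply matches nothing, so the same pattern handles both forms without an alternation.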