python regex bigdata streaming logparser

Most Efficient Way to Retrieve Log Attributes in Python | Separate by comma


Below are samples of the logs that we receive continuously (streaming). I need to extract and parse them.

Log1 = "2024-04-03T09:51:17+0000 logType, xyz=appliance1, xyz1=HR, action=allow, applianceId=1, xyz4=2, xyz5=3, so on..."

Log2 = "2024-04-03T09:51:17+0000 logType, xyz=appliance1, xyz1=HR, action=allow, applianceId=1, xyz4=2, xyz5=3, so on..."

Log3 = "2024-04-03T09:51:17+0000 logType, xyz=appliance1, xyz1=HR, action=allow, applianceId=1, xyz4=2, xyz5=3, so on..."

Log4 = "2024-04-03T09:51:17+0000 logType, xyz=appliance1, xyz1=HR, action=allow, applianceId=1, xyz4=2, xyz5=3, so on..."

What would be an efficient way to parse these logs? One way, which I have tried, uses split:

def parse(log):
    values = log.split(',')
    for v in values:
        pass  # do it here

def main():
    parse(Log1)
    parse(Log2)
    parse(Log3)
    parse(Log4)

Note 1: Specific attribute values are required from each log (log1, log2...). For example, I need the values of attributes xyz, zyz2, and xyz5 from each log.

Note 2: This is just a small example, but there might be more than 20 to 30 attributes for each log.


Solution

  • If you split the string on commas, you can ignore the first token (the timestamp and log type). The remaining tokens are attribute/value pairs separated by an equals sign.

    You could write your parse() function to return a dictionary where the keys are attribute names and the values are the attribute values.

    You can then process the dictionary to do your database update.

    Something like this:

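    Log1 = "2024-04-03T09:51:17+0000 logType, xyz=appliance1, xyz1=HR, action=allow, applianceId=1, xyz4=2, xyz5=3"

    def parse(s):
        def genattrs(attrs):
            for attr in attrs:
                # Split on the first "=" only, so a value may itself contain "="
                a, v = attr.split("=", 1)
                yield a.strip(), v.strip()

        # Skip the first token (timestamp and log type); the rest are name=value pairs
        return dict(genattrs(s.split(",")[1:]))

    print(parse(Log1))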
    

    Output:

    {'xyz': 'appliance1', 'xyz1': 'HR', 'action': 'allow', 'applianceId': '1', 'xyz4': '2', 'xyz5': '3'}
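
Since the question only needs a few attributes out of 20 to 30 per log, and the logs arrive as a stream, the same split-based idea can be narrowed to the required keys and wrapped in a generator. This is only a minimal sketch: the names `WANTED`, `parse_wanted` and `parse_stream` are made up for illustration, and the set of wanted attributes (`xyz`, `action`, `xyz5`) is an assumption you would replace with your own. Tokens without an `=` are skipped rather than raising an error, and scanning stops once every wanted attribute has been found.

```python
from typing import Dict, FrozenSet, Iterable, Iterator

# Hypothetical set of attributes to keep; replace with the ones you need.
WANTED: FrozenSet[str] = frozenset({"xyz", "action", "xyz5"})


def parse_wanted(line: str, wanted: FrozenSet[str] = WANTED) -> Dict[str, str]:
    """Return only the wanted name=value pairs from a single log line."""
    result: Dict[str, str] = {}
    # Skip the first token (timestamp and log type), as in parse() above.
    for token in line.split(",")[1:]:
        key, sep, value = token.partition("=")
        key = key.strip()
        # Ignore tokens that have no "=" and attributes that were not requested.
        if sep and key in wanted:
            result[key] = value.strip()
            # Stop scanning once every wanted attribute has been collected.
            if len(result) == len(wanted):
                break
    return result


def parse_stream(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Lazily parse any iterable of log lines, one dictionary per line."""
    for line in lines:
        yield parse_wanted(line)


sample = (
    "2024-04-03T09:51:17+0000 logType, xyz=appliance1, xyz1=HR, "
    "action=allow, applianceId=1, xyz4=2, xyz5=3"
)

for record in parse_stream([sample]):
    print(record)
```

Output:

{'xyz': 'appliance1', 'action': 'allow', 'xyz5': '3'}

Because `parse_stream` is a generator, it never holds more than one parsed line in memory, so it can be fed from a file object or any other iterable that yields log lines as they arrive.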