python regex bigdata streaming logparser

Most Efficient Way to Retrieve Log Attributes in Python | Separate by comma

Below are samples of the logs that we receive continuously (streaming). I need to extract and parse them.

Log1 = "2024-04-03T09:51:17+0000 logType, xyz=appliance1, xyz1=HR, action=allow, applianceId=1, xyz4=2, xyz5=3, so on..."

Log2 = "2024-04-03T09:51:17+0000 logType, xyz=appliance1, xyz1=HR, action=allow, applianceId=1, xyz4=2, xyz5=3, so on..."

Log3 = "2024-04-03T09:51:17+0000 logType, xyz=appliance1, xyz1=HR, action=allow, applianceId=1, xyz4=2, xyz5=3, so on..."

Log4 = "2024-04-03T09:51:17+0000 logType, xyz=appliance1, xyz1=HR, action=allow, applianceId=1, xyz4=2, xyz5=3, so on..."

What would be an efficient way to parse these logs? One approach I tried uses split:

def parse(log):
    values = log.split(',')
    for v in values:
        pass  # do it here

def main():
    parse(Log1)
    parse(Log2)
    parse(Log3)
    parse(Log4)

Note 1: Only specific attribute values are required from each log (Log1, Log2, ...). For example, I need the values of attributes xyz, xyz2, and xyz5 from each log.

Note 2: This is just a small example, but there might be more than 20 to 30 attributes for each log.

Solution

If you split the string on commas, you can ignore the first token, which holds only the timestamp and log type. The remaining tokens are attribute/value pairs separated by an equals sign.

You could write your parse() function to return a dictionary where the keys are attribute names and the values are the attribute values.

You can then process the dictionary to do your database update.

Something like this:

Log1 = "2024-04-03T09:51:17+0000 logType, xyz=appliance1, xyz1=HR, action=allow, applianceId=1, xyz4=2, xyz5=3"

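def parse(s):
    def genattrs(attrs):
        for attr in attrs:
            # Split on the first "=" only, so a value may itself contain "=".
            a, v = attr.split("=", 1)
            yield a.strip(), v.strip()

    # Skip the first token (timestamp and log type).
    return dict(genattrs(s.split(",")[1:]))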

print(parse(Log1))

Output:

{'xyz': 'appliance1', 'xyz1': 'HR', 'action': 'allow', 'applianceId': '1', 'xyz4': '2', 'xyz5': '3'}
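
Since only a few attributes are needed from each log (Note 1), one possible variation is to keep just the wanted names and stop scanning a line once all of them have been found. The sketch below is only an illustration: the names parse_wanted, parse_stream, and WANTED are made up for this example, the attribute selection is arbitrary, and it assumes that neither attribute names nor values contain commas. Tokens without an equals sign are skipped rather than raising an error.

```python
from typing import Dict, Iterable, Iterator, Set

# Hypothetical selection of attributes; replace with the names you need.
WANTED = {"xyz", "action", "xyz5"}


def parse_wanted(line: str, wanted: Set[str]) -> Dict[str, str]:
    """Return only the wanted attribute/value pairs from one log line."""
    result: Dict[str, str] = {}
    # Skip the first token (timestamp and log type).
    for token in line.split(",")[1:]:
        name, sep, value = token.partition("=")
        if not sep:
            continue  # no "=" in this token, so it is not an attribute pair
        name = name.strip()
        if name in wanted:
            result[name] = value.strip()
            if len(result) == len(wanted):
                break  # every wanted attribute found; skip the rest of the line
    return result


def parse_stream(lines: Iterable[str], wanted: Set[str]) -> Iterator[Dict[str, str]]:
    """Lazily parse an iterable of log lines, one dictionary per line."""
    for line in lines:
        yield parse_wanted(line, wanted)


sample = ("2024-04-03T09:51:17+0000 logType, xyz=appliance1, xyz1=HR, "
          "action=allow, applianceId=1, xyz4=2, xyz5=3")
rows = list(parse_stream([sample, sample], WANTED))
print(rows[0])
```

Because parse_stream is a generator, it can be fed any iterable of lines (a file object, a socket reader, a queue consumer) without holding the whole stream in memory. Whether the early exit is worth it depends on where the wanted attributes sit in the line, so it is worth measuring on real data.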