Tags: python, dataframe, data-analysis, logfile

How do I convert a .log file to a .csv file using Python?


I have been given an alert-handler-error.log file and need to analyze how often errors occur. First, I have to convert the .log file to a CSV or Excel file so that I can open it in a data visualization tool. Each line of the log is a JSON object:

{"level":"ERROR","timestamp":"2023-02-03T19:38:10.141Z","logger":"kafkajs","message":"[Connection] Response Heartbeat(key: 12, version: 3)","broker":"64.227.156.112:9092","clientId":"chiefnet-client","error":"The group is rebalancing, so a rejoin is needed","correlationId":51,"size":10}
{"level":"ERROR","timestamp":"2023-02-03T19:38:10.145Z","logger":"kafkajs","message":"[Connection] Response Heartbeat(key: 12, version: 3)","broker":"64.227.156.112:9092","clientId":"chiefnet-client","error":"The group is rebalancing, so a rejoin is needed","correlationId":52,"size":10}
{"level":"ERROR","timestamp":"2023-02-03T19:38:10.147Z","logger":"kafkajs","message":"[Connection] Response Heartbeat(key: 12, version: 3)","broker":"64.227.156.112:9092","clientId":"chiefnet-client","error":"The group is rebalancing, so a rejoin is needed","correlationId":53,"size":10}

I tried to read the lines and convert them using the code below:

import pandas as pd 
df = pd.read_csv('C:/Users/admin/alert-log/alerthandlerlog/CN-prod-alert-handler-error.log',sep='\s\s+',engine = 'python')
df.to_csv('my_file.csv',index = None)

How can I achieve this?


Solution

  • I'll assume you want a DataFrame with each unique key as a column header. Here you go:

    Imports:

    import re
    import json
    import pandas as pd
    

    Get the JSON objects:

    If you already have them in a list, skip this part.

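    # Read the whole log, then parse every {...} object found on a line
    with open('alert-handler-error.log', 'r') as file:
        txt = file.read()
    list_of_jsons = []
    # "." does not match newlines, so each match stays within one line
    for match in re.findall(r"(\{.+\})", txt):
        j = json.loads(match)
        list_of_jsons.append(j)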
    

    Use your list of JSON objects to initialize your DataFrame and write it out as CSV:

    # Each dict becomes a row; the union of keys becomes the columns
    df = pd.DataFrame.from_records(list_of_jsons)
    df.to_csv("my_file.csv", index=False)
    

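As a variation on the approach above, here is a minimal sketch that parses the log line by line and skips anything that is not a JSON object, which may help if the file also contains stray non-JSON lines. The sample data is shortened from the lines in the question, and the names `SAMPLE_LOG` and `parse_json_lines` are hypothetical, introduced only for illustration.

```python
import io
import json

import pandas as pd

# Assumption: shortened sample in the same shape as the log in the question,
# plus one deliberately malformed line to show that it is skipped.
SAMPLE_LOG = (
    '{"level":"ERROR","timestamp":"2023-02-03T19:38:10.141Z","logger":"kafkajs",'
    '"error":"The group is rebalancing, so a rejoin is needed","correlationId":51,"size":10}\n'
    'this line is not JSON and will be skipped\n'
    '{"level":"ERROR","timestamp":"2023-02-03T19:38:10.145Z","logger":"kafkajs",'
    '"error":"The group is rebalancing, so a rejoin is needed","correlationId":52,"size":10}\n'
)


def parse_json_lines(lines):
    """Parse an iterable of text lines, keeping only lines that hold a JSON object."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if isinstance(obj, dict):
            records.append(obj)
    return records


# For a real file, pass an open file handle instead of the StringIO wrapper.
records = parse_json_lines(io.StringIO(SAMPLE_LOG))
df = pd.DataFrame.from_records(records)

# With no path argument, to_csv returns the CSV text; pass a path to write a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

If every line in the file is known to be valid JSON, `pd.read_json(path, lines=True)` may be enough on its own, since pandas can read one JSON object per line directly.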