I am writing an automated script that opens a text file and reads it line by line:
from argparse import ArgumentParser

if __name__ == '__main__':
    # Argument required: full path of the log file to process
    parser = ArgumentParser()
    parser.add_argument("--logDestination", dest="logDest",
                        help="Provide the directory of the log file")
    args = parser.parse_args()
    # The log file path is stored in this variable
    logDestination = str(args.logDest).strip()
    with open(logDestination) as f:
        for line in f:
            print(line.strip())
The text file contains logs that look like this:
26/10/22 20:36:22:385 SCOPE: SYSTEM ID: ALL
26/10/22 20:36:22:385 ELAPSED_TIME: 61.7 s
26/10/22 20:36:22:385 EMM_PROCEDURE:
26/10/22 20:36:22:385 [Procedure] [Count] [Retry] [Success] [Failure]
26/10/22 20:36:22:385 ATTACH 0 0 0 0
26/10/22 20:36:22:385 DETACH_UE_INIT 0 0 0 0
26/10/22 20:36:22:385 DETACH_NW_INIT 0 0 0 0
26/10/22 20:36:22:385 TAU_NORMAL 0 0 0 0
26/10/22 20:36:22:385 TAU_PERIODIC 0 0 0 0
26/10/22 20:36:22:385 SERVICE_REQ_MO 0 0 0 0
26/10/22 20:36:22:385 SERVICE_REQ_MT 0 0 0 0
I would like to remove the timestamp from each line, so that I can parse the stats in the logs.
Summary: Python code to read a text file line by line and remove any timestamps. Additionally, I will extract the data and convert it into a CSV.
I was going to try removing the first 21 characters from each line (the length of the timestamp), which is easy but unforgiving, as some lines don't contain a timestamp.
I'm only just learning Python myself at the moment, so this might not be the best solution, but my first thought was a regular expression, something like this:
import re
# other code...
reTimestamp = r'\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}:\d{3}'
with open(logDestination) as f:
    for line in f:
        # Strip the timestamp; count=1 replaces at most one match per line
        result = re.sub(reTimestamp, '', line, count=1)
        if result.strip():
            print(result.strip())
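Since you also want the stats in a CSV, here is a minimal sketch of one way to extend this. It assumes the stat rows always have the five columns shown in your sample (procedure name plus four integer counters), and the output file name stats.csv is just a placeholder:

```python
import csv
import re

reTimestamp = re.compile(r'\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}:\d{3} ?')

def extract_rows(lines):
    """Strip timestamps and keep only the five-column stat rows."""
    rows = []
    for line in lines:
        stripped = reTimestamp.sub('', line).strip()
        parts = stripped.split()
        # A stat row is a procedure name followed by four integer counters;
        # this skips headers like "[Procedure] [Count] ..." and lines
        # such as "ELAPSED_TIME: 61.7 s"
        if len(parts) == 5 and all(p.isdigit() for p in parts[1:]):
            rows.append(parts)
    return rows

sample = [
    "26/10/22 20:36:22:385 ELAPSED_TIME: 61.7 s",
    "26/10/22 20:36:22:385 [Procedure] [Count] [Retry] [Success] [Failure]",
    "26/10/22 20:36:22:385 ATTACH 0 0 0 0",
]

rows = extract_rows(sample)
with open("stats.csv", "w", newline="") as out:  # hypothetical output path
    writer = csv.writer(out)
    writer.writerow(["Procedure", "Count", "Retry", "Success", "Failure"])
    writer.writerows(rows)
```

In your script you would pass the lines from `open(logDestination)` instead of the `sample` list. The digit check is deliberately strict; if some counters can be non-integer you would need to loosen it.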