python-3.xlogfile

Removing the first date and timestamp in each line of a log file using Python


I have a series of log files in text file format. The document format is this:

[2021-12-11T10:21:30.370Z]  Branch indexing
[2021-12-11T10:21:30.374Z]  Starting the program with default pipeID
[2021-12-11T10:21:30.374Z]  Running with durable level: max_survivbility will make this program crash if left running for 20 minutes
[2021-12-11T10:21:30.374Z]  Starting the program with default pipeID

Each line in the document starts with:[2021-12-11T10:21:30.370Z]

I want to remove the first set of characters that represent date and timestamp and have a result something like this:

Branch indexing
Starting the program with default pipeID
Running with durable level: max_survivbility will make this program crash if left running for 20 minutes
Starting the program with default pipeID

Can anyone please help me explain how I can do this?

I tried to use this method but it doesn't work since I have '[]' in the date stamp.

import re
text = "[2021-12-11T10:21:30.370Z]  Branch indexing"
re.sub("[.*?]", "", text)

This doesn't work for me.

If I try the same method on a text like text = "<2021-12-11T10:21:30.370Z> Branch indexing".

import re
text = "<2021-12-11T10:21:30.370Z>  Branch indexing"
re.sub("<.*?>", "", text)

It removes <2021-12-11T10:21:30.370Z>. Why does this not work with [2021-12-11T10:21:30.370Z]?

I need help removing every instance of this format "[2021-12-11T10:21:30.370Z]" in all the log files.

Thank you so much.


Solution

  • I'd rather go with a simple solution for this case, pal. Split the string where the ] ends, then trim the second element of the resulting list, to remove all those extra spaces and then print it, bud. Hope this helps, cheers!

    import re
    text = "[2021-12-11T10:21:30.370Z]  Branch indexing"
    print(re.split("]", text)[1].strip())