python-3.x regex timestamp

Fetch the timestamp using regex (Python 3.x)


I want to separate the timestamps from the other content in a text file. For example:

a.txt

2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart

"2019-07-17T07:11:14.894Z" "mgremove datestring"    asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart

17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
"mgremove datestring"     asfasnfs: remove datepart check the value
                         "mgremove datestring"     asfasnfs: remove datepart check the value

My solution handles the first 4 lines of the text, but it is not generic. I want to make it generic so that it automatically detects the timestamp at the start of each line.

with open("a.txt") as f:
    for line in f:
        # Take the first four whitespace-separated tokens as the date string.
        date_string = " ".join(line.strip().split()[:4])
        print(date_string, line)

Expected output:

date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = 17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = 17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = 17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line =  asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line =  asfasnfs: remove datepart

The text file might include other timestamp patterns as well. Is there a way to detect a timestamp at the start of a line and fetch it? If no date is present at the start of a line, the date should be taken from the previous line.


Solution

  • With the contents of a.txt:

    2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
    2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
    2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
    2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
    
    "2019-07-17T07:11:14.894Z" "mgremove datestring"    asfasnfs: remove datepart
    "2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
    "2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
    "2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
    
    17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
    17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
    17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
    asfasnfs: remove datepart
                                   asfasnfs: remove datepart
    

    This script:

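    def get_date_string(line):
        # Heuristic: collect leading words until the result is longer than
        # 18 characters, which covers the three timestamp styles in a.txt.
        rv = ''
        words = line.split()
        while words:
            rv += words.pop(0) + ' '
            if len(rv) > 18:
                break
        return rv.strip()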
    
    with open('a.txt', 'r') as f_in:
        last_date_string = ''
    
        for line in f_in:
            line = line.strip()
            if not line:
                continue
    
            date_part = get_date_string(line)
            if date_part == line:
                # The whole line was consumed, so it has no leading timestamp:
                # reuse the one from the previous line.
                print('date string={: <30} line={}'.format(last_date_string, line))
            else:
                print('date string={: <30} line={}'.format(date_part, line))
                last_date_string = date_part
    

    Prints:

    date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
    date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
    date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
    date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
    date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z" "mgremove datestring"    asfasnfs: remove datepart
    date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
    date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
    date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
    date string=17 Jul 2019 07:01:10           line=17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
    date string=17 Jul 2019 07:01:10           line=17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
    date string=17 Jul 2019 07:01:10           line=17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
    date string=17 Jul 2019 07:01:10           line=asfasnfs: remove datepart
    date string=17 Jul 2019 07:01:10           line=asfasnfs: remove datepart
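
Since the question asks for a regex, here is a minimal sketch of a regex-based variant. It is not part of the script above: the names TIMESTAMP_PATTERNS, TIMESTAMP_RE and extract_timestamps are made up for illustration, and the three patterns only cover the timestamp styles that appear in the sample a.txt. Any other format would need its own pattern added to the list.

```python
import re

# Hypothetical patterns, one per timestamp style seen in the sample a.txt.
# These are assumptions for illustration; extend the list for other formats.
TIMESTAMP_PATTERNS = [
    r'\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d+',          # 2019/01/31-11:56:23.288258
    r'"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z"',  # "2019-07-17T07:11:14.894Z"
    r'\d{1,2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2}',       # 17 Jul 2019 07:01:10
]

# re.match() only matches at the start of the string, so every
# alternative is implicitly anchored to the beginning of the line.
TIMESTAMP_RE = re.compile('|'.join(TIMESTAMP_PATTERNS))


def extract_timestamps(lines):
    """Yield (date_string, line) pairs for every non-empty line.

    A line without a leading timestamp reuses the most recent one.
    """
    last_date_string = ''
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = TIMESTAMP_RE.match(line)
        if match:
            last_date_string = match.group(0)
        yield last_date_string, line
```

Because extract_timestamps accepts any iterable of strings, an open file object can be passed in directly and each yielded pair printed with the same format string as in the script above. Unlike the length-based heuristic, a line that does not start with one of the listed patterns always falls back to the previous timestamp, no matter how long its first words are.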