I want to separate the timestamps from the rest of the content in a text file. For example:
a.txt
2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
17 Jul 2019 07:01:10 "mgremove datestring" asfasnfs: remove datepart
17 Jul 2019 07:01:10 "mgremove datestring" asfasnfs: remove datepart
17 Jul 2019 07:01:10 "mgremove datestring" asfasnfs: remove datepart
"mgremove datestring" asfasnfs: remove datepart check the value
"mgremove datestring" asfasnfs: remove datepart check the value
My solution works for the first 4 lines of the file, but it is not generic. I want it to detect the timestamp at the start of each line automatically.
with open("a.txt") as f:
    for line in f:
        # Take the first four whitespace-separated tokens as the date.
        date_string = " ".join(line.strip().split()[:4])
        print(date_string, line)
Expected output:
date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = 17 Jul 2019 07:01:10 "mgremove datestring" asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = 17 Jul 2019 07:01:10 "mgremove datestring" asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = 17 Jul 2019 07:01:10 "mgremove datestring" asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = asfasnfs: remove datepart
The text file might include other timestamp patterns as well. Is there any way to detect a timestamp at the start of a line and extract it? And if a line does not start with a date, the date should be taken from the previous line.
With these contents of a.txt:
2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
17 Jul 2019 07:01:10 "mgremove datestring" asfasnfs: remove datepart
17 Jul 2019 07:01:10 "mgremove datestring" asfasnfs: remove datepart
17 Jul 2019 07:01:10 "mgremove datestring" asfasnfs: remove datepart
asfasnfs: remove datepart
asfasnfs: remove datepart
This script:
def get_date_string(line):
    # Heuristic: collect leading words until the prefix is longer than 18 characters.
    rv = ''
    words = line.split()
    while words:
        rv += words.pop(0) + ' '
        if len(rv) > 18:
            break
    return rv.strip()

with open('a.txt', 'r') as f_in:
    last_date_string = ''
    for line in f_in:
        line = line.strip()
        if not line:
            continue
        date_part = get_date_string(line)
        if date_part == line:
            # The whole line was consumed, so it has no leading date: reuse the last one.
            print('date string={: <30} line={}'.format(last_date_string, line))
        else:
            print('date string={: <30} line={}'.format(date_part, line))
            last_date_string = date_part
Prints:
date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart
date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=17 Jul 2019 07:01:10 "mgremove datestring" asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=17 Jul 2019 07:01:10 "mgremove datestring" asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=17 Jul 2019 07:01:10 "mgremove datestring" asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=asfasnfs: remove datepart
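Note that the length check in get_date_string is only a heuristic: any line whose leading words add up to more than 18 characters is treated as starting with a date. If the timestamp layouts are known in advance, a regular expression is stricter. Below is a minimal sketch, assuming only the three layouts that appear in a.txt; the names TIMESTAMP_RE and split_timestamps are hypothetical and not part of the script above, and other layouts would need their own alternatives added to the pattern.

```python
import re

# Hypothetical pattern: one alternative per timestamp layout seen in a.txt.
# re.VERBOSE lets the pattern span several lines with inline comments.
TIMESTAMP_RE = re.compile(
    r'''^\s*(
        "?\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"?   # ISO 8601 style, optionally quoted
      | \d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}(?:\.\d+)?         # 2019/01/31-11:56:23.288258
      | \d{1,2}\s[A-Za-z]{3}\s\d{4}\s\d{2}:\d{2}:\d{2}        # 17 Jul 2019 07:01:10
    )''',
    re.VERBOSE,
)


def split_timestamps(lines):
    """Yield (date_string, line) pairs.

    A line without a leading timestamp reuses the most recent one seen.
    """
    last_date_string = ''
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = TIMESTAMP_RE.match(line)
        if match:
            last_date_string = match.group(1)
        yield last_date_string, line


# Small demo on in-memory lines; a file object can be passed the same way.
sample = [
    '2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart',
    '"2019-07-17T07:11:14.894Z" "mgremove datestring" asfasnfs: remove datepart',
    '17 Jul 2019 07:01:10 "mgremove datestring" asfasnfs: remove datepart',
    'asfasnfs: remove datepart',
]
for date_string, line in split_timestamps(sample):
    print('date string={: <30} line={}'.format(date_string, line))
```

Unlike the length check, this version does not mistake a long undated line for a date, at the cost of having to list each expected layout explicitly.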