logging, csplit

Split a file based on its date prefix?


I have this file, file.log:

Sep 16 16:18:49 abcd 123 456
Sep 16 16:18:49 abcd 123 567
Sep 17 16:18:49 abcd 123 456
Sep 17 16:18:49 abcd 123 567

I want to split it by date so that I get:

Sep_16.log

Sep 16 16:18:49 abcd 123 456
Sep 16 16:18:49 abcd 123 567

Sep_17.log

Sep 17 16:18:49 abcd 123 456
Sep 17 16:18:49 abcd 123 567

I searched the forum and found that this is supposed to be done with csplit and the regex ^.{6}, but the answers I found only use that regex as a delimiter, which is not what I intended.
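
For comparison, the split asked for here (one file per ^.{6} date prefix, named like Sep_16.log) can be sketched in Python instead of csplit. This is a minimal sketch, not a csplit recipe: it assumes every line starts with a syslog-style "Mon DD" prefix, and the function name split_by_date and its out_dir parameter are hypothetical.

```python
import os
from collections import defaultdict


def split_by_date(lines, out_dir):
    """Group lines by their 6-character date prefix (e.g. "Sep 16")
    and write each group to <out_dir>/<Mon>_<DD>.log, e.g. Sep_16.log."""
    groups = defaultdict(list)  # output file name -> lines, in first-seen order
    for line in lines:
        prefix = line[:6]  # the same span the regex ^.{6} matches
        name = "_".join(prefix.split()) + ".log"  # "Sep 16" -> "Sep_16.log"
        groups[name].append(line)
    for name, group in groups.items():
        with open(os.path.join(out_dir, name), "w") as f:
            f.writelines(group)
    return sorted(groups)
```

Opening each output with "w" overwrites it on every run, which is the same overwrite problem the question raises for daily runs.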

Also, I want each date partition split into chunks of 10k rows, so the file names would be something like Sep_17_part001.log, probably using csplit's prefix and suffix options.
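
The per-date chunking can be sketched the same way: count lines per date and start a new part every 10,000 lines. This is a sketch under stated assumptions: split_by_date_and_size and rows_per_part are hypothetical names, the "Mon DD" line prefix is assumed, and all lines are held in memory before writing.

```python
import os
from collections import defaultdict


def split_by_date_and_size(lines, out_dir, rows_per_part=10_000):
    """Write lines to <Mon>_<DD>_partNNN.log files in out_dir, starting a
    new part every rows_per_part lines within each date."""
    counts = defaultdict(int)  # lines seen so far for each date key
    parts = defaultdict(list)  # output file name -> its lines, in first-seen order
    for line in lines:
        key = "_".join(line[:6].split())  # "Sep 17" -> "Sep_17"
        part = counts[key] // rows_per_part + 1  # 1-based part number
        counts[key] += 1
        parts[f"{key}_part{part:03d}.log"].append(line)  # e.g. Sep_17_part001.log
    for name, chunk in parts.items():
        with open(os.path.join(out_dir, name), "w") as f:
            f.writelines(chunk)
    return list(parts)
```

Holding everything in memory keeps the logic short; for very large logs the writes would need to be streamed instead.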

Does anybody know the full command for this? And once it works as a one-off on a single log, how can I make it run daily without csplit overwriting the previous days' files?
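
One way to avoid overwriting earlier output on daily runs is to continue the part numbering from whatever files already exist. A minimal sketch, assuming the Sep_17_partNNN.log naming from the question; next_part_number is a hypothetical helper, not part of csplit.

```python
import os
import re


def next_part_number(out_dir, key):
    """Return the first part number not yet used by an existing
    <key>_partNNN.log file in out_dir (1 if none exist)."""
    pattern = re.compile(rf"^{re.escape(key)}_part(\d+)\.log$")
    used = [int(m.group(1))
            for name in os.listdir(out_dir)
            if (m := pattern.match(name))]
    return max(used, default=0) + 1
```

A daily job could call this before writing a date's first chunk, so that a new run starts at the next free part instead of rewriting part001. The daily scheduling itself would typically be a cron entry.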


Solution

  • In the end, after searching through the csplit documentation and finding nothing that suited my needs, I decided to write a simple Python script.

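    Something like:

    import argparse
    import os
    from datetime import datetime
    parser = argparse.ArgumentParser()
    parser.add_argument('logfile')
    parser.add_argument('date_path')  # output directory for the per-day files
    args = parser.parse_args()

    with open(args.logfile) as f:
        for line in f:  # each line starts with e.g. "Sep 16" and has no year
            timef = datetime.strptime(str(datetime.now().year) + line[:6], '%Y%b %d').strftime('%Y%m%d')
            t_dest_path = os.path.join(args.date_path, timef + '-browse.log')
            with open(t_dest_path, "a") as fdest:  # "a" appends instead of overwriting
                fdest.write(line)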
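
The script above reopens the destination file once per line. A variant that opens each per-day file only once could look like the sketch below. The function name split_log and its year parameter are hypothetical; the YYYYMMDD-browse.log naming comes from the script above, and like it this assumes lines start with an English month abbreviation under Python's default C locale.

```python
import os
from contextlib import ExitStack
from datetime import date, datetime


def split_log(logfile, date_path, year=None):
    """Append each line of logfile to <date_path>/<YYYYMMDD>-browse.log,
    opening each destination file once rather than once per line."""
    if year is None:
        year = date.today().year  # syslog lines carry no year; assume the current one
    with open(logfile) as f, ExitStack() as stack:
        handles = {}  # YYYYMMDD -> open destination file
        for line in f:
            timef = datetime.strptime(f"{year}{line[:6]}", "%Y%b %d").strftime("%Y%m%d")
            if timef not in handles:
                path = os.path.join(date_path, timef + "-browse.log")
                handles[timef] = stack.enter_context(open(path, "a"))
            handles[timef].write(line)
    return sorted(handles)
```

ExitStack closes every opened file when the block exits. Passing year explicitly helps when an old log is processed after New Year, where the current-year assumption would be wrong.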