
Python. Join specific lines on 1 line


Let's say I have this file:

1
17:02,111
Problem report related to
router

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk
now due to compromised data

I want this output:

1
17:02,111
Problem report related to router

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk now due to compromised data

I've been trying this in bash and got close to a solution, but I don't know how to do it in Python.

Thank you in advance


Solution

  • If you want to remove the extra lines:

    To do this, check two conditions for each line: keep the line if it is not followed by an empty line, or if it is preceded by a line that matches the regex ^\d{2}:\d{2},\d{3}\s$.

    To access the next line in each iteration, create a second iterator named temp from the file object with itertools.tee and call next on it so that it stays one line ahead. Use re.match to test the regex.

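    from itertools import tee
    import re

    with open('ex.txt') as f, open('new.txt', 'w') as out:
        temp, f = tee(f)
        next(temp, None)  # temp now runs one line ahead of f
        pre = ''          # previous line; empty before the first iteration
        for line in f:
            nxt = next(temp, None)  # None once f reaches the last line
            # Keep the line if a non-blank line follows it,
            # or if it comes directly after a timestamp line.
            if (nxt is not None and nxt != '\n') or re.match(r'^\d{2}:\d{2},\d{3}\s$', pre):
                out.write(line)
            pre = line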
    

    result :

    1
    17:02,111
    Problem report related to
    
    2
    17:05,223
    Restarting the systems
    
    3
    18:02,444
    Must erase hard disk
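
The look-ahead idea can also be tried on its own, without files. The following is a minimal sketch, not part of the answer's code: with_next is a hypothetical helper name, and the sample list stands in for the lines of ex.txt.

```python
from itertools import tee

def with_next(iterable):
    """Yield (item, next_item) pairs; next_item is None for the last item."""
    current, ahead = tee(iterable)
    next(ahead, None)  # advance the look-ahead copy by one item
    for item in current:
        yield item, next(ahead, None)

# Hypothetical sample standing in for the lines of a file.
lines = ['1\n', '17:02,111\n', 'router\n', '\n']
pairs = list(with_next(lines))
for current_line, following_line in pairs:
    print(repr(current_line), repr(following_line))
```

Using next(ahead, None) with a default avoids the StopIteration that a bare next would raise on the last line.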
    

    If you want to concatenate the rest to the third line:

    To join the lines that follow the third line onto it, use the following regex to find every block that is followed by \n\n or the end of the file ($):

    r"(.*?)(?=\n\n|$)"
    

    Then split each block on the line that is in the time format (for example 17:02,111) and write the parts to your output file. Note that you need to replace the newlines within the third part with spaces:

    ex.txt:

    1
    17:02,111
    Problem report related to
    router
    another line
    
    
    2
    17:05,223
    Restarting the systems
    
    3
    18:02,444
    Must erase hard disk
    now due to compromised data
    line 5
    line 6
    line 7
    

    Demo :

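    import re

    def splitter(s):
        # Yield each block: the text up to the next blank line or the end of the input.
        for x in re.finditer(r"(.*?)(?=\n\n|$)", s, re.DOTALL):
            g = x.group(0).strip('\n')  # drop newlines left over from the separators
            if g:
                yield g

    with open('ex.txt') as f, open('new.txt', 'w') as out:
        for block in splitter(f.read()):
            # The capture group keeps the timestamp line in the split result.
            first, second, third = re.split(r'(\d{2}:\d{2},\d{3}\n)', block, maxsplit=1)
            out.write(first + second + third.replace('\n', ' ') + '\n')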
    

    result :

    1
    17:02,111
    Problem report related to router another line
    2
    17:05,223
    Restarting the systems
    3
    18:02,444
    Must erase hard disk now due to compromised data line 5 line 6 line 7
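
To see why the split gives exactly three parts, it can help to run it on a single block. This is a small sketch on a hard-coded string (the block value is an assumed sample, not read from ex.txt). Because the pattern is wrapped in a capture group, re.split keeps the timestamp line in its result.

```python
import re

# Assumed sample block, shaped like one entry of the input file.
block = "1\n17:02,111\nProblem report related to\nrouter"

# maxsplit=1 guarantees at most one split, so unpacking into three names is safe
# as long as the block contains a timestamp line.
first, second, third = re.split(r'(\d{2}:\d{2},\d{3}\n)', block, maxsplit=1)
joined = first + second + third.replace('\n', ' ')

print(repr(first))   # the number line
print(repr(second))  # the timestamp line, kept by the capture group
print(repr(third))   # the text lines, still separated by newlines
print(joined)
```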
    

    Note :

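    In this answer splitter is a generator function: it yields the blocks one at a time instead of building a list of all of them, which avoids holding every block in memory at once. Note that f.read() still loads the whole file into memory, so for very large files a line-by-line approach is preferable.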
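
If the file is too large to read in one go, a single pass over the lines is another option. The sketch below is an alternative under stated assumptions, not the method above: join_wrapped_lines and TIMESTAMP are hypothetical names, and each block is assumed to consist of a number line, a timestamp line, and then text lines up to the next blank line. Unlike the demo above, it keeps the blank lines between blocks, as in the output the question asks for.

```python
import re

# Assumed timestamp format, taken from the sample data (e.g. 17:02,111).
TIMESTAMP = re.compile(r'^\d{2}:\d{2},\d{3}$')

def join_wrapped_lines(lines):
    """Yield output lines, merging the text lines that follow each timestamp."""
    text = []        # text lines collected for the current block
    in_text = False  # True once the block's timestamp line has been seen
    for raw in lines:
        line = raw.rstrip('\n')
        if in_text and line.strip():
            text.append(line.strip())  # part of the wrapped text
            continue
        if in_text:  # a blank line ends the current block
            if text:
                yield ' '.join(text)
            text, in_text = [], False
        yield line
        if TIMESTAMP.match(line):
            in_text = True
    if text:  # flush the last block if the input has no trailing blank line
        yield ' '.join(text)

# In-memory sample standing in for the file from the question.
sample = [
    "1\n", "17:02,111\n", "Problem report related to\n", "router\n", "\n",
    "2\n", "17:05,223\n", "Restarting the systems\n", "\n",
    "3\n", "18:02,444\n", "Must erase hard disk\n", "now due to compromised data\n",
]
for out_line in join_wrapped_lines(sample):
    print(out_line)
```

With real files, the same function can be fed the file object directly, for example by iterating join_wrapped_lines(f) inside a with open('ex.txt') as f block and writing each yielded line plus '\n' to the output file. Only the current block's text lines are held in memory at any time.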