python text strip

How can I remove only single blank lines between text to chunk it up?


I have a text file:

title

header
topic one two three


hello harry

I want to remove only the single blank lines between text to get:

title
header
topic one two three


hello harry

How can I do this using Python?

data = open('data.txt').read().replace('\n', '')

The above removes all newlines.


Solution

  • Use a regular expression to match each occurrence of exactly \n\n and replace it with a single \n. You must match \n\n because each line in your example input file ends in \n (so a single blank line between two lines of text is \n\n).

    import re

    with open('data.txt') as f:
        data = f.read()
    pattern = r'(?<!\n)\n\n(?!\n)'
    data = re.sub(pattern, '\n', data)  # re.sub returns a new string
    

    The first part, (?<!\n), is a negative lookbehind that checks that the preceding character is not a newline. The middle, \n\n, matches a double newline. The end part, (?!\n), is a negative lookahead that checks that the following character is not a newline. So this regex is general: it matches every instance of exactly \n\n without touching a lone \n or runs of three or more newlines (\n\n\n etc.).
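
To check the pattern against the sample from the question without needing a file, here is a minimal sketch. The function name remove_single_blank_lines and the in-memory sample string are assumptions made for illustration; only the regex itself comes from the answer.

```python
import re

# Pattern from the answer: exactly two newlines, with no newline
# immediately before or after them.
SINGLE_BLANK_LINE = re.compile(r'(?<!\n)\n\n(?!\n)')


def remove_single_blank_lines(text):
    """Collapse each lone blank line (exactly \\n\\n) into a single \\n.

    Hypothetical helper name, used here only for illustration.
    """
    return SINGLE_BLANK_LINE.sub('\n', text)


# The question's sample text, written out as a string instead of data.txt.
sample = "title\n\nheader\ntopic one two three\n\n\nhello harry\n"
result = remove_single_blank_lines(sample)
print(result)
```

The single blank line after "title" is removed, while the two blank lines before "hello harry" are kept. Note that a "blank" line containing spaces or tabs is not matched by this pattern, since it only looks at consecutive \n characters.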