Tags: python, string, split, beautifulsoup, urllib2

Python: splitting a string at the newline character


I have an HTML file from which I am retrieving only the body text.
I would like to print just a single line of it.

Right now I print it like this:

for line in newName.body(text=True):
    print line

This gives me everything in the body. What I would like is to print something like:

for line in newName.body(text=True):
    print line[257:_____]  # this is where I need help

Instead of ____ (or some other hard-coded number as the end index), I want the slice to stop at the newline character, something like:

for line in newName.body(text=True):
    print line[257:'\n'] 

However, that doesn't work.
How can I make it work?

The text I am working with is located in:

<body>
    <pre>
        The text I want
    </pre>
</body>

Solution

  • You could use the str.partition() method to get the first line:

    first_line = newName.body.getText().partition("\n")[0]
    

    This assumes newName is a BeautifulSoup object (conventionally named soup).
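The same idea replaces the invalid `line[257:'\n']` from the question: slice from the start offset first, then cut at the first newline. Below is a minimal sketch on a plain string; `line_from` and `line_from_find` are hypothetical helper names, and in the question's loop the call would be `line_from(line, 257)`.

```python
def line_from(text, start):
    """Return text[start:] up to, but not including, the next newline."""
    # partition() returns (before, sep, after); if there is no newline,
    # 'before' is simply the whole remaining string.
    return text[start:].partition("\n")[0]


def line_from_find(text, start):
    """Same result using str.find(), slicing only the line itself."""
    end = text.find("\n", start)  # -1 if there is no newline at or after start
    return text[start:] if end == -1 else text[start:end]


sample = "header\nfirst line\nsecond line\n"
print(line_from(sample, 7))       # first line
print(line_from_find(sample, 7))  # first line
```

`str.find` returns -1 when no newline follows, which is why `line_from_find` checks for it; `partition` needs no such check.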

    To get the text of the first <pre> tag in the HTML:

    text = soup.pre.string  # None if <pre> has more than one child node; soup.pre.getText() covers that case
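If BeautifulSoup is not available, a rough stand-in built on the standard library's `html.parser` (Python 3) can collect the text of the first `<pre>` element. This swaps in a different parser than the answer uses; `FirstPreText` and `first_pre_text` are hypothetical names, and unlike `.string` this sketch also concatenates text from tags nested inside `<pre>`.

```python
from html.parser import HTMLParser


class FirstPreText(HTMLParser):
    """Collect the text inside the first <pre> element of a document."""

    def __init__(self):
        super().__init__()
        self.in_pre = False  # True while inside the first <pre>
        self.done = False    # True once that <pre> has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre" and not self.done:
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre" and self.in_pre:
            self.in_pre = False
            self.done = True

    def handle_data(self, data):
        # Text from nested tags (e.g. <b>) inside <pre> is kept too.
        if self.in_pre:
            self.parts.append(data)


def first_pre_text(html):
    parser = FirstPreText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(repr(first_pre_text("<body><pre>The text I want\nsecond line</pre></body>")))
# 'The text I want\nsecond line'
```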
    

    To get a list of lines in the text:

    list_of_lines = text.splitlines()
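`splitlines()` is usually a better fit here than `split("\n")`: it treats `\r\n` as one line break and does not produce a trailing empty string when the text ends with a newline. A small comparison on a made-up string:

```python
text = "first\r\nsecond\nthird\n"

# splitlines() treats \r\n as a single break and ignores the final newline.
by_lines = text.splitlines()
print(by_lines)    # ['first', 'second', 'third']

# split("\n") leaves the \r behind and yields an empty last element.
by_newline = text.split("\n")
print(by_newline)  # ['first\r', 'second', 'third', '']
```

Note that `str.splitlines()` also breaks on a few other characters, such as form feed (`\f`), so `split("\n")` is the stricter choice if only `\n` should count.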
    

    To keep the end-of-line markers on each line:

    list_of_lines = text.splitlines(True)
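Keeping the terminators means no characters are lost, so joining the pieces restores the original text; the last line simply has no terminator if the text does not end with one. A quick check on a made-up string:

```python
text = "first\r\nsecond\nthird"

# Passing True (keepends) keeps each line's own terminator, whatever it is.
kept = text.splitlines(True)
print(kept)  # ['first\r\n', 'second\n', 'third']

# Because nothing is dropped, joining the pieces gives back the original.
print("".join(kept) == text)  # True
```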
    

    To get the i-th line from the list:

    ith_line = list_of_lines[i]
    

    Note: indexing is zero-based, e.g. i = 2 corresponds to the 3rd line.
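Indexing past the end raises `IndexError`, and a negative `i` counts from the end. If either is unwanted, a small wrapper can return a default instead. `nth_line` below is a hypothetical helper, not part of BeautifulSoup or the standard library:

```python
def nth_line(text, i, default=None):
    """Return the i-th line (zero-based) of text, or default if it does not exist."""
    lines = text.splitlines()
    # Reject negative i so that, e.g., -1 does not silently return the last line.
    if 0 <= i < len(lines):
        return lines[i]
    return default


sample = "zero\none\ntwo\n"
print(nth_line(sample, 2))  # two
print(nth_line(sample, 3))  # None
```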