
Difference between strings and stripped_strings in BeautifulSoup


What is the difference between strings and stripped_strings in BeautifulSoup?

import requests
from bs4 import BeautifulSoup
url = "https://codewithharry.com"
r = requests.get(url)
htmlcontent = r.content
soup = BeautifulSoup(htmlcontent, 'html.parser')

tags = soup.find(id="imgpreview2")

# Using the .strings generator
for item in tags.strings:
    print(item)

# Using the .stripped_strings generator
for item in tags.stripped_strings:
    print(item)

Solution

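  • stripped_strings: skips strings that consist only of whitespace, and strips leading and trailing whitespace from the strings it does yield.

    strings: yields every string inside the tag exactly as it appears in the document. Whitespace-only strings (newlines, indentation) are included, and leading and trailing whitespace is kept.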

    Example:

    from bs4 import BeautifulSoup
    
    html = """
    <html>
        <body>
            <div>
                <p>   This is some text.   </p>
                <ul>
                    <li>Item 1</li>
                    <li>Item 2</li>
                </ul>
            </div>
        </body>
    </html>
    """
    
    # Parse the HTML
    soup = BeautifulSoup(html, 'html.parser')
    
    div_tag = soup.find('div')
    
    print("strings method:")
    for item in div_tag.strings:
        print(repr(item))
    
    print("\nUsing stripped_strings method:")
    for item in div_tag.stripped_strings:
        print(repr(item))
    

    Output:

    strings method:
    '\n            '
    '   This is some text.   '
    '\n            '
    '\n                '
    'Item 1'
    '\n                '
    'Item 2'
    '\n            '
    '\n        '
    
    Using stripped_strings method:
    'This is some text.'
    'Item 1'
    'Item 2'
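
As a rough sketch of the relationship between the two, the snippet below rebuilds the result of stripped_strings by hand from strings: strip each string and drop the ones that end up empty. The compact `html` sample and the names `raw`, `stripped`, and `manual` are made up for this illustration. It also shows `get_text(separator=..., strip=True)`, which joins the same stripped strings into a single string.

```python
from bs4 import BeautifulSoup

# Small sample document, made up for this illustration.
html = (
    "<div>\n"
    "  <p>   This is some text.   </p>\n"
    "  <ul><li>Item 1</li> <li>Item 2</li></ul>\n"
    "</div>"
)

soup = BeautifulSoup(html, "html.parser")
div_tag = soup.find("div")

# Every string inside the tag, whitespace included.
raw = list(div_tag.strings)

# What BeautifulSoup yields after stripping.
stripped = list(div_tag.stripped_strings)

# Rebuild stripped_strings by hand: strip each string, drop the empty ones.
manual = [s.strip() for s in raw if s.strip()]

print(raw)
print(stripped)
print(manual == stripped)

# get_text with strip=True joins the stripped strings with the separator.
print(div_tag.get_text(separator="|", strip=True))
```

If the whitespace matters (for example inside a `<pre>` block), iterate over strings; if only the visible text is needed, stripped_strings or `get_text(strip=True)` is usually the more convenient choice.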