python html-parsing beautifulsoup

How to change tag name with BeautifulSoup?


I am using Python and BeautifulSoup to parse an HTML document.

Now I need to replace all <h2 class="someclass"> elements in the document with <h1 class="someclass">.

How can I change the tag name, without changing anything else in the document?


Solution

  • I don't know how you're accessing the tag, but the following works for me:

    import BeautifulSoup
    
    if __name__ == "__main__":
        data = """
    <html>
    <h2 class='someclass'>some title</h2>
    <ul>
       <li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
       <li>Aliquam tincidunt mauris eu risus.</li>
       <li>Vestibulum auctor dapibus neque.</li>
    </ul>
    </html>
    
        """
        soup = BeautifulSoup.BeautifulSoup(data)
        h2 = soup.find('h2')
        h2.name = 'h1'
        print soup
    

    The output of the print soup statement is:

    <html>
    <h1 class='someclass'>some title</h1>
    <ul>
    <li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
    <li>Aliquam tincidunt mauris eu risus.</li>
    <li>Vestibulum auctor dapibus neque.</li>
    </ul>
    </html>
    

    As you can see, the h2 became an h1 and its class attribute and contents were left alone. I am using Python 2.6 and BeautifulSoup 3.2.0.

    If you have more than one h2 and you want to change them all, you could simply do:

    soup = BeautifulSoup.BeautifulSoup(your_data)
    # findAll returns every matching tag, so a single pass renames them all
    for h2 in soup.findAll('h2'):
        h2.name = 'h1'
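
The code above targets Python 2 and BeautifulSoup 3. On Python 3 the library is imported from bs4 instead, and the same idea can be narrowed to the class from the question. What follows is only a minimal sketch, assuming the beautifulsoup4 package is installed and the built-in html.parser backend is acceptable; rename_tags and its css_class parameter are hypothetical names made up for illustration, not part of the library.

```python
from bs4 import BeautifulSoup  # assumption: installed via "pip install beautifulsoup4"


def rename_tags(html, old_name, new_name, css_class=None):
    """Hypothetical helper: rename <old_name> tags, optionally only those with css_class."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ is the bs4 keyword for filtering on the class attribute
    filters = {"class_": css_class} if css_class else {}
    for tag in soup.find_all(old_name, **filters):
        # Assigning to .name changes only the tag name; attributes and children stay
        tag.name = new_name
    return str(soup)


html = (
    '<h2 class="someclass">some title</h2>'
    '<h2 class="other">left alone</h2>'
    '<h2 class="someclass">another title</h2>'
)
result = rename_tags(html, "h2", "h1", css_class="someclass")
print(result)
```

Only the two h2 elements carrying someclass should come back as h1; the one with a different class keeps its tag name. Dropping the css_class argument would rename every h2 in the document.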