python, beautifulsoup, hugo, hugo-shortcode

Replace HTML tags selected by class name with a non-HTML tag


I want to replace every div tag that has the class name "figure", like this one:

<div class="figure">
    <p>Some content.</p>
</div>

with a non-HTML tag (in my case, a Hugo shortcode):

{{% row %}}
    <p>Some content.</p>
{{% /row %}}

Replacing HTML tags with other HTML tags is easy, but I have no idea how to do it when non-HTML tags are involved.
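For comparison, the HTML-to-HTML case is a plain in-place rename with BeautifulSoup. A minimal sketch, assuming BeautifulSoup 4 with the built-in html.parser; the target name "section" is only an illustration:

```python
from bs4 import BeautifulSoup

html = '<div class="figure"><p>Some content.</p></div>'
soup = BeautifulSoup(html, 'html.parser')

# Rename every <div class="figure"> in place; its children stay where they are.
for div in soup.select('div.figure'):
    div.name = 'section'  # hypothetical target tag, for illustration only
    del div['class']      # drop the class attribute that selected it

print(soup)  # <section><p>Some content.</p></section>
```

This only works because the replacement is itself an element; a shortcode pair is not, which is what makes the question harder.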


Solution

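  • I can't see an easy solution. A shortcode pair is not an HTML element, so it cannot be part of the document tree, and shortcodes can contain characters such as /, < and >, so inserting them as plain text is not safe either (BeautifulSoup escapes < and > as &lt; and &gt; when it serializes text).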

    One workaround is to replace each <div class="figure"> with a custom placeholder tag and, at the very end, replace the placeholder tags in the serialized string with your shortcodes:

    from bs4 import BeautifulSoup
    
    txt = '''
    <div>
        <div class="figure">
            <p>Some content.</p>
        </div>
    </div>
    
    <div class="figure">
        <p>Some other content.</p>
    </div>
    '''
    
    soup = BeautifulSoup(txt, 'html.parser')
    
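    # Rename each <div class="figure"> to a placeholder tag in place, so its
    # children (and BeautifulSoup's internal links between nodes) stay intact.
    for div in soup.select('div.figure'):
        div.name = 'xxx-row'
        div.attrs = {}  # drop class="figure" so the tag serializes as <xxx-row>
    
    # Swap the placeholder tags for the shortcodes in the final string.
    s = str(soup).replace('<xxx-row>', '{{% row %}}')
    s = s.replace('</xxx-row>', '{{% /row %}}')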
    
    print(s)
    

    Prints:

    <div>
        {{% row %}}
            <p>Some content.</p>
        {{% /row %}}
    </div>

    {{% row %}}
        <p>Some other content.</p>
    {{% /row %}}
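The placeholder step matters because of escaping. A {{% row %}} string inserted as a text node would survive serialization unchanged, but Hugo's other shortcode delimiter, {{< row >}}, contains < and >, which BeautifulSoup escapes on output. The sketch below first shows that escaping, then wraps the placeholder approach in a reusable function. It assumes BeautifulSoup 4 with html.parser; the function name replace_with_shortcode and the placeholder tag name are hypothetical, and the placeholder must not already occur in the document.

```python
from bs4 import BeautifulSoup, NavigableString


def replace_with_shortcode(html, selector, opening, closing, placeholder='xxx-shortcode'):
    """Replace every element matching `selector` with `opening` ... `closing`.

    Hypothetical helper: matched elements are renamed to a placeholder tag
    (keeping their children), the tree is serialized, and only then are the
    placeholder tags swapped for the raw shortcode text, so BeautifulSoup
    never gets a chance to escape the shortcode's < and > characters.
    """
    soup = BeautifulSoup(html, 'html.parser')
    for el in soup.select(selector):
        el.name = placeholder  # rename in place; children are untouched
        el.attrs = {}          # so the tag serializes exactly as <placeholder>
    out = str(soup)
    out = out.replace(f'<{placeholder}>', opening)
    return out.replace(f'</{placeholder}>', closing)


html = '<div class="figure"><p>Some content.</p></div>'

# Naive approach: shortcodes inserted as text nodes get escaped on output.
soup = BeautifulSoup(html, 'html.parser')
div = soup.select_one('div.figure')
div.insert_before(NavigableString('{{< row >}}'))
div.insert_after(NavigableString('{{< /row >}}'))
div.unwrap()  # replace the div with its children
print(soup)
# {{&lt; row &gt;}}<p>Some content.</p>{{&lt; /row &gt;}}

# Placeholder approach: the shortcode text is added after serialization.
print(replace_with_shortcode(html, 'div.figure', '{{< row >}}', '{{< /row >}}'))
# {{< row >}}<p>Some content.</p>{{< /row >}}
```

Renaming the matched element, rather than building a new tag and assigning another tag's contents list to it, keeps the moves inside BeautifulSoup's own methods, so the tree stays consistent. For {{% %}} shortcodes alone the naive text-node approach would also work, since none of its characters are escaped.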