python beautifulsoup replace

Replace the <body> of one website with another using Beautiful Soup / Python


I am trying to replace the <body> tag and everything below it with another <body> tag and its content.

**** CODE ****

from bs4 import BeautifulSoup as bs
import os
import re

 
# Remove the last segment of the path
base = os.path.dirname(os.path.abspath(__file__))

# Coffee Template
coffee = os.path.join(base, "website-templates-master", "coffee-shop-free-html5-template", "index.html")

print(coffee)

with open(coffee) as coffee_html:
    # turn html into a list
    coffee_blob = coffee_html.readlines()

for line in coffee_blob:
    if "<body" in line:
        start = coffee_blob.index(line)
    if "</body>" in line:
        end = coffee_blob.index(line)

data = coffee_blob[start:end] 

# Open the HTML in which you want to make changes
# html = open(os.path.join(base, 'index.html'))

with open(os.path.join(base, "index.html")) as html:
 
    # Parse HTML file in Beautiful Soup
    soup = bs(html, 'html.parser')
 
    # Locate the <body> whose content should be replaced
    body = soup.find('body').text
    # This is the line that raises the error: data is a list of lines
    soup.body.replace_with(data)

I expected the contents of data, a list holding each line of the coffee template from the <body> tag to the </body> tag, to replace the body of the HTML file. The HTML file is just a bare-bones page with some tags in its body.

What I get is: AttributeError: 'list' object has no attribute 'parent'
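
For context, the error appears to come from the type of the argument rather than from its contents: replace_with works on nodes of the parse tree (or plain strings, which Beautiful Soup wraps into text nodes), and a Python list is neither. A minimal sketch of that distinction, using made-up in-memory HTML instead of the files from the question:

```python
from bs4 import BeautifulSoup
from bs4.element import PageElement

# Hypothetical stand-ins for the target page and the sliced template lines
soup = BeautifulSoup("<html><body><p>old</p></body></html>", "html.parser")
lines = ["<body>\n", "<h1>Coffee</h1>\n"]

# A Tag is a node in the parse tree, so it carries a .parent attribute
body_is_node = isinstance(soup.body, PageElement)

# A list of lines is not a tree node and has no .parent attribute
list_is_node = isinstance(lines, PageElement)
list_has_parent = hasattr(lines, "parent")

print(body_is_node, list_is_node, list_has_parent)
```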

I need a way to read the body of one file and use it to replace the body of another file, basically replacing the website.


Solution

  • I was able to get it to work by first joining the items of the list into a single string with join.

    data = ''.join(data)
    

    However, if anyone has a more efficient way to complete the same task, please let me know. Thanks
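
One possible alternative, offered as a sketch rather than a definitive answer: parse both files with Beautiful Soup and replace one <body> Tag with the other, instead of slicing lines by hand. As far as I know, a plain string passed to replace_with is inserted as a text node, so its markup may end up escaped when the document is written out; passing a Tag avoids that. The function name replace_body and the sample HTML strings below are hypothetical and only stand in for the two index.html files.

```python
from bs4 import BeautifulSoup


def replace_body(target_html, source_html):
    """Return target_html with its <body> swapped for the one in source_html."""
    target = BeautifulSoup(target_html, "html.parser")
    source = BeautifulSoup(source_html, "html.parser")

    if target.body is None or source.body is None:
        raise ValueError("both documents need a <body> element")

    # Passing a Tag moves the whole subtree, attributes included,
    # out of the source document and into the target document.
    target.body.replace_with(source.body)
    return str(target)


# Hypothetical stand-ins for the coffee template and the bare-bones page
template = (
    '<html><head><title>Coffee</title></head>'
    '<body class="shop"><h1>Menu</h1></body></html>'
)
page = "<html><head><title>Mine</title></head><body><p>old</p></body></html>"

result = replace_body(page, template)
print(result)
```

With real files, the two strings would come from open(...).read() on each index.html, and the returned string would be written back to the target file. Note that only the body is copied; the <head> of the target page (title, stylesheets, scripts) stays as it was.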