I need to find certain words in an HTML file and replace them with links, so that the file, when displayed in a browser, lets you click the links as usual. However, Beautiful Soup automatically escapes the inserted tag. How can I avoid that behaviour?
Minimal Example
#!/usr/bin/env python3
from bs4 import BeautifulSoup
import re
html = \
'''
Identify
'''
soup = BeautifulSoup(html,features="html.parser")
for txt in soup.findAll(text=True):
    if re.search('identi',txt,re.I) and txt.parent.name != 'a':
        newtext = re.sub('identify', '<a href="test.html"> test </a>', txt.lower())
        txt.replace_with(newtext)
print(soup)
Result:
&lt;a href="test.html"&gt; test &lt;/a&gt;
Intended result:
<a href="test.html"> test </a>
You can pass a new soup, parsed from the markup, as the argument to .replace_with(), for example:
import re
from bs4 import BeautifulSoup
html = '''
Other Identify Other
'''
soup = BeautifulSoup(html,features="html.parser")
for txt in soup.findAll(text=True):
    if re.search('identi',txt,re.I) and txt.parent.name != 'a':
        # Build the replacement markup as a string ...
        new_txt = re.sub(r'identi[^\s]*', '<a href="test.html">test</a>', txt, flags=re.I)
        # ... and parse it, so that a real <a> tag is inserted instead of escaped text
        txt.replace_with(BeautifulSoup(new_txt, 'html.parser'))
print(soup)
Prints:
Other <a href="test.html">test</a> Other
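One caveat with re-parsing the text: any literal characters such as < that were written as entities in the source are parsed as markup the second time around. A possible alternative, shown here only as a rough sketch, is to split the text node around each match and build the link with soup.new_tag(). The helper name link_words and its parameters are made up for this illustration, and unlike the code above it keeps the matched word as the link text rather than replacing it with "test".

```python
import re
from bs4 import BeautifulSoup, NavigableString

def link_words(soup, pattern, href):
    """Hypothetical helper: wrap each regex match found in a text node
    (outside of existing <a> tags) in a link pointing to href."""
    regex = re.compile(pattern, re.I)
    # Copy the result into a list first, because the loop modifies the tree.
    for node in list(soup.find_all(string=True)):
        if node.parent.name == 'a' or not regex.search(node):
            continue
        pos = 0
        for m in regex.finditer(node):
            if m.start() > pos:
                # Plain text before the match stays plain text.
                node.insert_before(NavigableString(node[pos:m.start()]))
            a = soup.new_tag('a', href=href)
            a.string = m.group(0)  # assumption: keep the matched word as link text
            node.insert_before(a)
            pos = m.end()
        if pos < len(node):
            # Plain text after the last match.
            node.insert_before(NavigableString(node[pos:]))
        # Remove the original text node now that its pieces are in place.
        node.extract()

soup = BeautifulSoup('<p>Other Identify Other &lt;b&gt;</p>', 'html.parser')
link_words(soup, r'identi\S*', 'test.html')
print(soup)
```

Because the surrounding text is never parsed again, the escaped &lt;b&gt; in the sample input stays escaped in the output, while the matched word becomes a real link.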