I work in program FreeMind which allows to create trees and import them as HTML files and I need to get every "path" of this tree and put them into list for example to work with each "path" separately after.
For example from this code:
<body>
<p>Example</p>
<ul>
<li>
The First List
<ul>
<li>1</li>
<li>2</li>
<li>3</li>
</ul>
</li>
<li>
The Second List
<ul>
<li>4.1</li>
<li>4.2</li>
</ul>
</li>
</ul>
</body>
I need to get next separate branches of code:
<body>
<p>Example</p>
<ul>
<li>
The First List
<ul>
<li>1</li>
</ul>
</li>
</ul>
</body>
<body>
<p>Example</p>
<ul>
<li>
The First List
<ul>
<li>2</li>
</ul>
</li>
</ul>
</body>
<body>
<p>Example</p>
<ul>
<li>
The First List
<ul>
<li>3</li>
</ul>
</li>
</ul>
</body>
<body>
<p>Example</p>
<ul>
<li>
The Second List
<ul>
<li>4.1</li>
</ul>
</li>
</ul>
</body>
<body>
<p>Example</p>
<ul>
<li>
The Second List
<ul>
<li>4.2</li>
</ul>
</li>
</ul>
</body>
I am trying that code and getting error "maximum recursion depth exceeded while calling a Python object":
from bs4 import BeautifulSoup
parsed = BeautifulSoup(open("example.html"))
body = parsed.body
def all_nodes(obj):
for node in obj:
print node
all_nodes(node)
print all_nodes(body)
I think that I should explain what I want to do with all this stuff later. I am writing test cases in FreeMind and I am trying to write tool which could create csv table for example with all test cases. But for now I am just trying to get all test cases as texts.
Here's one way to do it. It's not that easy and pythonic though. Personally I don't like the solution, but it should be a good start for you. I bet there is a more beautiful and short way to do the same.
The idea is to iterate over all elements that don't have children. For every such element iterate recursively over it's parents until we hit body
:
from bs4 import BeautifulSoup, Tag
data = """
your xml goes here
"""
soup = BeautifulSoup(data)
for element in soup.body.find_all():
children = element.find_all()
if not children:
tag = Tag(name=element.name)
tag.string = element.string
for parent in element.parentGenerator():
parent = Tag(name=parent.name)
parent.append(tag)
tag = parent
if tag.name == 'body':
break
print tag
It produces:
<body><p>Example</p></body>
<body><ul><li><ul><li>1</li></ul></li></ul></body>
<body><ul><li><ul><li>2</li></ul></li></ul></body>
<body><ul><li><ul><li>3</li></ul></li></ul></body>
<body><ul><li><ul><li>4.1</li></ul></li></ul></body>
<body><ul><li><ul><li>4.2</li></ul></li></ul></body>
UPD (writing parent's text too):
soup = BeautifulSoup(data)
for element in soup.body.find_all():
children = element.find_all()
if not children:
tag = Tag(name=element.name)
tag.string = element.string
for parent in element.parentGenerator():
parent_tag = Tag(name=parent.name)
if parent.string:
parent_tag.string = parent.string
parent_tag.append(tag)
tag = parent_tag
if tag.name == 'body':
break
print tag
Hope that helps.