
Beautiful Soup and Python: recursion over all nodes fails with "maximum recursion depth exceeded while calling a Python object"


I work in FreeMind, a program that lets you create trees and export them as HTML files. I need to get every "path" of such a tree and put the paths into a list, for example, so that I can work with each "path" separately afterwards.

For example, from this HTML:

<body>
   <p>Example</p>
   <ul>
      <li>
         The First List
         <ul>
            <li>1</li>
            <li>2</li>
            <li>3</li>
         </ul>
      </li>
      <li>
         The Second List
         <ul>
            <li>4.1</li>
            <li>4.2</li>
         </ul>
      </li>
   </ul>
</body>

I need to get the following separate branches:

<body>
   <p>Example</p>
   <ul>
      <li>
         The First List
         <ul>
            <li>1</li>
         </ul>
      </li>
   </ul>
</body>

<body>
   <p>Example</p>
   <ul>
      <li>
         The First List
         <ul>
            <li>2</li>
         </ul>
      </li>
   </ul>
</body>

<body>
   <p>Example</p>
   <ul>
      <li>
         The First List
         <ul>
            <li>3</li>
         </ul>
      </li>
   </ul>
</body>

<body>
   <p>Example</p>
   <ul>
      <li>
         The Second List
         <ul>
            <li>4.1</li>
         </ul>
      </li>
   </ul>
</body>

<body>
   <p>Example</p>
   <ul>
      <li>
         The Second List
         <ul>
            <li>4.2</li>
         </ul>
      </li>
   </ul>
</body>

I am trying this code and getting the error "maximum recursion depth exceeded while calling a Python object":

from bs4 import BeautifulSoup

parsed = BeautifulSoup(open("example.html"))

body = parsed.body

def all_nodes(obj):
    for node in obj:
        print node
        all_nodes(node)

print all_nodes(body)

I should probably explain what I want to do with all of this later. I am writing test cases in FreeMind, and I am trying to write a tool that could, for example, create a CSV table with all the test cases. For now, though, I am just trying to get all the test cases as text.


Solution
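
A note on the error itself: it most likely comes from recursing into text nodes. Iterating over a tag yields its children, and some of those children are strings. Iterating over a string yields one-character strings, and iterating over a one-character string yields that same string again, so `all_nodes` never bottoms out. A minimal sketch of a traversal that recurses only into tags, assuming the `html.parser` backend and a shortened copy of the sample markup (the name `all_nodes` is reused from the question):

```python
from bs4 import BeautifulSoup, Tag

# Shortened, whitespace-free version of the sample from the question
html = ("<body><p>Example</p><ul><li>The First List"
        "<ul><li>1</li><li>2</li></ul></li></ul></body>")


def all_nodes(node):
    """Yield every descendant of node depth first, recursing only into tags."""
    for child in node.children:
        yield child
        if isinstance(child, Tag):  # text nodes are leaves, so never iterate them
            for grandchild in all_nodes(child):
                yield grandchild


soup = BeautifulSoup(html, "html.parser")
names = [n.name for n in all_nodes(soup.body) if isinstance(n, Tag)]
texts = [str(n) for n in all_nodes(soup.body) if not isinstance(n, Tag)]
```

Beautiful Soup also exposes the same depth-first walk as `soup.body.descendants`, so a hand-written recursion is only needed if you want to do extra work at each level.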

  • Here's one way to do it. It's neither simple nor particularly Pythonic, and personally I don't like the solution, but it should be a good start for you. I bet there is a shorter and more elegant way to do the same thing.

    The idea is to iterate over all elements that don't have child elements. For every such element, walk up through its parents until we hit body, wrapping a copy of the element in a new tag at each level:

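    from bs4 import BeautifulSoup, Tag
    
    
    data = """
    your HTML goes here
    """
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(data, "html.parser")
    for element in soup.body.find_all():
        children = element.find_all()
        if not children:
            # Leaf element: copy it, then wrap the copy in copies of its ancestors
            tag = Tag(name=element.name)
            tag.string = element.string
            # .parents is the bs4 name for the old parentGenerator()
            for parent in element.parents:
                parent = Tag(name=parent.name)
                parent.append(tag)
                tag = parent
                if tag.name == 'body':
                    break
            print(tag)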
    

    It produces:

    <body><p>Example</p></body>
    <body><ul><li><ul><li>1</li></ul></li></ul></body>
    <body><ul><li><ul><li>2</li></ul></li></ul></body>
    <body><ul><li><ul><li>3</li></ul></li></ul></body>
    <body><ul><li><ul><li>4.1</li></ul></li></ul></body>
    <body><ul><li><ul><li>4.2</li></ul></li></ul></body>
    

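    Update (also writing each parent's own text):

    soup = BeautifulSoup(data, "html.parser")
    for element in soup.body.find_all():
        children = element.find_all()
        if not children:
            tag = Tag(name=element.name)
            tag.string = element.string
            for parent in element.parents:
                parent_tag = Tag(name=parent.name)
                # parent.string is None when a tag has several children, so join
                # the text nodes that sit directly inside the parent instead
                text = "".join(c for c in parent.contents if not isinstance(c, Tag)).strip()
                if text:
                    parent_tag.string = text
                parent_tag.append(tag)
                tag = parent_tag
                if tag.name == 'body':
                    break
            print(tag)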
    

    Hope that helps.
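
Since the stated goal is to get each test case as text, the same leaf-to-root idea can also be sketched without rebuilding tags. The snippet below is only an illustration under a few assumptions: it uses the sample HTML from the question, the `html.parser` backend, and two hypothetical helpers, `own_text` and `text_paths`, that are not part of Beautiful Soup. It looks only at nested `<li>` items, so the root `<p>` label is not included.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<body>
   <p>Example</p>
   <ul>
      <li>
         The First List
         <ul>
            <li>1</li>
            <li>2</li>
            <li>3</li>
         </ul>
      </li>
      <li>
         The Second List
         <ul>
            <li>4.1</li>
            <li>4.2</li>
         </ul>
      </li>
   </ul>
</body>
"""


def own_text(tag):
    """Hypothetical helper: text sitting directly inside tag, without nested tags."""
    parts = [c.strip() for c in tag.children if isinstance(c, NavigableString)]
    return " ".join(p for p in parts if p)


def text_paths(root):
    """Hypothetical helper: one list of labels per leaf <li>, outermost first."""
    result = []
    for leaf in root.find_all("li"):
        if leaf.find("li") is not None:
            continue  # this item has nested items, so it is a branch, not a leaf
        # .parents walks from the direct parent up to the document root
        chain = [leaf] + [p for p in leaf.parents if p.name == "li"]
        result.append([own_text(item) for item in reversed(chain)])
    return result


soup = BeautifulSoup(html, "html.parser")
paths = text_paths(soup.body)
for path in paths:
    print(" > ".join(path))
```

Each inner list is one root-to-leaf path, so it could later be written out as one row of a CSV table.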