swift beautifulsoup html-parsing swiftsoup

Getting all tag elements in an array with SwiftSoup


I worked on a project in Python that used BeautifulSoup to parse an HTML document and add ruby and rt tags to each string. Recently I've been working on a similar project for a personal iOS app. I found SwiftSoup, which is similar, but I ran into a problem parsing a tag that I could handle easily with BeautifulSoup. In BeautifulSoup I can take a tag like the one below:

<p id="p6" data-pid="6" data-rel-pid="[41]" class="p6">
  <span class="parNum" data-pnum="1"></span>
     This is a(<span id="citationsource2"></span><a epub:type="noteref" href="#citation2">link</a>)to some website。
</p>

Using .contents from BS4, I can get the tag's children into a list like this:

['\n', <span class="parNum" data-pnum="1"></span>, '\n         This is a(', <span id="citationsource2"></span>, <a epub:type="noteref" href="#citation2">link</a>, ')to some website。\n    ']

I then go through the list, check whether each item is a text node or a child tag that contains text, and wrap each word in ruby tags. The result is this:

 <p id="p6" data-pid="6" data-rel-pid="[41]" class="p6">
  <span class="parNum" data-pnum="1"></span>
     <ruby>This<rt>1</rt></ruby><ruby>is<rt>2</rt></ruby> <ruby>a<rt>3</rt></ruby>(<span id="citationsource2"></span><a epub:type="noteref" href="#citation2"><ruby>link<rt>4</rt></ruby></a>)<ruby>to<rt>5</rt></ruby> <ruby>some<rt>6</rt></ruby> <ruby>website<rt>7</rt></ruby>。
</p>
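The word-wrapping step described above can be sketched in plain Swift. This is only an illustration of the idea, not the original Python code: the names wrapWords and rubyMarkup are hypothetical, and it assumes a "word" is any run of letters or digits and that the rt value is a simple running counter shared across text segments.

```swift
/// Builds the ruby markup for a single word (hypothetical helper).
func rubyMarkup(_ word: String, _ number: Int) -> String {
    return "<ruby>\(word)<rt>\(number)</rt></ruby>"
}

/// Wraps every run of letters/digits in `text` with ruby markup.
/// `counter` is shared across calls so numbering continues between segments.
/// Assumption: punctuation and whitespace are copied through unchanged.
func wrapWords(in text: String, counter: inout Int) -> String {
    var output = ""
    var word = ""

    for ch in text {
        if ch.isLetter || ch.isNumber {
            // Still inside a word: keep collecting characters.
            word.append(ch)
            continue
        }
        // A non-word character ends the pending word, if there is one.
        if !word.isEmpty {
            counter += 1
            output += rubyMarkup(word, counter)
            word = ""
        }
        output.append(ch)
    }

    // Flush a word that runs to the end of the text.
    if !word.isEmpty {
        counter += 1
        output += rubyMarkup(word, counter)
    }
    return output
}

// Usage: process two text segments in document order.
var counter = 0
let first = wrapWords(in: "This is a(", counter: &counter)
let second = wrapWords(in: "link", counter: &counter)
print(first)
print(second)
```

Keeping the counter outside the function means the text inside a child tag (such as the link) continues the numbering started by the text before it.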

With SwiftSoup I parse the document like this, since it doesn't seem to have a method similar to BS4's .contents:

let soup: Document = try! SwiftSoup.parse(html)
let elements: Elements = try! soup.select("p")
for j in try! elements.html() {
    print(j)
    // Doesn't work: prints every single character, not every element
}

The problem is that it treats the whole content of the p tag as a single element; it doesn't separate the children of the p tag the way BS4 does. I looked at the documentation but didn't see anything about splitting a tag's contents into an array.

This is what I want to achieve with SwiftSoup:

['\n', <span class="parNum" data-pnum="1"></span>, '\n         This is a(', <span id="citationsource2"></span>, <a epub:type="noteref" href="#citation2">link</a>, ')to some website。\n    ']

But I end up getting everything as one element in the array instead of separate elements:

[<span class="parNum" data-pnum="1"></span>This is a(<span id="citationsource2"> 
  </span> <a epub:type="noteref" href="#citation2">link</a>)to some website.]

Is there any way to achieve this using SwiftSoup, or is there another Swift HTML parser that can do the same thing?


Solution

  • After looking through the SwiftSoup source, I found the answer to my question. SwiftSoup has a method called getChildNodes(), which returns the direct children of the specified tag, both text nodes and elements, as an array. Hope this helps anyone who has run into a similar problem.

    let soup: Document = try! SwiftSoup.parseBodyFragment(html)
    let p: Elements = try! soup.select("p")
    for j in p {
        // getChildNodes() returns the tag's direct children (text nodes and elements)
        print(j.getChildNodes())
    }
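
To tell text apart from child tags, much like checking the items of BS4's .contents, each returned node can be cast to TextNode or Element. The following is a minimal sketch that assumes the SwiftSoup package is available; the function name childKinds and the shortened sample HTML are made up for illustration.

```swift
import SwiftSoup

/// Returns a short label for each direct child of the first <p> in `html`:
/// "text" for text nodes, "element:<tag>" for child elements.
/// (childKinds is a hypothetical helper, not part of SwiftSoup.)
func childKinds(of html: String) throws -> [String] {
    let doc = try SwiftSoup.parseBodyFragment(html)
    guard let p = try doc.select("p").first() else { return [] }

    var kinds: [String] = []
    for node in p.getChildNodes() {
        if node is TextNode {
            // Plain text between tags, e.g. "This is a("
            kinds.append("text")
        } else if let element = node as? Element {
            // A child tag such as <span> or <a>
            kinds.append("element:" + element.tagName())
        }
    }
    return kinds
}

// Shortened version of the paragraph from the question.
let sample = "<p><span class=\"parNum\"></span>This is a(<a href=\"#citation2\">link</a>)to some website。</p>"

do {
    print(try childKinds(of: sample))
} catch {
    print("Parsing failed: \(error)")
}
```

From there, a TextNode's content can be read with getWholeText(), and an Element can be processed recursively by calling getChildNodes() on it again.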