GoLang - GoQuery HTML Insertion Fails


I want to extract elementB and stop before elementC and elementD, i.e., the text of elementC and elementD should not be included. However, I only know how to extract the text of the entire div: using Contents().Not(...) excludes elementC, but elementD is still captured.

Here is the code I am currently using:

GoLang:

capturedText := s.Find("div").Contents().Not(".label").Text()

This excludes elementC but not elementD, because elementD is a bare text node with no enclosing tag, so the .label selector never matches it.

HTML:

<li><span><h2>elementA</h2></span><div>elementB<br><span class="label">elementC</span>elementD</div></li>

How do I capture only elementB of <div>, and not elementC and elementD?

Edit:

I have tried closing the div tag like so:

s.Find(".label").BeforeHtml(`</div>`)

and also tried:

s.Find(".label").BeforeHtml(`</div><div>`)

I then read the first div, ignoring the second div (which should now contain elementD), with:

jp, _ := s.Find("div").First().Html()

However, this is not working. It seems that a lone </div> (an end tag with no matching start tag) is simply dropped when goquery parses the inserted HTML; only complete elements such as <div>...</div> are inserted. That is not what I need: I need exactly </div> or </div><div> so that the first div is closed before elementC.

What is the appropriate way to fix this?


Solution

  • Since goquery won't insert a 'broken' node (an unmatched </div>) into the HTML, I have opted for a text separator instead:

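        // Note: this modifies the document by adding a separator text node.
        s.Find(".label").BeforeHtml(`|_SEPARATOR_|`)                  // Insert a text separator before the .label span
        preCleanNode := s.Find("div").Contents().Not(".label").Text() // Div text without .label elements: "elementB|_SEPARATOR_|elementD"
        cleanNode := strings.Split(preCleanNode, `|_SEPARATOR_|`)     // Split the text on the separator
        outputString := cleanNode[0]                                  // Keep only the text before the separator: "elementB"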