I wish to extract `elementB` and then stop before `elementC` and `elementD`, i.e., I do not want the `.Text()` of `elementC` or `elementD`. However, I only know how to extract the entire div text, using `Contents().Not` to ignore `elementC`; `elementD` is still captured.
Here is the Go code I am currently using:
capturedText := s.Find("div").Contents().Not(".label").Text()
This ignores `elementC` but not `elementD`, which is a bare text node with no enclosing tags.
HTML:
<li><span><h2>elementA</h2></span><div>elementB<br><span class="label">elementC</span>elementD</div></li>
How do I capture only `elementB` from the `<div>`, and not `elementC` or `elementD`?
Edit:
I have tried closing the div tag like so:
s.Find(".label").BeforeHtml(`</div>`)
and also tried:
s.Find(".label").BeforeHtml(`</div><div>`)
and then reading only the first `div`, disregarding the second `div` (which should now hold `elementD`), with:
jp, _ := s.Find("div").First().Html()
However, this does not work. It seems the inserted HTML cannot be a lone `</div>`; it has to be a complete element such as `<div>...</div>` to be inserted. But that is NOT what I need: I need ONLY `</div>` (or `</div><div>`) to close the first `div` at that point.
What is the appropriate way to fix this?
Since I can't insert a 'broken' (unmatched) tag into the HTML, I have opted for this workaround:
s.Find(".label").BeforeHtml(`|_SEPARATOR_|`) // Insert a text separator before the .label span
preCleanNode := s.Find("div").Contents().Not(".label").Text() // Get the div's text, skipping .label elements
cleanNode := strings.Split(preCleanNode, `|_SEPARATOR_|`) // Split the text at the separator (needs the "strings" package)
outputString := cleanNode[0] // Keep only the text before the separator