I use goquery's .Each() function to recurse into the child elements. Is there a way to find out whether the current element is the first (or last) child of its parent? I am trying to remove leading and trailing whitespace from HTML nodes. Checking for the first child is probably a matter of testing i == 0. But what about the last child element?
This is my code so far:
package main

import (
	"fmt"
	"io"
	"os"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

// recursive function
func dumpElement(i int, sel *goquery.Selection) {
	fmt.Println("dump Element - is this the first or last element? I don't know")
	sel.Contents().Each(dumpElement)
}

func startRecursion(r io.Reader) error {
	g, err := goquery.NewDocumentFromReader(r)
	if err != nil {
		return err
	}
	g.Find(":root > body").Each(dumpElement)
	return nil
}

func main() {
	doc := `<!DOCTYPE html>
<html><head><title>foo</title></head><body>
<div class="bla">foo <b> bar </b> baz</div>
</body></html>`
	if err := startRecursion(strings.NewReader(doc)); err != nil {
		os.Exit(-1)
	}
}
Most likely you'd have to write a function that returns the function you are using, so that you get access to the original selection's length. Something like:
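// iterator matches the callback signature expected by (*goquery.Selection).Each.
type iterator func(int, *goquery.Selection)

// dumpElementFrom returns a callback that knows the last index of s.
func dumpElementFrom(s *goquery.Selection) iterator {
	lastIndex := s.Size() - 1
	return func(i int, sel *goquery.Selection) {
		if i == lastIndex {
			fmt.Println("Last Element")
		}
		// Build a fresh callback for the children, so the recursion
		// also knows where its own selection ends.
		children := sel.Contents()
		children.Each(dumpElementFrom(children))
	}
}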
func startRecursion(r io.Reader) error {
	g, err := goquery.NewDocumentFromReader(r)
	if err != nil {
		return err
	}
	// Pass the selection that is actually iterated, not the document.
	body := g.Find(":root > body")
	body.Each(dumpElementFrom(body))
	return nil
}