
How can I get the tag name of a DOM element using goquery?


I want to get the tag name of a DOM element, such as 'a', 'img', 'tr', 'td', or 'center', using goquery. How can I get it?

package main

import (
    "github.com/PuerkitoBio/goquery"
)

func main() {
    doc, _ := goquery.NewDocument("https://news.ycombinator.com/")
    doc.Find("html body").Each(func(_ int, s *goquery.Selection) {
        // For debugging.
        println(s.Size()) // prints 1

        // I expect '<center>' on this URL, but I can't get its name.
        // println(s.First().xxx) // ?
    })
}

Solution

  • *Selection.First returns another *Selection. A *Selection holds a slice of *html.Node in its Nodes field, and each *html.Node has a Data field, which contains:

    tag name for element nodes, content for text

    So something like this:

    package main
    
    import (
        "github.com/PuerkitoBio/goquery"
        "golang.org/x/net/html"
    )
    
    func main() {
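        doc, err := goquery.NewDocument("https://news.ycombinator.com/")
        if err != nil {
            panic(err) // the request or the HTML parse failed
        }
        doc.Find("html body").Each(func(_ int, s *goquery.Selection) {
            // For debugging.
            println(s.Size()) // prints 1
    
            // Data holds the tag name only for element nodes, so check the type first.
            if len(s.Nodes) > 0 && s.Nodes[0].Type == html.ElementNode {
                println(s.Nodes[0].Data) // "body" for this selector
            }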
        })
    }