go go-colly

What can the go-colly library do?


Can the go-colly library crawl all HTML tags and text content under a div tag? If so, how? I can already get all the text under a div tag, like this:

c.OnHTML("body .post-topic-main .post-topic-des", func(e *colly.HTMLElement) {
    text = strings.TrimSpace(e.Text)
})

But I don't know how to get the HTML tags under the div tag.


Solution

  • If you are looking for the inner HTML, it is accessible through the element's DOM field (a goquery selection) by calling its Html method (e.DOM.Html()).

    c.OnHTML("body .post-topic-main .post-topic-des", func(e *colly.HTMLElement) {
        html, _ := e.DOM.Html()
        log.Println(html)
    })
    

    If you are looking for a specific tag under the matched element, ForEach can be used for this purpose. Its first argument is a selector and its second argument is a callback function. The callback is invoked once for each element that matches the selector and is a descendant of e.

    More information: https://pkg.go.dev/github.com/gocolly/colly@v1.2.0#HTMLElement.ForEach

    c.OnHTML("body .post-topic-main .post-topic-des", func(e *colly.HTMLElement) {
        text := strings.TrimSpace(e.Text)
        log.Println(text)
        e.ForEach("div", func(_ int, el *colly.HTMLElement) {
            text := strings.TrimSpace(el.Text) // use el (the child), not e (the parent)
            log.Println(text)
        })
    })