go, go-colly

Scrape ONLY a certain <div> using gocolly


I'm trying to make a web scraper using gocolly. I want to ONLY scrape a <div> element with the id of dailyText on https://wol.jw.org/en/wol/h/r1/lp-e. How can I do this?


Solution

  • Thanks to xarantolus for this answer.
    This worked great for me (as long as the domain allows scraping, that is).

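    package main

    import (
        "fmt"
        "log"

        "github.com/gocolly/colly/v2" // or github.com/gocolly/colly for v1
    )

    func main() {
        cly := colly.NewCollector(
            // AllowedDomains expects bare host names, not full URLs.
            colly.AllowedDomains("yourpage.site"),
        )
        // Fire only for the <div> whose id is "dailyText" and print its text.
        cly.OnHTML("div#dailyText", func(e *colly.HTMLElement) {
            fmt.Println(e.Text)
        })
        cly.OnRequest(func(r *colly.Request) {
            fmt.Println("Visiting", r.URL.String())
        })
        // Visit returns an error, not the page contents.
        if err := cly.Visit("https://yourpage.site"); err != nil {
            log.Fatal(err)
        }
    }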
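
A note on colly.AllowedDomains: it is compared against the host name of each request, so a value that still carries the scheme (https://...) will not match and the visit gets rejected. One way to avoid this is to derive the host from the start URL. The following is only a minimal sketch using the standard library; the helper name hostOf is hypothetical and not part of colly.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostOf is a hypothetical helper: it returns the bare host name of rawURL,
// which is the form colly.AllowedDomains expects.
func hostOf(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	host := u.Hostname() // drops the scheme, port, and path
	if host == "" {
		return "", fmt.Errorf("no host in %q (missing scheme?)", rawURL)
	}
	return host, nil
}

func main() {
	host, err := hostOf("https://wol.jw.org/en/wol/h/r1/lp-e")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(host) // prints wol.jw.org
}
```

The returned string could then be passed to colly.AllowedDomains(host) while the full URL goes to Visit.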