go, web-scraping, web-crawler, go-colly

Go Colly - visiting a URL in a for loop


I have a case where I need to visit multiple links and extract information from them. The problem is that when I call "c.Visit(URL)" in a loop, each URL is reported a growing number of times. Example:

package main

import (
    "fmt"

    "github.com/gocolly/colly"
)

func main() {

    CATEGORIES := []string{
        "cate1",
        "cate2",
        "cate3",
    }

    c := colly.NewCollector()

    for _, cate := range CATEGORIES {

        c.OnRequest(func(r *colly.Request) {
            fmt.Println("Visiting categories", r.URL)
        })

        c.Visit(cate)
    }
}

That will print:

Visiting categories http://cate1  
Visiting categories http://cate2
Visiting categories http://cate2
Visiting categories http://cate3
Visiting categories http://cate3
Visiting categories http://cate3

I tried creating a new collector on every iteration and that fixed the output (Visiting categories http://cate1, Visiting categories http://cate2, Visiting categories http://cate3, each printed once), BUT doing it that way I am losing my login session. Any suggestions?


Solution

  • You are registering a new OnRequest handler on every loop iteration, so the first Visit fires one handler, the second fires two, and the third fires three. Register the handler once, outside the loop:

    package main
    
    import (
      "fmt"
    
      "github.com/gocolly/colly"
    )
    
    func main() {
    
      CATEGORIES := []string{
        "cate1",
        "cate2",
        "cate3",
      }
    
      c := colly.NewCollector()
    
      // Registered once, so it fires exactly once per request.
      c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting categories", r.URL)
      })
    
      for _, cate := range CATEGORIES {
        c.Visit(cate)
      }
    }