I need to visit multiple links and extract information from each of them. The problem is that when I call c.Visit(URL) in a loop, the number of logged visits grows with every iteration. Example:
package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	categories := []string{
		"cate1",
		"cate2",
		"cate3",
	}
	c := colly.NewCollector()
	for _, cate := range categories {
		c.OnRequest(func(r *colly.Request) {
			fmt.Println("Visiting categories", r.URL)
		})
		c.Visit(cate)
	}
}
That will print:
Visiting categories http://cate1
Visiting categories http://cate2
Visiting categories http://cate2
Visiting categories http://cate3
Visiting categories http://cate3
Visiting categories http://cate3
I tried creating a new collector on every iteration, and that worked: each URL was printed exactly once (Visiting categories http://cate1, Visiting categories http://cate2, Visiting categories http://cate3). But doing it this way I lose my login session. Any suggestions?
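You are adding a new OnRequest handler on every loop iteration. Handlers accumulate on the collector, so by the nth Visit call there are n handlers registered and each request is logged n times. Register the handler once, outside the loop, and keep reusing the same collector so that your login session is preserved: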
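package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	categories := []string{
		"cate1",
		"cate2",
		"cate3",
	}
	c := colly.NewCollector()
	// Register the handler once; every later Visit reuses it.
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting categories", r.URL)
	})
	for _, cate := range categories {
		c.Visit(cate)
	}
}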
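
To see why the output in the question grows as 1, 2, then 3 lines per visit, here is a minimal sketch that models the behaviour using only the standard library. The miniCollector type and the countCalls helper are hypothetical names made up for this illustration; they are not colly's source code and only assume that OnRequest appends a callback rather than replacing the previous one.

```go
package main

import "fmt"

// miniCollector is a hypothetical stand-in for a collector. It only models
// how registered request callbacks accumulate; it performs no HTTP requests.
type miniCollector struct {
	onRequest []func(url string)
}

// OnRequest appends a callback; it never replaces earlier ones.
func (c *miniCollector) OnRequest(f func(url string)) {
	c.onRequest = append(c.onRequest, f)
}

// Visit runs every registered callback once for the given URL.
func (c *miniCollector) Visit(url string) {
	for _, f := range c.onRequest {
		f(url)
	}
}

// countCalls registers a counting handler either inside the loop (as in the
// question) or once before it (as in the answer) and returns how many times
// the handler fired in total.
func countCalls(urls []string, registerInLoop bool) int {
	c := &miniCollector{}
	calls := 0
	handler := func(url string) { calls++ }
	if !registerInLoop {
		c.OnRequest(handler)
	}
	for _, u := range urls {
		if registerInLoop {
			c.OnRequest(handler)
		}
		c.Visit(u)
	}
	return calls
}

func main() {
	urls := []string{"cate1", "cate2", "cate3"}
	fmt.Println(countCalls(urls, true))  // 1 + 2 + 3 = 6
	fmt.Println(countCalls(urls, false)) // 3
}
```

Registering inside the loop yields six handler calls for three URLs, which matches the six lines printed in the question; registering once yields three.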