What is the default mode in which network requests are executed in GoColly? Since the collector has an Async option, I would assume that the default mode is synchronous.
However, when I execute the 8 requests in the program below, I see no particular difference between the modes, other than that I need to call Wait in async mode. It seems as if the option only controls how the rest of the program is executed, and the requests themselves are always asynchronous.
package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	urls := []string{
		"http://webcode.me",
		"https://example.com",
		"http://httpbin.org",
		"https://www.perl.org",
		"https://www.php.net",
		"https://www.python.org",
		"https://code.visualstudio.com",
		"https://clojure.org",
	}

	c := colly.NewCollector(
		colly.Async(true),
	)

	c.OnHTML("title", func(e *colly.HTMLElement) {
		fmt.Println(e.Text)
	})

	for _, url := range urls {
		c.Visit(url)
	}

	c.Wait()
}
The default collection is synchronous.
The confusing part is probably the collector option colly.Async(), which ignores its actual argument. In fact, the implementation at the time of writing is:
func Async(a ...bool) CollectorOption {
	return func(c *Collector) {
		c.Async = true // uh-oh...!
	}
}
Based on this issue, it was done this way for backwards compatibility, so that (I believe) you can pass the option with no argument and it will still work, e.g.:
colly.NewCollector(colly.Async()) // no param, async collection
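To make the pitfall concrete, here is a self-contained sketch of the functional-options pattern that colly.Async follows. Collector, CollectorOption and NewCollector below are hypothetical stand-ins, not colly's real types. AsyncIgnoringArg reproduces the implementation quoted above, and AsyncHonoringArg is a hypothetical variant that would keep the no-argument call working while still respecting an explicit false.

```go
package main

import "fmt"

// Collector is a hypothetical stand-in for colly.Collector; only the
// Async field matters for this sketch.
type Collector struct {
	Async bool
}

// CollectorOption mirrors the shape of a functional option.
type CollectorOption func(*Collector)

// AsyncIgnoringArg reproduces the quoted behavior: the variadic argument
// is accepted but never read, so passing false still enables async mode.
func AsyncIgnoringArg(a ...bool) CollectorOption {
	return func(c *Collector) {
		c.Async = true
	}
}

// AsyncHonoringArg is a hypothetical alternative: with no argument it still
// means "enable async", but an explicit value is respected.
func AsyncHonoringArg(a ...bool) CollectorOption {
	return func(c *Collector) {
		c.Async = true
		if len(a) > 0 {
			c.Async = a[0]
		}
	}
}

// NewCollector starts from the synchronous default and applies the
// options in order.
func NewCollector(options ...CollectorOption) *Collector {
	c := &Collector{}
	for _, f := range options {
		f(c)
	}
	return c
}

func main() {
	fmt.Println(NewCollector(AsyncIgnoringArg(false)).Async) // true
	fmt.Println(NewCollector(AsyncHonoringArg()).Async)      // true
	fmt.Println(NewCollector(AsyncHonoringArg(false)).Async) // false
	fmt.Println(NewCollector().Async)                        // false
}
```

With an implementation like the quoted one, colly.Async(false) still turns async mode on, so the way to keep the synchronous default through the constructor options is to leave the option out entirely.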
If you remove the async option altogether and instantiate the collector with just colly.NewCollector(), the network requests will clearly run sequentially. You can then also remove c.Wait(), and the program won't exit right away, because each Visit call returns only after its request has finished.