go, web-scraping, reverse-proxy, go-colly

Golang Colly Scraping - Website Captcha Catches My Scraper


I wrote a scraper for Amazon product titles, but Amazon's captcha catches it. I ran go run main.go 10 times: 8 runs hit the captcha and only 2 returned the product title.

I researched this but found no solution for Go (only for Python). Is there a way to solve this in Go?


package main

import (
    "fmt"
    "strings"

    "github.com/gocolly/colly"
)

func main() {

    // Create a Collector restricted to Amazon's domains
    c := colly.NewCollector(
        colly.AllowedDomains("www.amazon.com", "amazon.com"),
    )
    c.OnHTML("div", func(h *colly.HTMLElement) {
        // Full text of the div; on a blocked request this is the captcha notice
        captcha := h.Text
        title := h.ChildText("span#productTitle")
        fmt.Println(strings.TrimSpace(title))
        fmt.Println(strings.TrimSpace(captcha))
    })

    // Start the collector and report a failed request instead of ignoring it
    if err := c.Visit("https://www.amazon.com/Bluetooth-Over-Ear-Headphones-Foldable-Prolonged/dp/B07K5214NZ"); err != nil {
        fmt.Println(err)
    }
}

Output:

Enter the characters you see below Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.


Solution

  • If you don't mind a different package, I wrote a package to search HTML (essentially a thin wrapper around github.com/tdewolff/parse):

    package main
    
    import (
       "github.com/89z/parse/html"
       "net/http"
       "os"
    )
    
    func main() {
       req, err := http.NewRequest(
          "GET", "https://www.amazon.com/dp/B07K5214NZ", nil,
       )
       if err != nil {
          panic(err)
       }
       // Send a browser-like User-Agent instead of Go's default one
       req.Header = http.Header{
          "User-Agent": {"Mozilla"},
       }
       res, err := new(http.Transport).RoundTrip(req)
       if err != nil {
          panic(err)
       }
       defer res.Body.Close()
       lex := html.NewLexer(res.Body)
       lex.NextAttr("id", "productTitle")
       os.Stdout.Write(lex.Bytes())
    }
    

    Result:

    Bluetooth Headphones Over-Ear, Zihnic Foldable Wireless and Wired Stereo
    Headset Micro SD/TF, FM for Cell Phone,PC,Soft Earmuffs &Light Weight for
    Prolonged Waring(Rose Gold)
    

    https://github.com/89z/parse
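
The part of the example above that matters for the captcha is most likely the User-Agent header, not the HTML package. Below is a minimal standard-library sketch of that idea, together with a check for the captcha notice quoted in the question. The helper names (newBrowserRequest, looksLikeCaptcha) and the User-Agent string are assumptions made for illustration; they do not come from colly or from the package above, and a browser-like header is not guaranteed to get past Amazon every time.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// browserUserAgent is an illustrative value (an assumption); any realistic
// browser string could be used here instead.
const browserUserAgent = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

// captchaMarker is the first sentence of the block page quoted in the question.
const captchaMarker = "Enter the characters you see below"

// newBrowserRequest (hypothetical helper) builds a GET request that carries
// a browser-like User-Agent instead of Go's default one.
func newBrowserRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", browserUserAgent)
	return req, nil
}

// looksLikeCaptcha (hypothetical helper) reports whether a response body
// contains the captcha notice, so the caller can retry or back off.
func looksLikeCaptcha(body string) bool {
	return strings.Contains(body, captchaMarker)
}

func main() {
	// No network call is made here; the request is only constructed.
	req, err := newBrowserRequest("https://www.amazon.com/dp/B07K5214NZ")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("User-Agent"))

	// The sample body is the start of the output shown in the question.
	fmt.Println(looksLikeCaptcha("Enter the characters you see below Sorry, we just need to make sure you're not a robot."))
}
```

If you would rather stay with colly, the same header can be set by passing colly.UserAgent("...") to colly.NewCollector, and the captcha check can run on h.Text inside the OnHTML callback.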