gogo-colly

I cannot web-scrape forbes top billionares website with colly go


package main

import (
"encoding/csv"
"fmt"
"os"

"github.com/gocolly/colly"
)

 func checkError(err error){
 if err!=nil{
    panic(err)
}
}
func main(){
fName:="data.csv"
file,err:=os.Create(fName)
checkError(err)
defer file.Close()
writer:=csv.NewWriter(file)
defer writer.Flush()
c:=colly.NewCollector(colly.AllowedDomains("forbes.com","www.forbes.com"))
c.OnHTML(".scrolly-table tbody tr", func(e *colly.HTMLElement) {
        writer.Write([]string{
            e.ChildText(".rank .ng-binding"),
        })
    })  
    c.OnError(func(_ *colly.Response, err error) {
        fmt.Println("Something went wrong:", err)
    })
    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting", r.URL)
    })
    c.OnResponse(func(r *colly.Response) {
        fmt.Println("Visited", string(r.Body))
    })
    c.Visit("https://forbes.com/real-time-billionaires/")
     }

This is my code, when i requested i am getting the fallback page ,This is the link for forbes that i am trying to scrape

I have noticed that the website uses hash path which is at the last part of url and i cannot request with the same url twice, and i think its somehow related to scraping, can anyone help me with this?


Solution

  • Make sure what is available if you disable javascript in your browser (you can do it using the developer tools). Most scrapers will only get you the textual representation of the page, while the browser will also run javascript engine against it. If the data you are trying to scrape is populated with Javascript, there is a very good chance that is the reason you can't scrape it.