Tags: go, web-scraping, go-http

How to reuse an HTTP request instance in Go


I'm building an API that scrapes some data off a webpage.

To do so, I need to send a GET request to a home page, scrape a 'RequestVerificationToken' from the HTML, then send a POST request to the same URL with a username, a password, and the RequestVerificationToken.

I've been able to do this previously with Python:

import requests

# One session object is shared by both requests.
session_requests = requests.session()
result = session_requests.get(LOGIN_URL)
parser = createBS4Parser(result.text)
token = parser.find('input', attrs={'name': '__RequestVerificationToken'})["value"]

pageDOM = session_requests.post(
    LOGIN_URL,
    data=requestPayload,  # __RequestVerificationToken is in here
    headers=requestHeaders
)

It seems that when I reuse the session_requests variable in Python, the POST goes through the same HTTP session as the earlier GET instead of starting a new one.

However, when I try to do this in Go, I get an error due to an invalid token. I assume this is because Go uses a new instance for the POST request.

Is there any way to get the same behavior in Go that I had in Python?


Solution

package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly"
    "github.com/gocolly/colly/proxy"
)

func main() {
    // Create the collector. AllowURLRevisit is needed because the login
    // page is requested twice: once with GET and once with POST.
    c := colly.NewCollector(colly.AllowURLRevisit())

    // Define the proxy chain (here the same local SOCKS5 proxy listed twice).
    revpro, err := proxy.RoundRobinProxySwitcher("socks5://127.0.0.1:9050", "socks5://127.0.0.1:9050")
    if err != nil {
        log.Fatal(err)
    }
    c.SetProxyFunc(revpro)

    // Extract the CSRF token required for the login from the hidden form field.
    c.OnHTML("form[role=form] input[type=hidden][name=CSRF_TOKEN]", func(e *colly.HTMLElement) {
        csrftok := e.Attr("value")
        fmt.Println(csrftok)
        // POST the token together with the credentials through the same collector.
        err := c.Post("https://www.something.com/login.jsp", map[string]string{"CSRF_TOKEN": csrftok, "username": "username", "password": "password"})
        if err != nil {
            log.Fatal(err)
        }
    })

    // Visit the login page; this triggers the OnHTML callback above.
    if err := c.Visit("https://www.something.com/login.jsp"); err != nil {
        log.Fatal(err)
    }

    // Clone the collector to keep its configuration without the login
    // callback, then continue scraping with the clone.
    d := c.Clone()
    d.OnHTML("a[href]", func(e *colly.HTMLElement) {
        link := e.Attr("href")
        fmt.Printf("Link found: %q -> %s\n", e.Text, link)
    })

    if err := d.Visit("https://skkskskskk.htm"); err != nil {
        log.Fatal(err)
    }
}