Tags: html, go, web-scraping, html-parsing, go-colly

How to scrape a value nested inside an attribute with colly


I am trying to scrape the productId of a product, but I can't get it to work. Please help.

HTML code:

<span class="info">
 <button data-product='{"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}'>
                  

When I try

h.ChildAttr("span.info>button", "data-product")

the result is {"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}

And when I try

h.ChildAttr("span.info>button", "productId")

there is no result. How can I get the productId with colly?


Solution

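  • ChildAttr returns the attribute's raw string value, and productId is not an HTML attribute of the button: it is a key inside the JSON string stored in data-product. That is why asking for a "productId" attribute returns nothing. To read the value, fetch data-product and parse it as JSON.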

    For example:

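    package main

    import (
        "encoding/json"
        "log"

        "github.com/gocolly/colly"
    )

    func main() {
        c := colly.NewCollector()

        c.OnHTML(`body`, func(e *colly.HTMLElement) {
            // data-product holds a JSON document as a plain string.
            text := e.ChildAttr("span.info>button", "data-product")

            // Decode the JSON so individual keys can be looked up.
            var result map[string]interface{}
            err := json.Unmarshal([]byte(text), &result)
            if err != nil {
                log.Println(err)
                return
            }
            log.Println(result["productId"])
        })

        // Visit returns an error if the request cannot be made.
        if err := c.Visit("[some url]"); err != nil {
            log.Fatal(err)
        }
    }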
    

    Output

    2021/10/21 14:23:24 which I want to scrape
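
If you need several fields, decoding into a typed struct may be more convenient than a map. The following is only a sketch: the names productData and parseProduct are made up for illustration, the productId value "12345" is sample data, and the raw string stands in for whatever e.ChildAttr("span.info>button", "data-product") returns on the real page. It uses only the standard library so it can be run without colly.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// productData lists only the JSON keys we care about; keys that are not
// listed here (merchantName, brandName, ...) are ignored by json.Unmarshal.
// The struct name is hypothetical; the JSON keys come from the question.
type productData struct {
	ProductID   string `json:"productId"`
	ProductName string `json:"productName"`
	Price       string `json:"price"` // a string because the JSON quotes it: "11"
}

// parseProduct (hypothetical helper) decodes the raw data-product value.
func parseProduct(raw string) (productData, error) {
	var p productData
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return productData{}, fmt.Errorf("decode data-product: %w", err)
	}
	return p, nil
}

func main() {
	// Sample input standing in for the attribute value scraped by colly.
	raw := `{"merchantName":"xxx","price":"11","productName":"car window","productId":"12345"}`

	p, err := parseProduct(raw)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(p.ProductID)
}
```

Inside the OnHTML callback you would pass the ChildAttr result to parseProduct instead of the sample string. When the selector matches nothing, ChildAttr returns an empty string, and decoding an empty string produces an error rather than an empty struct, so a missing button shows up in the error branch.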