I am using Colly to scrape an e-commerce website, looping over many products.
Here is the snippet of my code that gets a sub-title:
c.OnXML("/html/body/div[4]/div/div[3]/div[2]/div/div[1]/div[3]/div/div/h1/1234", func(e *colly.XMLElement) {
    fmt.Println(e.Text)
})
However, not all products have a sub-title, so the above XPath does not work in all cases.
When I reach a product that does not have a sub-title, my code crashes with this error:
panic: expression must evaluate to a node-set
Here is my code so far:
c := colly.NewCollector()
c.OnError(func(_ *colly.Response, err error) {
    log.Println("Something went wrong:", err)
})
// Sub-title
c.OnXML("/html/body/div[4]/div/div[3]/div[2]/div/div[1]/div[3]/div/div/h1/1234", func(e *colly.XMLElement) {
    fmt.Println(e.Text)
})
c.OnRequest(func(r *colly.Request) {
    fmt.Println("Visiting", r.URL)
})
c.Visit("https://www.lazada.vn/-i1701980654-s7563711492.html")
Here is what I want (pseudocode):
c.OnXML("/html/b.....v/h1/1234", func(e *colly.XMLElement) {
    if no error {
        fmt.Println("NO ERROR")
    } else {
        fmt.Println("GOT ERROR")
    }
})
I think I found what went wrong in your code. Let me start from the end result. As you can see, the error originates from the panic statement at line 473 of the parse.go file. The xpath package has a method called parseNodeTest that performs the following check:
func (p *parser) parseNodeTest(n node, axeTyp string) (opnd node) {
    switch p.r.typ {
    case itemName:
        if p.r.canBeFunc && isNodeType(p.r) {
            var prop string
            switch p.r.name {
            case "comment", "text", "processing-instruction", "node":
                prop = p.r.name
            }
            var name string
            p.next()
            p.skipItem(itemLParens)
            if prop == "processing-instruction" && p.r.typ != itemRParens {
                checkItem(p.r, itemString)
                name = p.r.strval
                p.next()
            }
            p.skipItem(itemRParens)
            opnd = newAxisNode(axeTyp, name, "", prop, n)
        } else {
            prefix := p.r.prefix
            name := p.r.name
            p.next()
            if p.r.name == "*" {
                name = ""
            }
            opnd = newAxisNode(axeTyp, name, prefix, "", n)
        }
    case itemStar:
        opnd = newAxisNode(axeTyp, "", "", "", n)
        p.next()
    default:
        panic("expression must evaluate to a node-set")
    }
    return opnd
}
The value of p.r.typ is itemNumber (28). This sends the switch into the default branch, which raises the panic. The functions invoked before this one (you can see them in the call stack in your IDE) set typ for the literal 1234 to that value, which makes the XPath query invalid. To make it work, you have to remove the 1234 and put a valid step in its place.
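If you also want the "NO ERROR" / "GOT ERROR" branching from your question, one option is to turn a panic into an ordinary error with recover. Below is a minimal sketch that uses only the standard library. Note that safeRun and fakeQuery are hypothetical names I made up for illustration, and fakeQuery only simulates the panic message; neither is part of Colly or xpath. I am also assuming that the panic surfaces synchronously inside whatever call you wrap (for example c.Visit), which I have not verified for your setup.

```go
package main

import "fmt"

// safeRun calls fn and converts a panic raised inside it into an error.
// Hypothetical helper, not part of Colly or the xpath package.
func safeRun(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			switch v := r.(type) {
			case error:
				err = v
			default:
				err = fmt.Errorf("%v", v)
			}
		}
	}()
	fn()
	return nil
}

// fakeQuery is a stand-in for the call that panics on an invalid XPath.
// It only simulates the message seen in the question.
func fakeQuery(valid bool) {
	if !valid {
		panic("expression must evaluate to a node-set")
	}
}

func main() {
	for _, valid := range []bool{true, false} {
		if err := safeRun(func() { fakeQuery(valid) }); err != nil {
			fmt.Println("GOT ERROR:", err)
		} else {
			fmt.Println("NO ERROR")
		}
	}
}
```

In your program you would pass the real call to safeRun instead of fakeQuery. This only keeps the program from crashing; the XPath still has to be fixed for the sub-title to be found.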
Let me know if this solves your issue, thanks!