I'm somewhat new to Go and am trying to scrape several webpages using colly. Two of the pages return incomplete (relative) links. Below are the code and its output:
func PaloNet() {
	c := colly.NewCollector(
		colly.AllowedDomains("security.paloaltonetworks.com"),
	)
	c.OnHTML(".list", func(e *colly.HTMLElement) {
		PaloNetlinks := e.ChildAttrs("a", "href")
		fmt.Println("\n\n PaloAlto Security: \n\n", PaloNetlinks)
	})
	c.Visit("https://security.paloaltonetworks.com/")
}
Output:
[/CVE-2022-0031 /CVE-2022-42889 /PAN-SA-2022-0006 /CVE-2022-0030 /CVE-2022-0029 /PAN-SA-2022-0005 /CVE-2022-28199 /PAN-SA-2022-0004 /CVE-2022-0028 /PAN-SA-2022-0003 /CVE-2022-0024 /CVE-2022-0026 /CVE-2022-0025 /CVE-2022-0027 /PAN-SA-2022-0001 /PAN-SA-2022-0002 /CVE-2022-0023 /CVE-2022-0778 /CVE-2022-22963 /CVE-2022-0022 /CVE-2021-44142 /CVE-2022-0016 /CVE-2022-0017 /CVE-2022-0020 /CVE-2022-0011 /csv?]
As you can see, the links are missing the 'https://security.paloaltonetworks.com' prefix. What would be the best way to add the start of the link?
You can do it like this: keep the base URL in a variable and prepend it to each relative link.
func PaloNet() {
	// Base URL without a trailing slash, because every href already starts with "/".
	visitUrl := "https://security.paloaltonetworks.com"
	urls := []string{}
	c := colly.NewCollector(
		colly.AllowedDomains("security.paloaltonetworks.com"),
	)
	c.OnHTML(".list", func(e *colly.HTMLElement) {
		PaloNetlinks := e.ChildAttrs("a", "href")
		// Prepend the base URL to each relative link.
		for _, link := range PaloNetlinks {
			urls = append(urls, visitUrl+link)
		}
		fmt.Println("\n\n PaloAlto Security: \n\n", urls)
	})
	c.Visit(visitUrl)
}
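Plain string concatenation works here because every href starts with "/" and the base has no trailing slash. If the page ever mixes relative and absolute links, resolving each href against the base with the standard library's net/url package is a bit more robust. The following is only a sketch: resolveLinks is a hypothetical helper name, not part of colly, and the sample hrefs are taken from the output in the question. colly's Request type also has an AbsoluteURL method (reachable as e.Request.AbsoluteURL inside the callback) that serves the same purpose; check its documentation for the exact behavior.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLinks is a hypothetical helper (not part of colly). It resolves each
// href against base, so relative links become absolute and links that are
// already absolute are left unchanged.
func resolveLinks(base string, hrefs []string) ([]string, error) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return nil, err
	}
	out := make([]string, 0, len(hrefs))
	for _, h := range hrefs {
		ref, err := url.Parse(h)
		if err != nil {
			return nil, err
		}
		out = append(out, baseURL.ResolveReference(ref).String())
	}
	return out, nil
}

func main() {
	// Sample hrefs copied from the question's output.
	links, err := resolveLinks(
		"https://security.paloaltonetworks.com/",
		[]string{"/CVE-2022-0031", "/PAN-SA-2022-0006"},
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range links {
		fmt.Println(l)
	}
}
```

Inside the OnHTML callback you would pass the slice returned by e.ChildAttrs("a", "href") to the helper instead of the hard-coded sample.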