Answered. User PuerkitoBio helped me out with his goquery package, and I'm sure I won't be the only one wondering how to do this. (I can mark this question as 'answered' in two days.)
When using goquery to select HTML elements by class, I hit a problem when the class attribute contains whitespace. Here's an example:
package main

import (
	"fmt"
	"github.com/PuerkitoBio/goquery"
	"strings"
)

func main() {
	html_code := strings.NewReader(`
<html>
	<body>
		<h1>
			<span class="text title">Go </span>
		</h1>
		<p>
			<span class="text">totally </span>
			<span class="post">kicks </span>
		</p>
		<p>
			<span class="text">hacks </span>
		</p>
	</body>
</html>
`)
	// Parse the HTML; stop early if the reader cannot be parsed.
	doc, err := goquery.NewDocumentFromReader(html_code)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// The doc.Find(...) snippets below go here.
	_ = doc
}
If I want to find the class "text title", I thought I would do this:
	doc.Find(".text title").Each(func(i int, s *goquery.Selection) {
		class, _ := s.Attr("class")
		fmt.Println(class, s.Text())
	})
But this doesn't work. (Answer is below.)
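It was a problem with my understanding of HTML. The whitespace inside class="text title" means that class has two values: text and title. In a selector, whitespace means something else: .text title looks for a <title> element nested inside an element with class text, so it matches nothing here. To match an element that has several classes with goquery, I need to put the class names side by side (without whitespace) and prefix each one with a dot (.). Like this: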
	doc.Find(".text.title").Each(func(i int, s *goquery.Selection) {
		class, _ := s.Attr("class")
		fmt.Println(class, s.Text())
	})
Or, if I want to find every element whose class attribute includes the value title, I would do this:
	doc.Find(".title").Each(func(i int, s *goquery.Selection) {
		class, _ := s.Attr("class")
		fmt.Println(class, s.Text())
	})
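For anyone who wants to see the matching rule itself, here is a rough sketch using only the standard library. It is not how goquery is implemented; hasClasses is a hypothetical helper I made up for illustration. It splits the class attribute on whitespace and checks that every requested class name is present, which is what a selector like .text.title asks for.

```go
package main

import (
	"fmt"
	"strings"
)

// hasClasses reports whether the class attribute value attr contains every
// class name listed in want. Hypothetical helper for illustration only;
// it is not part of goquery.
func hasClasses(attr string, want ...string) bool {
	// A class attribute is a whitespace-separated list of class names.
	have := make(map[string]bool)
	for _, c := range strings.Fields(attr) {
		have[c] = true
	}
	for _, w := range want {
		if !have[w] {
			return false
		}
	}
	return true
}

func main() {
	// Same idea as .text.title: both classes must be present.
	fmt.Println(hasClasses("text title", "text", "title")) // true
	// An element with only class="text" does not match .text.title.
	fmt.Println(hasClasses("text", "text", "title")) // false
	// Same idea as .title: any element whose class list includes title.
	fmt.Println(hasClasses("text title", "title")) // true
}
```

The order of the class names does not matter in this sketch, so "title text" would match as well.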