go html-parsing goquery

Use goquery to find a class whose value contains whitespace


Answered. PuerkitoBio, the author of the goquery package, helped me out, and I'm sure I won't be the only one wondering how to do this. (I can mark this question as answered in two days.)

When using goquery to select HTML elements by class, I hit a problem when the class value contains whitespace. Here's an example:

package main

import (
    "fmt"
    "github.com/PuerkitoBio/goquery"
    "strings"
)

func main() {
    html_code := strings.NewReader(`
<html>
    <body>
        <h1>
            <span class="text title">Go </span>
        </h1>
        <p>
            <span class="text">totally </span>
            <span class="post">kicks </span>
        </p>
        <p>
            <span class="text">hacks </span>
        </p>
    </body>
</html>
    `)
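    doc, err := goquery.NewDocumentFromReader(html_code)
    if err != nil {
        fmt.Println("parse error:", err)
        return
    }
    _ = doc // the doc.Find(...) calls shown below go here
}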

If I want to find the element with class="text title", I thought I would do this:

doc.Find(".text title").Each(func(i int, s *goquery.Selection) {
    class, _ := s.Attr("class")
    fmt.Println(class, s.Text())
})

But this doesn't work: nothing is printed. (Answer is below.)


Solution

  • The problem was my understanding of HTML. The whitespace inside class="text title" means the class attribute holds two values: text and title. To select an element that has several classes with goquery, I need to write the class names side by side (without whitespace), each prefixed with a dot (.). Like this:

    doc.Find(".text.title").Each(func(i int, s *goquery.Selection) {
        class, _ := s.Attr("class")
        fmt.Println(class, s.Text())
    })
    

    Or, if I ever want to find every element whose class includes title, I would do this:

    doc.Find(".title").Each(func(i int, s *goquery.Selection) {
        class, _ := s.Attr("class")
        fmt.Println(class, s.Text())
    })