I've been trying to load a response into a goquery document, but it appears to be failing (though it throws no errors).
The response I'm trying to load comes from:
https://www.bbcgoodfood.com/search_api_ajax/search/recipes?sort=created&order=desc&page=4
While it doesn't throw any errors, when I call fmt.Println(goquery.OuterHtml(doc.Contents()))
I get the output:
<html><head></head><body></body></html>
Meanwhile, if I don't attempt to load it into a goquery document and instead call
s, _ := ioutil.ReadAll(resp.Body)
fmt.Println(string(s))
I get:
<!doctype html>
<!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8 no-touch" lang="en"> <![endif]-->
<!--[if IE 8]> <html class="no-js lt-ie9 no-touch" lang="en"> <![endif]-->
<!--[if gt IE 8]> <html class="no-js gt-ie-8 no-touch" lang="en"> <![endif]-->
<!--[if !IE]><!-->
<html class="no-js no-touch" lang="en">
<!--<![endif]-->
<head>
<meta charset="utf-8">
<title>Search | BBC Good Food</title>
<!--[if IE]><![endif]-->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="prev" href="https://www.bbcgoodfood.com/search/recipes?page=3&sort=created&order=desc" />
<link rel="next" href="https://www.bbcgoodfood.com/search/recipes?page=5&sort=created&order=desc" />
<meta name="robots" content="noindex" />
<style>
.async-hide {
opacity: 0 !important
}
... etc
The basic logic of what I'm doing is as follows:
package main

import (
	"fmt"
	"net/http"
	"github.com/PuerkitoBio/goquery"
	"io/ioutil"
)

func main() {
	baseUrl := "https://www.bbcgoodfood.com/search_api_ajax/search/recipes?sort=created&order=desc&page="
	i := 4
	// Make a request
	req, _ := http.NewRequest(http.MethodGet, fmt.Sprintf("%s%d", baseUrl, i), nil)
	// Create a new HTTP client and execute the request
	client := &http.Client{}
	resp, _ := client.Do(req)
	// Print out response
	s, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(string(s))
	// Load into goquery doc
	doc, _ := goquery.NewDocumentFromReader(resp.Body)
	fmt.Println(goquery.OuterHtml(doc.Contents()))
}
The full response can be found here. Is there any particular reason why this won't load?
Go's HTML parser doesn't seem to like the HTML you're getting: the <html> tags are wrapped in conditional comments, so I think the parsing just never gets going.
If you prepend <html> to the document, everything works fine from there. One way to do that is with a reader wrapper, something like the following, which emits the <html> tag the first time Read is called and delegates to resp.Body on subsequent calls.
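import "io"

// htmlTag is the opening tag prepended to the response body.
const htmlTag = "<html>\n"

// htmlAddingReader emits htmlTag before delegating to source.
type htmlAddingReader struct {
	sent   int       // number of bytes of htmlTag already returned
	source io.Reader // underlying reader, e.g. resp.Body
}

func (r *htmlAddingReader) Read(b []byte) (n int, err error) {
	// Serve the tag first. copy may write fewer bytes than remain when b
	// is small, so report what was actually copied and resume next call.
	if r.sent < len(htmlTag) {
		n = copy(b, htmlTag[r.sent:])
		r.sent += n
		return n, nil
	}
	return r.source.Read(b)
}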
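To use htmlAddingReader in your sample code, change the final section like so. Note that the earlier ioutil.ReadAll call has to go (or the body has to be buffered first), because it reads resp.Body to the end and leaves nothing for goquery to parse:
// Load into goquery doc, with the <html> tag prepended
wrapped := &htmlAddingReader{source: resp.Body}
doc, err := goquery.NewDocumentFromReader(wrapped)
if err != nil {
	fmt.Println("parse failed:", err)
	return
}
fmt.Println(goquery.OuterHtml(doc.Contents()))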