
HEAD requests vs getting only the `<head>` of a web page


I'm writing some link scraping code where I was hoping to grab only the <head> section of a given web page. Apparently I've been confused about what a HEAD request is, as I thought it was supposed to do exactly that. Instead, it just returns HTTP headers.

Is there a way to fetch just the <head> section of a given page, without getting the whole doc?


Solution

  • No, HTTP has no provision for that; the protocol doesn't know about HTML at all. You'll need to do a proper GET or POST, then use an HTML parser to extract the data you need.

    The only thing you could do to limit what you get back is send a Range header, but you would be guessing how many bytes to request, and a server is free to ignore the header and return the whole document anyway.
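
    As a rough illustration of the "GET, then stop early" idea, here is a minimal Python sketch using only the standard library. The function names, the 16 KB Range guess, and the 64 KB cap are all assumptions made up for this example, not anything HTTP prescribes. It reads the response in chunks and stops once a closing `</head>` tag shows up. Note that the closing tag is optional in HTML and the same text could appear inside a script or comment, so treat the result as a best effort and still hand it to a real HTML parser.

```python
import re
import urllib.request

# Matches a closing head tag in any letter case, e.g. </head> or </HEAD >.
HEAD_END = re.compile(rb"</head\s*>", re.IGNORECASE)


def read_until_head_end(chunks, max_bytes=65536):
    """Accumulate byte chunks until </head> is seen or max_bytes is reached.

    Returns everything up to and including the closing tag, or whatever
    was read if the tag never appeared within the cap.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        match = HEAD_END.search(buf)
        if match:
            return buf[:match.end()]
        if len(buf) >= max_bytes:
            break  # give up; the cap is an arbitrary safety limit
    return buf


def fetch_head_section(url, chunk_size=1024):
    """Hypothetical helper: GET the page but stop reading after </head>."""
    # The Range value is a guess; servers may ignore it and send everything.
    request = urllib.request.Request(url, headers={"Range": "bytes=0-16383"})
    with urllib.request.urlopen(request) as response:
        # iter(callable, sentinel) keeps calling read() until it returns b"".
        chunks = iter(lambda: response.read(chunk_size), b"")
        return read_until_head_end(chunks)
```

    Leaving the `with` block closes the connection, so the rest of the body is not read by your code, though some of it may already have arrived in network buffers. The returned bytes can then be passed to whichever HTML parser you prefer.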