java httpclient http-head

How to get the "Title" from a webpage using HttpClient


I'm trying to get the "Title" from a webpage using Apache HttpClient 4.

Edit: My first approach was to try to get it from the response headers (using HttpHead). If that is not possible, how can I get it from the body of the response, as @Todd suggests?

Edit 2:

<head>
[...]
<title>This is what I need to get!</title>
[...]
</head>

Solution

  • Thank you, everyone, for your comments. With jsoup, the solution turned out to be pretty simple:

    Document doc = Jsoup.connect("http://example.com/").get();
    String title = doc.title();
    

    Since I really need to connect using HttpClient, this is what I have:

    org.jsoup.nodes.Document doc = null;
    String title = "";
    
    System.out.println("Getting content... ");
    
    CloseableHttpClient httpclient = HttpClients.createDefault();
    HttpHost target = new HttpHost(host);
    HttpGet httpget = new HttpGet(path);
    CloseableHttpResponse response = httpclient.execute(target, httpget);
    
    System.out.println("Parsing content... ");
    
    try {
        // EntityUtils (org.apache.http.util.EntityUtils) reads the whole body and
        // decodes it with the charset declared in the Content-Type header,
        // falling back to UTF-8 (java.nio.charset.StandardCharsets) if none is declared.
        // This replaces the manual readLine() loop, which decoded the stream with the
        // platform default charset and then re-encoded each line.
        String html = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);

        doc = Jsoup.parse(html);
    
        title = doc.title();
        System.out.println("Title=" + title); //<== ^_^
    
        //[...]
    
    } finally {
        response.close();
        httpclient.close(); // release the client's connections as well
    }
    
    System.out.println("Done.");
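
If the response body is read by hand with a BufferedReader instead of a library helper, the charset should be passed to the InputStreamReader itself, so the bytes are decoded once and correctly. Below is a minimal sketch of that idea using only the JDK. The class name `BodyReader` and the method `readBody` are hypothetical names made up for this illustration, and the in-memory stream stands in for `response.getEntity().getContent()`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BodyReader {

    // Hypothetical helper: reads the whole stream as text, decoding the bytes
    // with the given charset. Each line is followed by '\n' in the result.
    public static String readBody(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // The charset is given to the reader, so no re-decoding per line is needed.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a response body; a real one would come from the HTTP entity.
        byte[] bytes = "<title>Caf\u00e9</title>".getBytes(StandardCharsets.UTF_8);
        String html = readBody(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(html);
    }
}
```

The returned string can then be handed to `Jsoup.parse(...)` and queried with `doc.title()`, as in the snippet above. UTF-8 is an assumption here; if the server declares a different charset in its Content-Type header, that one should be used.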