
How to get the complete URL address most efficiently?


I'm using a Java program to expand short URLs into their full URLs. Given a Java URLConnection, which of these two approaches is better for getting the expanded URL?

Connection.getHeaderField("Location");

vs

Connection.getURL();

I guessed that both give the same output. The first approach did not give good results: only 1 out of 7 URLs was resolved. Would the second approach resolve more of them?

Is there a better approach?


Solution

  • I'd use the following:

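    import static org.junit.Assert.assertEquals;

    import java.net.HttpURLConnection;
    import java.net.URL;

    import org.junit.Test;

    public class ShortUrlTest {

        @Test
        public void testLocation() throws Exception {
            final String link = "http://bit.ly/4Agih5";

            final URL url = new URL(link);
            final HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
            // Stop at bit.ly's own 301 response instead of following it
            urlConnection.setInstanceFollowRedirects(false);

            // Reading a header sends the request; Location holds the expanded URL
            final String location = urlConnection.getHeaderField("location");
            assertEquals("http://stackoverflow.com/", location);
            // Redirects are not followed, so getURL() is still the short link
            assertEquals(link, urlConnection.getURL().toString());
        }
    }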
    

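    With setInstanceFollowRedirects(false), the HttpURLConnection does not follow redirects, so the destination page (stackoverflow.com in the example above) is not downloaded, only the short redirect response from bit.ly. For the same reason, getURL() still returns the original short link, as the second assertion shows.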
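    To compare the two approaches from the question without depending on bit.ly, here is a minimal sketch that runs against a local stand-in for a shortener, built with the JDK's com.sun.net.httpserver.HttpServer. The class name RedirectDemo, its method names and the /short and /target paths are made up for illustration. It shows that getHeaderField("Location") needs redirects turned off, while getURL() only reports the destination when redirects are followed and the request has actually been sent, here by calling getResponseCode().

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class RedirectDemo {

    // Local stand-in for a URL shortener (hypothetical paths):
    // /short answers 301 with an absolute Location header, /target answers 200.
    static HttpServer startFakeShortener() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/short", exchange -> {
            int port = exchange.getLocalAddress().getPort();
            exchange.getResponseHeaders().set("Location", "http://127.0.0.1:" + port + "/target");
            exchange.sendResponseHeaders(301, -1); // -1 means "no response body"
            exchange.close();
        });
        server.createContext("/target", exchange -> {
            byte[] body = "target page".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Approach 1 from the question: do not follow the redirect, read Location.
    static String locationHeader(String link) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(link).openConnection();
        conn.setInstanceFollowRedirects(false);
        try {
            return conn.getHeaderField("Location"); // implicitly sends the request
        } finally {
            conn.disconnect();
        }
    }

    // Approach 2 from the question: follow the redirect, then ask for the URL.
    // getURL() only changes once the request has been sent and the redirect followed.
    static String urlAfterRedirect(String link) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(link).openConnection();
        conn.setInstanceFollowRedirects(true);
        try {
            conn.getResponseCode(); // sends the request; the 301 is followed internally
            return conn.getURL().toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

    Both calls end up with the same URL here, but approach 1 stops after the shortener's response, while approach 2 also requests the destination page.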

    One drawback is that when a bit.ly link points to another short URL, for example one on tinyurl.com, you get the tinyurl.com link, not the URL that tinyurl.com redirects to.
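    One way around this, sketched below under the assumption that every hop is a plain 3xx response with a Location header: keep automatic redirects off and follow Location yourself, one hop at a time, until a non-redirect response comes back. The class name ShortUrlExpander and the maxHops limit are illustrative choices, not part of any library. Following hops by hand has a side benefit: HttpURLConnection's automatic redirect handling does not follow redirects that switch protocol (for example from http:// to https://), whereas opening a new connection per hop does.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class ShortUrlExpander {

    // Follows a chain of short links by hand, one hop per request.
    // maxHops (an arbitrary caller-chosen limit) guards against redirect loops.
    static String expand(String link, int maxHops) throws IOException {
        URL current = new URL(link);
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false);
            try {
                int status = conn.getResponseCode();
                String location = conn.getHeaderField("Location");
                if (status < 300 || status >= 400 || location == null) {
                    return current.toString(); // not a redirect: this is the final URL
                }
                // Location may be relative, so resolve it against the current URL
                current = new URL(current, location);
            } finally {
                conn.disconnect();
            }
        }
        throw new IOException("Too many redirects starting at " + link);
    }
}
```

    For example, expand("http://bit.ly/4Agih5", 5) returns the URL the chain finally ends at. Each hop costs one round trip, so a small limit keeps a redirect loop from stalling the program.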

    Edit:

    To see bit.ly's response, use curl:

    $ curl --dump-header /tmp/headers http://bit.ly/4Agih5
    <html>
    <head>
    <title>bit.ly</title>
    </head>
    <body>
    <a href="http://stackoverflow.com/">moved here</a>
    </body>
    </html>
    

    As you can see, bit.ly sends only a short redirect page. Now check the HTTP headers:

    $ cat /tmp/headers
    HTTP/1.0 301 Moved Permanently
    Server: nginx
    Date: Wed, 06 Nov 2013 08:48:59 GMT
    Content-Type: text/html; charset=utf-8
    Cache-Control: private; max-age=90
    Location: http://stackoverflow.com/
    Mime-Version: 1.0
    Content-Length: 117
    X-Cache: MISS from cam
    X-Cache-Lookup: MISS from cam:3128
    Via: 1.1 cam:3128 (squid/2.7.STABLE7)
    Connection: close
    

    It sends a 301 Moved Permanently response with a Location header pointing to http://stackoverflow.com/. Modern browsers don't show you the HTML page above; instead, they automatically follow the redirect to the URL in the Location header.
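    If you would rather inspect the response from Java than with curl, a rough equivalent of curl --dump-header is to disable redirects and print the status and every header that HttpURLConnection received. This is a sketch: the class name HeaderDump is made up, and the first line is rebuilt from getResponseCode() and getResponseMessage() rather than copied verbatim from the server's status line.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

final class HeaderDump {

    // Returns the status and all response headers, one per line,
    // without following the redirect (similar in spirit to curl --dump-header).
    static String dumpHeaders(String link) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(link).openConnection();
        conn.setInstanceFollowRedirects(false);
        try {
            StringBuilder out = new StringBuilder();
            out.append(conn.getResponseCode()).append(' ')
               .append(conn.getResponseMessage()).append('\n');
            for (Map.Entry<String, List<String>> header : conn.getHeaderFields().entrySet()) {
                if (header.getKey() == null) {
                    continue; // the null key holds the raw status line, printed above
                }
                for (String value : header.getValue()) {
                    out.append(header.getKey()).append(": ").append(value).append('\n');
                }
            }
            return out.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

    For a link like the bit.ly one above, the output would include a 301 status and the Location header, without the destination page being requested.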