I have three pieces of code. The first one uses Tika to read the metadata of a URL, and that metadata includes a Last-Modified date. When I run this class, I get the last modified date of the URL as:
key:- Last-Modified
value:- 2011-10-21T03:18:28Z
First Way--
import java.io.IOException;
import java.net.URL;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

public class App {
    public static void main(String[] args) {
        Tika t = new Tika();
        Metadata md = new Metadata();
        try {
            URL u = new URL("http://www.xyz.com/documents/files/xyz-china.pdf");

            // Extract and print the text content of the document.
            String content1 = t.parseToString(u);
            System.out.println("hello" + content1);

            // Parse again with a Metadata object so Tika can fill in the metadata.
            // The returned Reader is not used here; only the metadata is.
            t.parse(u.openStream(), md);
        } catch (IOException | TikaException e) {
            // MalformedURLException is a subclass of IOException.
            e.printStackTrace();
        }

        // Print every metadata key/value pair, including Last-Modified.
        for (String name : md.names()) {
            System.out.println("key:- " + name);
            System.out.println("value:- " + md.get(name));
        }
    }
}
But when I run the second example below with the same URL, I get a different Last-Modified date. How can I tell which one is right? I tried opening the PDF in the browser, but it opens in Adobe PDF on my computer instead of in the browser, so I can't check the headers with Firebug.
Second Way--
import java.net.URL;
import java.net.URLConnection;

public class LastMod {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.xyz.com/documents/files/xyz-china.pdf");
        System.out.println("URL:- " + url);
        URLConnection connection = url.openConnection();
        // Raw Last-Modified header exactly as sent by the server.
        System.out.println(connection.getHeaderField("Last-Modified"));
    }
}
For the above one I get the Last-Modified date as:
Thu, 03 Nov 2011 16:59:41 +0000
Third Way--
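import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Date;

public class Main {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.xyz.com/documents/files/xyz-china.pdf");
        HttpURLConnection httpCon = (HttpURLConnection) url.openConnection();
        // Last-Modified header as milliseconds since the epoch, or 0 if absent.
        long date = httpCon.getLastModified();
        if (date == 0) {
            System.out.println("No last-modified information.");
        } else {
            // Date.toString() prints in the JVM's default time zone.
            System.out.println("Last-Modified: " + new Date(date));
        }
    }
}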
And with the third method I get this:
Last-Modified: Thu Nov 03 09:59:41 PDT 2011
I am confused about which one is right. I think the first one is right. Any suggestions will be appreciated.
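The first piece of code extracts the date from the metadata embedded in the PDF file, while the other two get it from the Last-Modified HTTP header returned by the web server. Those two header values are the same instant: 16:59:41 +0000 is 09:59:41 PDT, and the third snippet simply prints it in your local time zone. The first one is probably more accurate if you want to know when the document itself was created or modified; the header typically reflects when the file last changed on the server.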
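To check the time zone arithmetic yourself, here is a minimal sketch that parses the three values from the question with `java.time` (Java 8+). The class name `LastModifiedCompare` and the helper `parseHttpDate` are made up for illustration, and `America/Los_Angeles` is my assumption for the zone behind the "PDT" in your output.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class LastModifiedCompare {

    // Hypothetical helper: parses an HTTP-style date such as
    // "Thu, 03 Nov 2011 16:59:41 +0000" into an Instant.
    static Instant parseHttpDate(String headerValue) {
        return ZonedDateTime
                .parse(headerValue, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
    }

    public static void main(String[] args) {
        // Value reported by Tika from the PDF's own metadata.
        Instant fromPdfMetadata = Instant.parse("2011-10-21T03:18:28Z");
        // Value of the Last-Modified HTTP header (second snippet).
        Instant fromHttpHeader = parseHttpDate("Thu, 03 Nov 2011 16:59:41 +0000");

        // Assumed zone for "PDT"; shows the header value in Pacific time,
        // which is what the third snippet printed.
        ZonedDateTime headerInPacific =
                fromHttpHeader.atZone(ZoneId.of("America/Los_Angeles"));
        System.out.println("HTTP header in Pacific time: " + headerInPacific);

        // Whole days between the PDF metadata date and the server header date.
        long days = Duration.between(fromPdfMetadata, fromHttpHeader).toDays();
        System.out.println("Days between the two dates: " + days);
    }
}
```

The header value converted to Pacific time matches what the third snippet printed, so only two distinct dates are really in play: the one stored inside the PDF and the one reported by the server.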