java pdf-reader

How to read a PDF file online and save it on the local machine using Java


Hi, I am trying to read a PDF file from a URL and write it to my local machine. The file is written, but when I open the saved document I get an error saying that the content is not supported.

    URL url1 = new URL("http://www.gnostice.com/downloads/Gnostice_PathQuest.pdf");

    byte[] ba1 = new byte[1024];
    int baLength;
    FileOutputStream fos1 = new FileOutputStream("/mnt/linuxabc/research_paper/Gnostice_PathQuest.pdf");

    try {
        URLConnection urlConn = url1.openConnection();
        /* if (!urlConn.getContentType().equalsIgnoreCase("application/pdf")) {
               System.out.println("FAILED.\n[Sorry. This is not a PDF.]");
           } else { */
        InputStream is1 = url1.openStream();
        while ((baLength = is1.read(ba1)) != -1) {
            fos1.write(ba1, 0, baLength);
        }
        fos1.flush();
        fos1.close();
        is1.close();
        // }
    } catch (ConnectException ce) {
        System.out.println("FAILED.\n[" + ce.getMessage() + "]\n");
    }

Solution

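  • Your PDF link actually redirects to https://www.gnostice.com/downloads.asp, so there is no PDF directly behind the link. Whatever your code writes to disk is therefore not PDF data, which is why the viewer reports unsupported content.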

    Try another link, and first check in a browser of your choice that opening the PDF's URL really renders a PDF.
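
    The browser check can also be approximated in code. The following is only a sketch and not part of your program: the class UrlCheck and its helper methods are hypothetical names. It assumes an http or https URL, turns off automatic redirect following, and prints the status code, the Location header and the content type so that a redirect to a non-PDF page becomes visible.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of the original question or answer.
class UrlCheck {

    // Status codes that redirect the client to another URL
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // The Content-Type header may carry parameters after a ';'
    static boolean isPdfContentType(String contentType) {
        if (contentType == null) {
            return false;
        }
        int semi = contentType.indexOf(';');
        String mediaType = (semi >= 0 ? contentType.substring(0, semi) : contentType).trim();
        return mediaType.equalsIgnoreCase("application/pdf");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java UrlCheck <http-or-https-url>");
            return;
        }
        // Assumption: the URL is http or https, so the cast is valid
        HttpURLConnection conn = (HttpURLConnection) new URL(args[0]).openConnection();
        conn.setInstanceFollowRedirects(false); // report redirects instead of following them
        try {
            int status = conn.getResponseCode();
            System.out.println("Status: " + status);
            if (isRedirect(status)) {
                System.out.println("Redirects to: " + conn.getHeaderField("Location"));
            } else if (isPdfContentType(conn.getContentType())) {
                System.out.println("Looks like a direct PDF link.");
            } else {
                System.out.println("Not a PDF, content type is: " + conn.getContentType());
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

    Run it with the link as the only argument. If it reports a redirect, the link does not point directly at a PDF.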

    The following program is practically the same as yours except for the PDF's URL and the output path. I have also added throws IOException to the main method's signature, and it simply prints the content type.

    It works as expected:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.ConnectException;
    import java.net.URL;
    import java.net.URLConnection;

    public class PdfFileReader {
        public static void main(String[] args) throws IOException {

            URL pdfUrl = new URL("http://www.crdp-strasbourg.fr/je_lis_libre/livres/Anonyme_LesMilleEtUneNuits1.pdf");
            byte[] ba1 = new byte[1024];
            int baLength;
            // try-with-resources closes (and flushes) the output file automatically
            try (FileOutputStream fos1 = new FileOutputStream("c:\\mybook.pdf")) {
                URLConnection urlConn = pdfUrl.openConnection();
                System.out.println("The content type is: " + urlConn.getContentType());

                // Read from the connection opened above instead of opening a second one
                try (InputStream is1 = urlConn.getInputStream()) {
                    while ((baLength = is1.read(ba1)) != -1) {
                        fos1.write(ba1, 0, baLength);
                    }
                } catch (ConnectException ce) {
                    System.out.println("FAILED.\n[" + ce.getMessage() + "]\n");
                }
            }
        }
    }
    

    Output:

    The content type is: application/pdf
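
    The content type is only what the server claims. As an extra safeguard, the downloaded bytes can be checked as well, since a PDF file starts with the ASCII marker %PDF-. The sketch below is an illustration and not part of the program above: the class PdfDownloadCheck and its methods are hypothetical names, and it assumes the URL and the target path are passed as command-line arguments. It downloads to a temporary file and keeps the result only if it begins with that marker.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original answer.
class PdfDownloadCheck {

    // A PDF file begins with the ASCII marker "%PDF-"
    static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True if the first 'length' bytes of 'data' start with the PDF marker
    static boolean looksLikePdf(byte[] data, int length) {
        if (length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads just enough bytes from the file to compare against the marker
    static boolean looksLikePdf(Path file) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int r;
            while (n < head.length && (r = in.read(head, n, head.length - n)) != -1) {
                n += r;
            }
        }
        return looksLikePdf(head, n);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java PdfDownloadCheck <url> <target-file>");
            return;
        }
        Path tmp = Files.createTempFile("download", ".tmp");
        try (InputStream in = new URL(args[0]).openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        if (looksLikePdf(tmp)) {
            Files.move(tmp, Paths.get(args[1]), StandardCopyOption.REPLACE_EXISTING);
            System.out.println("Saved PDF to " + args[1]);
        } else {
            Files.delete(tmp);
            System.out.println("FAILED.\n[Downloaded content is not a PDF.]");
        }
    }
}
```

    With a link that redirects to an HTML page, as in the question, this check should fail instead of leaving an unreadable .pdf file on disk.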