java rest hadoop firefox webhdfs

Is there a way to only display a file with the WebHDFS REST API?


Hello StackOverflow community. I have been running into a problem lately with the WebHDFS REST API.

I have a servlet inside an application that calls Apache Knox to access HDFS and HBase. Going through Knox prevents me from using configuration files and the Hadoop base classes to solve my problem. That approach worked fine before Knox was installed, but I obviously cannot use it anymore.

When I use the API to read a file with the op=OPEN operation, the file is downloaded instead of just being displayed. That seems normal when I request it directly through a URL like webhdfs/v1/path/to/storage/myPDF.pdf?op=OPEN, but I run into a different problem when I call this URL from my code:

public byte[] requestDocumentKnox(String documentID, String documentType) {

    byte[] response = null;

    String surl = TSGVConstantes.KNOX_HDFS + SLASH + "v1" + SLASH + TSGVConstantes.HDFS_PATH + SLASH + documentID + "." + documentType + "?op=OPEN";

    HttpURLConnection connection = null;

    URL url;

    try {

        url = new URL(surl);
        connection = (HttpURLConnection) url.openConnection();

        String encoded = Base64.getEncoder().encodeToString((TSGVConstantes.HADOOP_USER + ":" + TSGVConstantes.KNOX_PASS).getBytes(StandardCharsets.UTF_8));

        connection.setRequestMethod("GET");
        // A GET request sends no body, so setDoOutput(true) is not needed here.
        connection.setRequestProperty("Authorization", "Basic " + encoded);

        if (connection.getResponseCode() != HTTP_CODE_OK) {

            Trace.error(connection.getResponseCode() + " " + connection.getResponseMessage());
        }

        response = IOUtils.toByteArray(connection.getInputStream());

    } catch (IOException ex) {

        Trace.error(ex);

    } finally {

        if (connection != null) {
            connection.disconnect();
        }
    }

    return response;
}

private void returnDocument(HttpServletResponse response, byte[] document, String documentType) throws IOException {

    ServletOutputStream output = null;

    try {

        output = response.getOutputStream();

        if(document != null) {   

            response.setContentType(TSGVConstantes.getMimeTypeMap().get(documentType));

            output.write(document);

        } else {

            // You don't need that
        }

    } catch (IOException ex) {

        Trace.error(ex);

    } finally {

        response.flushBuffer();

        if (output != null) {
            output.close();
        }
    }
}

Both methods work correctly and are called by other methods. The document parameter of returnDocument() is the byte[] returned by requestDocumentKnox(), and the array is not modified between the two calls.

This does display the PDF, but it also opens my web browser's print window and warns me that the PDF may not be displayed correctly.

My problem is: I need to get rid of that print pop-up (and the warning), because my application is called inside other applications to display PDFs from HBase or HDFS, and I cannot let this pop-up appear.
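For readers who hit the plain "download instead of display" behaviour, one thing that may be worth ruling out on the servlet side is the pair of response headers. The sketch below is only an illustration, not part of my application: InlineDisplayHeaders and forDocument are hypothetical names, and it assumes the servlet knows the file name and MIME type of the document it is returning.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds the headers a servlet could set so a browser renders a document inline. */
final class InlineDisplayHeaders {

    private InlineDisplayHeaders() {}

    /** Returns header name to value pairs asking for inline display of the given file. */
    static Map<String, String> forDocument(String fileName, String mimeType) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Fall back to a generic type when no MIME type is known for the extension.
        headers.put("Content-Type", mimeType != null ? mimeType : "application/octet-stream");
        // "inline" asks the browser to render the body; "attachment" would force a download.
        // Quotes, backslashes and line breaks are replaced so the quoted filename stays well-formed.
        String safeName = fileName.replaceAll("[\"\\\\\\r\\n]", "_");
        headers.put("Content-Disposition", "inline; filename=\"" + safeName + "\"");
        return headers;
    }

    public static void main(String[] args) {
        forDocument("myPDF.pdf", "application/pdf")
                .forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

In a servlet, each entry could be applied with response.setHeader(name, value) before the bytes are written. Whether the browser then shows the file inline still depends on its own PDF settings.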

Thank you.

UPDATE

So apparently my problem does not appear in Chrome or Edge but ONLY in Firefox (60.6.2esr (32 bits)). I feel a bit stupid for not having tried other browsers before. Still, I don't understand why, and I cannot find a solution.

ANOTHER UPDATE

I found my problem; see my answer.


Solution

  • After some digging I finally found out. If you have the same problem, here are the things you might want to verify:

    Conclusion: The problem came from the PDF and the viewer. I did not have the problem with other PDFs or other viewers. It seems to have come from the Adobe Acrobat Document itself, because when I tried to open it locally, the print pop-up showed up as well.

    Hope this helps.

    PS: You might also want to check your browser's settings for how it handles PDFs. Here is a link that can help for Firefox.
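
If you want to check a suspicious PDF without opening it in a viewer, a rough heuristic is to look for the name tokens that usually go with "do something on open" behaviour. This is only a sketch under assumptions: I did not confirm what exactly in my PDF triggered the print dialog, PdfAutoActionScan is a hypothetical name, and the scan will miss markers stored inside compressed object streams.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: flags PDF name tokens often associated with actions that run on open. */
final class PdfAutoActionScan {

    // Assumed markers: an open action, embedded JavaScript, or a named Print action.
    private static final String[] MARKERS = {"/OpenAction", "/JavaScript", "/Print"};

    private PdfAutoActionScan() {}

    /** Returns the markers that appear in the raw (uncompressed) bytes of the document. */
    static List<String> findMarkers(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so binary content survives the conversion.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        List<String> found = new ArrayList<>();
        for (String marker : MARKERS) {
            if (text.contains(marker)) {
                found.add(marker);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Tiny hand-written fragment standing in for a real document's bytes.
        byte[] sample = "%PDF-1.4\n1 0 obj << /Type /Catalog /OpenAction << /S /Named /N /Print >> >> endobj"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("Markers found: " + findMarkers(sample));
    }
}
```

The byte[] returned by requestDocumentKnox() could be passed straight to findMarkers(). An empty result does not prove the PDF is clean, and a non-empty one only tells you where to look next.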