java pdf pdfbox apache-drill tabula

Why is a blank Java Icon appearing when I parse a PDF file using Tabula?


I am working on an integration with Apache Drill which enables users to query PDF files directly using SQL. I'm about 80% done and really impressed with how well Tabula works for this.

However, when I execute the first Drill query that uses the Tabula libraries, a Java icon pops up and the following text appears on the command line:

2020-10-25 15:06:55.770 java[71188:7121498] Persistent UI failed to open file file://localhost/Users/******/Saved%20Application%20State/net.java.openjdk.cmd.savedState/window_1.data: Permission denied (13)

I changed the permissions on that directory, but I'm still getting the Java popup.


This is not normal behavior for Drill, and my goal was to integrate Tabula programmatically. Is Tabula trying to open a window or something similar, and if so, is there a way to disable this? Note that this does not occur in my unit tests.

Here is the relevant code:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.pdfbox.pdmodel.PDDocument;

    import technology.tabula.ObjectExtractor;
    import technology.tabula.Page;
    import technology.tabula.PageIterator;
    import technology.tabula.Rectangle;
    import technology.tabula.Table;
    import technology.tabula.detectors.NurminenDetectionAlgorithm;
    import technology.tabula.extractors.ExtractionAlgorithm;

    public static List<Table> extractTablesFromPDF(PDDocument document, ExtractionAlgorithm algorithm) {
      // Detects candidate table regions on each page
      NurminenDetectionAlgorithm detectionAlgorithm = new NurminenDetectionAlgorithm();

      ObjectExtractor objectExtractor = new ObjectExtractor(document);
      PageIterator pages = objectExtractor.extract();
      List<Table> tables = new ArrayList<>();

      while (pages.hasNext()) {
        Page page = pages.next();

        // Disabled alternative: pick the extraction algorithm per page
        // ExtractionAlgorithm algExtractor = new SpreadsheetExtractionAlgorithm().isTabular(page)
        //     ? new SpreadsheetExtractionAlgorithm()
        //     : new BasicExtractionAlgorithm();

        // Extract tables from each detected region with the caller's algorithm
        for (Rectangle guessRect : detectionAlgorithm.detect(page)) {
          Page guess = page.getArea(guessRect);
          tables.addAll(algorithm.extract(guess));
        }
      }
      return tables;
    }

Thanks in advance for your help!


Solution

  • This happens because some code performs an operation that is usually, but not necessarily, tied to so-called 'headful' mode (perhaps not really a term, but its opposite, 'headless', certainly is). Switching into headful mode has a few side effects, including that icon showing up (on macOS it appears in the Dock) and, presumably, the Persistent UI message in your log.

    One easy way out is to force headless mode. Note, however, that once you do, each of these 'usually but not necessarily headful' operations will either [1] work fine and no longer show that icon, or [2] fail with a HeadlessException. Which one you get depends not only on the operation but also on the JVM version you run it on. As a rule, once an operation works in headless mode, later versions won't go back to throwing; in other words, newer versions of Java support more operations in headless mode.
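    To make the two outcomes concrete, here is a small sketch that forces headless mode and then tries two typical operations: drawing into an off-screen BufferedImage (outcome [1], it keeps working) and creating an AWT Frame (outcome [2], it throws HeadlessException). HeadlessOutcomes and its methods are hypothetical names for illustration; this is not Tabula or Drill code.

```java
import java.awt.Color;
import java.awt.Frame;
import java.awt.Graphics2D;
import java.awt.GraphicsEnvironment;
import java.awt.HeadlessException;
import java.awt.image.BufferedImage;

// Hypothetical demo class: shows outcome [1] and outcome [2] under headless mode.
public class HeadlessOutcomes {

    // Outcome [1]: off-screen drawing needs no display and keeps working.
    public static boolean offscreenDrawingWorks() {
        BufferedImage img = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.BLUE);
            g.fillRect(0, 0, 10, 10);
        } finally {
            g.dispose();
        }
        // The fill color is now stored in the image's pixels.
        return img.getRGB(5, 5) == Color.BLUE.getRGB();
    }

    // Outcome [2]: creating an on-screen window is refused in headless mode.
    public static boolean windowCreationThrows() {
        try {
            Frame frame = new Frame("never shown");
            frame.dispose(); // only reached when the JVM is NOT headless
            return false;
        } catch (HeadlessException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Must be set before anything asks AWT whether it is headless.
        System.setProperty("java.awt.headless", "true");
        System.out.println("headless: " + GraphicsEnvironment.isHeadless());
        System.out.println("off-screen drawing works: " + offscreenDrawingWorks());
        System.out.println("Frame throws HeadlessException: " + windowCreationThrows());
    }
}
```

    Which real operations inside a PDF library fall into which bucket still depends on the library and the JVM version, as described above.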

    To force headless mode, pass -Djava.awt.headless=true on the java command line.
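    A quick way to confirm that the flag actually reached the JVM is a tiny check like the following. HeadlessCheck is a hypothetical helper, not part of Tabula or Drill. Run it as java -Djava.awt.headless=true HeadlessCheck; with the flag in place it should print java.awt.headless=true, GraphicsEnvironment.isHeadless()=true.

```java
import java.awt.GraphicsEnvironment;

// Hypothetical helper: reports whether the running JVM is in headless mode.
public class HeadlessCheck {

    // Shows both the raw system property and the mode AWT actually uses.
    public static String report() {
        return "java.awt.headless=" + System.getProperty("java.awt.headless")
                + ", GraphicsEnvironment.isHeadless()=" + GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```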

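    If you must do it from within Java code, call System.setProperty("java.awt.headless", "true"); at least once, and before any of these 'usually headful' operations run. The JVM settles on headless or headful mode the first time AWT needs to know, so setting the property after that point has no effect.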
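    A minimal sketch of the in-code approach, assuming you control a class that is initialized before any PDF or image work happens. HeadlessBootstrap is a hypothetical name. Its static initializer runs when the class is first used, so calling HeadlessBootstrap.ensureHeadless() early (for example when your Drill integration is set up) sets the property before AWT decides on its mode.

```java
import java.awt.GraphicsEnvironment;

// Hypothetical bootstrap class: forces headless mode as early as possible.
public final class HeadlessBootstrap {

    static {
        // Runs once, when the class is first initialized. The JVM decides on
        // headless mode the first time AWT asks, so this must happen before
        // any image rendering or other AWT work.
        System.setProperty("java.awt.headless", "true");
    }

    private HeadlessBootstrap() {
    }

    // Triggers the static initializer (if not yet run) and reports the effective mode.
    // Returns false if AWT had already settled on headful mode earlier.
    public static boolean ensureHeadless() {
        return GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        System.out.println("headless: " + ensureHeadless());
        // ... PDF loading and table extraction would follow here ...
    }
}
```

    Checking the return value is a cheap way to detect that some other component touched AWT first, in which case the command-line flag is the more reliable option.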

    Presumably, the thing that triggers headful mode is graphics-related, such as rendering a JPG or PNG into a BufferedImage. It's not surprising that the code Apache Drill runs here does this, for example to 'read' images.
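    For reference, this kind of graphics work does not need a display. The sketch below draws into a BufferedImage, encodes it as PNG with ImageIO, and decodes it again, all with java.awt.headless=true. It only uses the JDK (no PDFBox or Tabula) and the class name is hypothetical, so it shows that this class of operation works in headless mode, not what Tabula does internally.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical demo: off-screen rendering plus PNG encode/decode, no display needed.
public class OffscreenRoundTrip {

    // Draws a simple image in memory and encodes it as PNG bytes.
    public static byte[] renderPng(int width, int height) throws IOException {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.setColor(Color.BLACK);
            g.drawLine(0, 0, width - 1, height - 1); // diagonal from top-left
        } finally {
            g.dispose();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("no PNG writer available");
        }
        return out.toByteArray();
    }

    // Decodes PNG bytes back into an image ("reading" an image).
    public static BufferedImage readPng(byte[] png) throws IOException {
        BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(png));
        if (decoded == null) {
            throw new IOException("bytes are not a readable image");
        }
        return decoded;
    }

    public static void main(String[] args) throws IOException {
        System.setProperty("java.awt.headless", "true");
        BufferedImage back = readPng(renderPng(100, 50));
        System.out.println("decoded " + back.getWidth() + "x" + back.getHeight());
    }
}
```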

    Another option is simply to upgrade your JVM, which may help. As a general rule, features 'move downwards' on this line: