java android android-studio pdfbox pdf-reader

How to get text from a certain area of a PDF document in Android Studio?


I want to extract data from a specific position in a PDF document.

I tried to read the text from a given area of a PDF with the PDFBox library, but the addRegion method expects a Rectangle2D. Android only has the Rect class, which is not a Rectangle2D, so I get an error on this line:

stripper.addRegion("class1", rect);

How can I work around this? Is there another way to extract data from a specific position in a PDF document with the Android SDK, or another library that supports it? I don't think Rectangle2D is available on Android.

    PDDocument document = PDDocument.load(new File("/Users/osman/Desktop/test.pdf"));
    PDFTextStripperByArea stripper = new PDFTextStripperByArea();
    stripper.setSortByPosition(true);
    Rect rect = new Rect(0, 0, 0, 0);
    stripper.addRegion("class1", rect);
    stripper.extractRegions(document.getPage(1));
    System.out.println(stripper.getTextForRegion("class1"));

Solution

  • I found a solution to my own question:

    There is a port of the PDFBox library for Android on GitHub: https://github.com/TomRoush/PdfBox-Android

    Add this to the dependencies in your module-level build.gradle file:

    dependencies {
        implementation 'com.tom-roush:pdfbox-android:2.0.25.0'
    }
    

    The code to extract the data is:

        // Initialize PdfBox-Android's resources before any other PDFBox call.
        PDFBoxResourceLoader.init(getApplicationContext());

        File path = Environment.getExternalStoragePublicDirectory(
                Environment.DIRECTORY_DOWNLOADS);
        File file = new File(path, "Test.pdf");
        PDDocument document = PDDocument.load(file);
        try {
            PDFTextStripperByArea stripper = new PDFTextStripperByArea();
            stripper.setSortByPosition(true);
            // RectF takes (left, top, right, bottom).
            RectF rect = new RectF(100, 100, 300, 300);
            stripper.addRegion("class1", rect);
            // Page indices are zero-based, so 0 is the first page.
            stripper.extractRegions(document.getPage(0));
            System.out.println(stripper.getTextForRegion("class1"));
        } finally {
            // Release the document's resources even if extraction fails.
            document.close();
        }
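
A note on choosing the rectangle: RectF takes (left, top, right, bottom). As far as I can tell, the region is compared against text positions in PDF user-space units (1/72 inch by default) measured from the top-left corner of the page, but treat that as an assumption and check it against your own document. If you measure the area in millimetres, a small helper along these lines could do the conversion. RegionUnits and its methods are made-up names for illustration, not part of PdfBox-Android:

```java
/** Hypothetical helper (not part of PdfBox-Android): converts a region in millimetres to points. */
final class RegionUnits {
    // Default PDF user space: 72 units per inch.
    private static final float POINTS_PER_INCH = 72f;
    private static final float MM_PER_INCH = 25.4f;

    private RegionUnits() {}

    /** Converts a length in millimetres to PDF points. */
    static float mmToPoints(float mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /**
     * Returns {left, top, right, bottom} in points for a region given by its
     * top-left corner and size in millimetres (assumed top-left page origin).
     */
    static float[] regionInPoints(float leftMm, float topMm, float widthMm, float heightMm) {
        if (widthMm <= 0 || heightMm <= 0) {
            // An empty rectangle cannot contain any text.
            throw new IllegalArgumentException("width and height must be positive");
        }
        float left = mmToPoints(leftMm);
        float top = mmToPoints(topMm);
        return new float[] {
            left,
            top,
            left + mmToPoints(widthMm),
            top + mmToPoints(heightMm)
        };
    }
}
```

The four values can then be passed straight to the constructor, for example `float[] r = RegionUnits.regionInPoints(20f, 30f, 80f, 40f);` followed by `new RectF(r[0], r[1], r[2], r[3])`. The size check is there because a zero-sized rectangle, like the Rect(0, 0, 0, 0) in my question, cannot match any text.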