I want to extract data from a specific position in a PDF document.
I tried to extract text from a specific area of a PDF with the PDFBox library, but the addRegion method expects a Rectangle2D. Android only has the Rect class, which is not a Rectangle2D, so I get an error on:
stripper.addRegion("class1", rect);
What can I do to work around this? Is there another way to extract data from a specific position in a PDF document with the Android SDK, or another library that can do this? I don't think there is a library that provides Rectangle2D on Android. This is my code:
PDDocument document = PDDocument.load(new File("/Users/osman/Desktop/test.pdf"));
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition(true);
Rect rect = new Rect(0, 0, 0, 0);
stripper.addRegion("class1", rect);
stripper.extractRegions(document.getPage(1));
System.out.println(stripper.getTextForRegion("class1"));
I found a solution to my own question:
There is a port of the PDFBox library for Android on GitHub: https://github.com/TomRoush/PdfBox-Android
Add this to the dependencies block of your module's build.gradle file (not to AndroidManifest.xml):
dependencies {
implementation 'com.tom-roush:pdfbox-android:2.0.25.0'
}
The code to extract the text from a region is:
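// Initialize PdfBox-Android's resources before using any other PDFBox class.
PDFBoxResourceLoader.init(getApplicationContext());
File path = Environment.getExternalStoragePublicDirectory(
Environment.DIRECTORY_DOWNLOADS);
File file = new File(path, "Test.pdf");
PDDocument document = PDDocument.load(file);
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition(true);
// The Android port takes android.graphics.RectF (left, top, right, bottom) instead of Rectangle2D.
RectF rect = new RectF(100, 100, 300, 300);
stripper.addRegion("class1", rect);
// Page indices are zero-based, so 0 is the first page.
stripper.extractRegions(document.getPage(0));
System.out.println(stripper.getTextForRegion("class1"));
// Release the file handle once the text has been extracted.
document.close();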
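One thing to watch when porting desktop PDFBox examples: java.awt.geom.Rectangle2D is built from x, y, width and height, while android.graphics.RectF is built from left, top, right and bottom edges. Below is a minimal sketch of that conversion in plain Java so it can be run off-device. The class RegionEdges and the method toEdges are hypothetical names of my own, not part of PDFBox or PdfBox-Android.

```java
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox or PdfBox-Android.
final class RegionEdges {
    private RegionEdges() {}

    /**
     * Converts a Rectangle2D-style region (x, y, width, height) into
     * RectF-style edges, returned as {left, top, right, bottom}.
     */
    static float[] toEdges(float x, float y, float width, float height) {
        if (width < 0 || height < 0) {
            throw new IllegalArgumentException("width and height must be non-negative");
        }
        return new float[] {x, y, x + width, y + height};
    }

    public static void main(String[] args) {
        // A 200 x 200 region whose top-left corner is at (100, 100).
        float[] e = toEdges(100f, 100f, 200f, 200f);
        System.out.println(Arrays.toString(e)); // prints [100.0, 100.0, 300.0, 300.0]
    }
}
```

On Android the result can be passed on as new RectF(e[0], e[1], e[2], e[3]), which gives the same region as the RectF(100, 100, 300, 300) used above. A region whose width and height are both 0, like the Rect(0, 0, 0, 0) in the question, covers no area.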