I need to find a way to ignore pictures and photos in a PDF document while converting it to a DOCX file.
I am creating an instance of FineReader Engine:
IEngine engine = Engine.InitializeEngine(
engineConfig.getDllFolder(), engineConfig.getCustomerProjectId(),
engineConfig.getLicensePath(), engineConfig.getLicensePassword(), "", "", false);
After that, I am converting a document:
IFRDocument document = engine.CreateFRDocument();
document.AddImageFile(file.getAbsolutePath(), null, null);
document.Process(null);
String exportPath = FileUtil.prepareExportPath(file, resultFolder);
document.Export(exportPath, FileExportFormatEnum.FEF_DOCX, null);
As a result, all images from the original PDF document are carried over into the DOCX output.
I'm not really familiar with PDF to DOCX conversion, but I think you could try a custom profile tailored to your needs.
At some point in your code you create an Engine object and then a Document object (or an IFRDocument object, depending on your application). Add this line just before handing the document to the engine for processing:
engine.LoadProfile(PROFILE_FILENAME);
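Then create the profile file and fill it with the processing parameters described in the "Working with Profiles" section of the documentation packaged with your FRE installation. Do not forget to include the following in the file (the text after each --> is an explanation, not part of the file):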
... some params under other sections
[PageAnalysisParams]
DetectText = TRUE --> force text detection
DetectPictures = FALSE --> ignore pictures
... other params under PageAnalysisParams
... some params under other sections
It works the same way for barcodes and so on. Keep in mind, though, that you should benchmark your results whenever you add or remove settings in this file, as they may affect both processing speed and the overall quality of the output.
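To tie the steps above together, here is a minimal Java sketch that generates the profile file from code, so that its path can be passed to engine.LoadProfile(...). The class name NoPicturesProfile and the file name no-pictures-profile.ini are hypothetical, chosen only for illustration; the section and key names are the ones quoted above. The FineReader Engine calls are left as comments because they need the FRE Java wrapper on the classpath, and their placement is an assumption based on the code in the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NoPicturesProfile {

    // Builds the profile text: one section with the two keys quoted in the answer.
    static String buildProfile() {
        return String.join(System.lineSeparator(),
                "[PageAnalysisParams]",
                "DetectText = TRUE",
                "DetectPictures = FALSE",
                "");
    }

    // Writes the profile to disk and returns the path that was written.
    static Path writeProfile(Path target) {
        try {
            return Files.write(target, buildProfile().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical file name; any location readable by the engine should do.
        Path profile = writeProfile(Paths.get("no-pictures-profile.ini"));
        System.out.println("Profile written to " + profile.toAbsolutePath());

        // With the FRE wrapper available, the remaining steps would look roughly
        // like this (assumed ordering, based on the question's code; not run here):
        //   engine.LoadProfile(profile.toAbsolutePath().toString());
        //   IFRDocument document = engine.CreateFRDocument();
        //   document.AddImageFile(file.getAbsolutePath(), null, null);
        //   document.Process(null);
        //   document.Export(exportPath, FileExportFormatEnum.FEF_DOCX, null);
    }
}
```

Generating the file from code keeps the profile next to the conversion logic, which makes it easier to switch settings on and off while benchmarking.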