excel pdf ocr pdf-conversion abbyy

When converting PDF to Excel with Omnipage or Abbyy Finereader, is there a way to stop it from splitting individual cells?


I'm trying to extract some tables from PDF files, and both tools (Abbyy and Omnipage) do a pretty good job of identifying the tables. But when it comes to identifying the rows and columns, they both make the same mistakes.

Usually, the problem comes when they create a partial row, splitting just one cell horizontally, but not the others. For an example of what I mean, see the attached image. In the column on the left, some of the cells are split in half, which makes the table difficult to work with in Excel.

I find it odd that these programs do this in the first place, since tables with split cells are always a pain.

Is there a way of telling these programs to set only full columns and rows, and not split individual cells?

Any suggestions for other solutions?

[Image: example of a converted table in which some cells in the left column are split in half]


Solution

  • ABBYY has many OCR products; the configurable ones are FineReader Engine and FlexiLayout Studio. Other ABBYY products do not have the requested settings.
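
If neither of those products is an option, one possible workaround is to clean up the table after the conversion rather than during it. The following is only a minimal sketch and not a feature of ABBYY or Omnipage. It assumes the table has already been read into a list of rows (for example from the exported spreadsheet), that a split cell appears as an extra row whose other columns are empty, and that the fragment belongs to the row above it. The function name `merge_partial_rows` and the sample data are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[str]]


def is_empty(cell: Optional[str]) -> bool:
    """Treat None and whitespace-only strings as empty cells."""
    return cell is None or str(cell).strip() == ""


def merge_partial_rows(rows: List[Row], split_col: int = 0, sep: str = " ") -> List[Row]:
    """Fold rows that are empty outside `split_col` into the row above them.

    Assumption: a split cell shows up as an extra row in which only
    `split_col` holds text, and that text continues the previous row.
    """
    merged: List[Row] = []
    for row in rows:
        others_empty = all(
            is_empty(cell) for i, cell in enumerate(row) if i != split_col
        )
        if merged and others_empty:
            # Partial row: append its fragment to the previous full row.
            fragment = row[split_col]
            if not is_empty(fragment):
                previous = merged[-1][split_col]
                if is_empty(previous):
                    merged[-1][split_col] = fragment
                else:
                    merged[-1][split_col] = f"{previous}{sep}{fragment}"
        else:
            # Full row (or the very first row): keep a copy as-is.
            merged.append(list(row))
    return merged


# Hypothetical sample: the second row is the lower half of a split cell.
table = [
    ["Widget, large", "2", "9.50"],
    ["blue", None, None],
    ["Gadget", "5", "3.00"],
]

for row in merge_partial_rows(table):
    print(row)
```

This only helps when the split is in a single known column. If genuine rows can also be empty in every other column, they would be merged by mistake, so the result should be checked against the PDF.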