I'm trying to extract some tables from PDF files, and both tools I've tried (ABBYY and OmniPage) do a pretty good job of identifying the tables. But when it comes to identifying the rows and columns, they both make the same mistakes.
Usually, the problem arises when they create a partial row, splitting just one cell horizontally but not the others in that row. For an example of what I mean, see the attached image. In the column on the left, some of the cells are split in half, which makes the table difficult to work with in Excel.
I find it odd that these programs do this in the first place, since tables with split cells are always a pain.
Is there a way of telling these programs to set only full columns and rows, and not split individual cells?
Any suggestions for other solutions?
ABBYY has a lot of OCR products; the configurable ones are FineReader Engine and FlexiLayout Studio. Other ABBYY products do not have the requested settings.