Tags: android, rect, android-vision, text-recognition

Sort TextBlocks from top to bottom in the Vision API


When I scan for text using the Vision API, the overlay returns multiple text boxes as an unsorted list. So when I read the text by looping over them, I sometimes get the text in the wrong order, i.e., text from the bottom of the page appears first.

Sample code for receiveDetections in OcrDetectorProcessor.java:

@Override
public void receiveDetections(Detector.Detections<TextBlock> detections) {
    mGraphicOverlay.clear();
    SparseArray<TextBlock> items = detections.getDetectedItems();
    for (int i = 0; i < items.size(); ++i) {
        TextBlock item = items.valueAt(i);
        OcrGraphic graphic = new OcrGraphic(mGraphicOverlay, item);
        mGraphicOverlay.add(graphic);
    }
}

In this code, I want to sort the mGraphicOverlay list based on each TextBlock's position.

Any solution or suggestion would be very helpful.


Solution

  • I created a TextBlock comparator like this:

    public static Comparator<TextBlock> TextBlockComparator
            = new Comparator<TextBlock>() {
        @Override
        public int compare(TextBlock textBlock1, TextBlock textBlock2) {
            // A smaller "top" means the block sits higher on the page, so it sorts first.
            return textBlock1.getBoundingBox().top - textBlock2.getBoundingBox().top;
        }
    };
    

    Then I sorted the array using Arrays.sort(myTextBlocks, Utils.TextBlockComparator);

    Update

    Today I had time to test @rajesh's answer. It seems that sorting by text block is more accurate than sorting by text line.

    I tried to extract text from the following image: [image]

    Result with TextBlockComparator: [image]

    Result with TextLineComparator: [image]
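
The top-to-bottom ordering described in the solution can be tried outside Android with a small sketch. Everything here is an assumption made for illustration: `Box` is a hypothetical stand-in for a TextBlock's text and bounding box, and `READING_ORDER` adds a left-to-right tie-break for boxes that share the same top, which the original comparator does not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ReadingOrderSketch {

    /** Hypothetical stand-in for a detected block: its text plus the top/left of its bounding box. */
    static final class Box {
        final String text;
        final int top;
        final int left;

        Box(String text, int top, int left) {
            this.text = text;
            this.top = top;
            this.left = left;
        }
    }

    /** Orders boxes top to bottom; boxes with the same top go left to right (an added assumption). */
    static final Comparator<Box> READING_ORDER =
            Comparator.<Box>comparingInt(b -> b.top).thenComparingInt(b -> b.left);

    /** Returns the texts of the given boxes in reading order, leaving the input list untouched. */
    static List<String> textsInReadingOrder(List<Box> boxes) {
        List<Box> sorted = new ArrayList<>(boxes);
        sorted.sort(READING_ORDER);
        List<String> texts = new ArrayList<>();
        for (Box box : sorted) {
            texts.add(box.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        // Boxes arrive in arbitrary order, as the detector's items do.
        List<Box> detected = Arrays.asList(
                new Box("footer", 900, 40),
                new Box("right column", 300, 400),
                new Box("title", 50, 40),
                new Box("left column", 300, 40));
        System.out.println(textsInReadingOrder(detected));
        // prints [title, left column, right column, footer]
    }
}
```

In receiveDetections, the same idea would presumably mean copying each items.valueAt(i) into a list first, sorting that list by getBoundingBox().top (and optionally getBoundingBox().left), and only then adding the graphics to mGraphicOverlay.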