Tags: opencv, panoramas, image-stitching

Panorama stitching for text


I'm looking for a good panorama stitching library for text. I tried OpenCV and OpenPano. They both work well on regular photos but fail on text. For example, I need to stitch the following 3 images:

[input images: 1st, 2nd, 3rd]

The images have about 45% overlap with each other.

If there's an option to make one of the mentioned libraries work well on text images, instead of finding another library, that would be great.


Solution

  • OpenPano fails at stitching text because it cannot retrieve enough feature points (or keypoints) to do the stitching process.

    Text stitching doesn't need a matching method that is robust to rotation, only to translation. OpenCV conveniently offers such a function: Template Matching.

    The solution I develop below is based on this OpenCV feature.
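
    To make the idea concrete before diving into the pipeline, here is a minimal sketch of translation-only matching with matchTemplate and minMaxLoc. The file name and crop coordinates are made up for the example:

    import cv2

    img = cv2.imread('page.jpg', cv2.IMREAD_GRAYSCALE) # hypothetical input image
    templ = img[0:100, 50:250] # a patch that we know comes from position (50, 0)
    res = cv2.matchTemplate(img, templ, cv2.TM_CCOEFF) # slide the patch over the whole image
    _, _, _, max_loc = cv2.minMaxLoc(res) # best match location as (x, y)
    print(max_loc) # should print (50, 0)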


    Pipeline

    I will now explain the main steps of my solution (for further details, please have a look at the code provided below).

    Matching process

    In order to match two consecutive images (done in the matchImages function, see code below):

    1. We create a template image by taking the rightmost 45% (H_templ_ratio) of the first image, as depicted below:

    [Template: the rightmost 45% of the original image]

    This step is done in my code by the function genTemplate.

    2. We add black margins to the second image (the image in which we want to find the template). This step is necessary if the text is not vertically aligned across the input images (which is the case with these sample images). Here is what the image looks like after the margin process. As you can see, the margins are only needed below and above the image:

    [Image after adding black margins above and below]

    The template image could theoretically be found anywhere in this margined image. This process is done in the addBlackMargins function (a sketch using OpenCV's built-in cv2.copyMakeBorder follows this list).

    3. We apply a Canny filter to both the template image and the image in which we want to find it (done inside the mat2Edges function). This adds robustness to the matching process. Here is an example:

    [Example of the Canny filter output]

    4. We match the template with the image using matchTemplate, and we retrieve the best match location with the minMaxLoc function.
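
    As an aside, steps 2 and 3 can also be expressed with OpenCV built-ins: cv2.copyMakeBorder does the same padding job as the addBlackMargins helper below, and cv2.Canny is what mat2Edges wraps. A sketch, assuming img and template are already loaded:

    import cv2

    h_templ = template.shape[0]
    # pad the searched image above and below with black (same effect as addBlackMargins)
    padded = cv2.copyMakeBorder(img, h_templ, h_templ, 0, 0, cv2.BORDER_CONSTANT, value=(0, 0, 0))
    # edge maps make the matching more robust to brightness differences
    edges_img = cv2.Canny(padded, 100, 200)
    edges_templ = cv2.Canny(template, 100, 200)
    res = cv2.matchTemplate(edges_img, edges_templ, cv2.TM_CCOEFF)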

    Calculating final image size

    This step consists of calculating the size of the final matrix, in which we will stitch all the images together. It is particularly needed if the input images don't all have the same height.

    This step is done inside the calcFinalImgSize function. I won't go into too much detail here because, even though it looks a bit complex (for me at least), it is only simple maths (additions, subtractions, multiplications). Take a pen and paper if you want to understand the formulas.
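
    For intuition, here is a tiny worked example of the width formula with made-up numbers (two images 800 px wide, H_templ_ratio of 0.45, best match found at x = 10 in the second image):

    w, w2 = 800, 800
    H_templ_ratio = 0.45
    loc_x = 10 # made-up match position returned by minMaxLoc
    x_templ = int(float(w)*H_templ_ratio) # template width: 360
    w_final = w + (w2 - x_templ - loc_x) # 800 + 430 = 1230
    print(w_final)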

    Stitching process

    Once we have the match location for each pair of input images, we only have to do simple maths to copy each input image into the right spot of the final image. Again, I recommend checking the code for implementation details (see the stitchImages function).
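
    The copy itself is plain numpy slicing; a minimal sketch with made-up offsets (it assumes the image is loaded and fits inside the canvas):

    import numpy as np
    import cv2

    img = cv2.imread('part2.jpg') # any of the input images
    result = np.zeros((500, 2000, 3), np.uint8) # pre-sized black canvas
    h, w = img.shape[:2]
    y1, x1 = 20, 430 # made-up stitch position computed from the match location
    result[y1:y1+h, x1:x1+w] = img # paste the image with its top-left corner at (x1, y1)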


    Results

    Here is the result with your input images:

    [Final result with the samples provided]

    As you can see, the result is not "pixel perfect" but it should be good enough for OCR.

    And here is another result with input images of different heights:

    [Result with input images of different heights]


    Code (Python)

    My program is written in Python and uses the cv2 (OpenCV) and numpy modules. However, it can easily be ported to other languages such as C++, Java and C#.

    import numpy as np
    import cv2
    
    def genTemplate(img): 
        global H_templ_ratio
        # we get the image's width and height
        h, w = img.shape[:2]
        # we compute the template's bounds
        x1 = int(float(w)*(1-H_templ_ratio))
        y1 = 0
        x2 = w
        y2 = h
        return(img[y1:y2,x1:x2]) # and crop the input image
    
    def mat2Edges(img): # applies a Canny filter to get the edges
        edged = cv2.Canny(img, 100, 200)
        return(edged)
    
    def addBlackMargins(img, top, bottom, left, right): # top, bottom, left, right: margins width in pixels
        h, w = img.shape[:2]
        result = np.zeros((h+top+bottom, w+left+right, 3), np.uint8)
        result[top:top+h,left:left+w] = img
        return(result)
    
    # return the y_offset of the first image to stitch and the final image size needed
    def calcFinalImgSize(imgs, loc):
        global H_templ_ratio
        y_offset = 0
        max_margin_top = 0; max_margin_bottom = 0 # maximum margins that will be needed above and below the first image in order to stitch all the images into one mat
        current_margin_top = 0; current_margin_bottom = 0
    
        h_init, w_init = imgs[0].shape[:2]
        w_final = w_init
        
        for i in range(0,len(loc)):
            h, w = imgs[i].shape[:2]
            h2, w2 = imgs[i+1].shape[:2]
            # we compute the max top/bottom margins that will be needed (relatively to the first input image) in order to stitch all the images
            current_margin_top += loc[i][1] # here, we assume that the template top-left corner Y-coordinate is 0 (relatively to its original image)
            current_margin_bottom += (h2 - loc[i][1]) - h
            if(current_margin_top > max_margin_top): max_margin_top = current_margin_top
            if(current_margin_bottom > max_margin_bottom): max_margin_bottom = current_margin_bottom
            # we compute the width needed for the final result
            x_templ = int(float(w)*H_templ_ratio) # width of the template (the rightmost H_templ_ratio of its original image)
            w_final += (w2 - x_templ - loc[i][0]) # width needed to stitch all the images into one mat
    
        h_final = h_init + max_margin_top + max_margin_bottom
        return (max_margin_top, h_final, w_final)
    
    # match each input image with its following image (1->2, 2->3) 
    def matchImages(imgs, templates_loc):
        for i in range(0,len(imgs)-1):
            template = genTemplate(imgs[i])
            template = mat2Edges(template)
            h_templ, w_templ = template.shape[:2]
            # Apply template Matching
            margin_top = margin_bottom = h_templ; margin_left = margin_right = 0
            img = addBlackMargins(imgs[i+1],margin_top, margin_bottom, margin_left, margin_right) # we need to enlarge the input image prior to calling matchTemplate (the template needs to be strictly smaller than the searched image)
            img = mat2Edges(img)
            res = cv2.matchTemplate(img,template,cv2.TM_CCOEFF) # matching function
            _, _, _, templ_pos = cv2.minMaxLoc(res) # minMaxLoc gets the best match position
            # as we added margins to the input image we need to subtract the margins width to get the template position relatively to the initial input image (without the black margins)
            rectified_templ_pos = (templ_pos[0]-margin_left, templ_pos[1]-margin_top) 
            templates_loc.append(rectified_templ_pos)
            print("max_loc", rectified_templ_pos)
    
    def stitchImages(imgs, templates_loc):
        y_offset, h_final, w_final = calcFinalImgSize(imgs, templates_loc) # we calculate the "surface" needed to stitch all the images into one mat (and y_offset, the Y offset of the first image to be stitched) 
        result = np.zeros((h_final, w_final, 3), np.uint8)
    
        #initial stitch
        h_init, w_init = imgs[0].shape[:2]
        result[y_offset:y_offset+h_init, 0:w_init] = imgs[0]
        origin = (y_offset, 0) # top-left corner of the last stitched image (y,x)
        # stitching loop
        for j in range(0,len(templates_loc)):
            h, w = imgs[j].shape[:2]
            h2, w2 = imgs[j+1].shape[:2]
            # we compute the coordinates where to stitch imgs[j+1]
            y1 = origin[0] - templates_loc[j][1]
            y2 = origin[0] - templates_loc[j][1] + h2
            x_templ = int(float(w)*(1-H_templ_ratio)) # x-coordinate of the template's left edge within its original image
            x1 = origin[1] + x_templ - templates_loc[j][0]
            x2 = origin[1] + x_templ - templates_loc[j][0] + w2
            result[y1:y2, x1:x2] = imgs[j+1] # we copy the input image into the result mat
            origin = (y1,x1) # we update the origin point with the last stitched image
    
        return(result)
    
    if __name__ == '__main__':
    
        # input images
        part1 = cv2.imread('part1.jpg')
        part2 = cv2.imread('part2.jpg')
        part3 = cv2.imread('part3.jpg')
        imgs = [part1, part2, part3]
        
        H_templ_ratio = 0.45 # H_templ_ratio: horizontal ratio of the input that we will keep to create a template
        templates_loc = [] # templates location
    
        matchImages(imgs, templates_loc)
        
        result = stitchImages(imgs, templates_loc)
    
        cv2.imshow("result", result)
        cv2.waitKey(0) # keep the window open until a key is pressed
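
    If you want to keep the output (to feed an OCR engine, for instance), you can also write it to disk; result.jpg is just an example file name:

    cv2.imwrite('result.jpg', result)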