python opencv google-vision

How to Crop Image Based on Google Vision API Bounding Poly Normalized Vertices using OpenCV for Python


I'm working on implementing the Google Vision Detect Multiple Objects API in Python (https://cloud.google.com/vision/docs/object-localizer).

The problem I'm having is that I don't know how to use the boundingPoly normalizedVertices returned in the response to crop the original image using OpenCV.

Example Response

{
  "responses": [
    {
      "localizedObjectAnnotations": [
        {
          "mid": "/m/0bt_c3",
          "name": "Book",
          "score": 0.8462029,
          "boundingPoly": {
            "normalizedVertices": [
              {
                "x": 0.1758254,
                "y": 0.046406608
              },
              {
                "x": 0.84299797,
                "y": 0.046406608
              },
              {
                "x": 0.84299797,
                "y": 0.9397349
              },
              {
                "x": 0.1758254,
                "y": 0.9397349
              }
            ]
          }
        }
      ]
    }
  ]
}
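For reference, here is a minimal sketch of pulling the normalized vertices out of a response shaped like the one above, using only the standard library. The helper name `extract_objects` is hypothetical, and defaulting a missing "x" or "y" key to 0.0 is a defensive assumption rather than something the example response shows.

```python
import json

# Copy of the example response above, with the vertices written on one line each.
response_text = """
{
  "responses": [
    {
      "localizedObjectAnnotations": [
        {
          "mid": "/m/0bt_c3",
          "name": "Book",
          "score": 0.8462029,
          "boundingPoly": {
            "normalizedVertices": [
              {"x": 0.1758254, "y": 0.046406608},
              {"x": 0.84299797, "y": 0.046406608},
              {"x": 0.84299797, "y": 0.9397349},
              {"x": 0.1758254, "y": 0.9397349}
            ]
          }
        }
      ]
    }
  ]
}
"""


def extract_objects(response):
    """Return a list of (name, score, [(x, y), ...]) for each detected object."""
    objects = []
    for result in response.get("responses", []):
        for annotation in result.get("localizedObjectAnnotations", []):
            vertices = annotation["boundingPoly"]["normalizedVertices"]
            # Defensive assumption: treat a missing "x" or "y" key as 0.0.
            points = [(v.get("x", 0.0), v.get("y", 0.0)) for v in vertices]
            objects.append((annotation["name"], annotation["score"], points))
    return objects


objects = extract_objects(json.loads(response_text))
name, score, points = objects[0]
print(name, score, points[0], points[2])
```

Each vertex comes out as an (x, y) tuple in the range 0 to 1, starting at the top-left corner for this response.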

Update

So these are the coordinates that I'm working with.

points = [
          (
             0.17716026,
             0.04550384
          ),
          (
             0.8430133,
             0.04550384
          ),
          (
             0.8430133,
             0.9376166
          ),
          (
             0.17716026,
             0.9376166
          )
        ]

These coordinates refer to this image:

enter image description here

When I run my code with the answer provided by @MSS and draw the contours, I get the image below.

from this import d
from pyimagesearch import imutils
from skimage import exposure
import numpy as np
import argparse
import cv2
from skimage.transform import rotate
from rembg import remove


ap = argparse.ArgumentParser()
ap.add_argument("-q", "--query", required = True,
    help = "Path to the query image")
args = vars(ap.parse_args())
image = cv2.imread(args["query"])
orig = image.copy()

IMAGE_SHAPE = image.shape
points = [
          (
             0.17716026,
             0.04550384
          ),
          (
             0.8430133,
             0.04550384
          ),
          (
             0.8430133,
             0.9376166
          ),
          (
             0.17716026,
             0.9376166
          )
        ]

coords = []
for point in points:
    pixels = tuple(round(coord * dimension) for coord, dimension in zip(point, IMAGE_SHAPE))
    coords.append(pixels)


points = np.array(coords)

cv2.drawContours(image, [points], -1, (0, 255, 0), 1) 
cv2.imshow("Image", image) 
cv2.waitKey(0)

This is the output image. The contour is clearly off, and the cropped image I get matches that same contour.

enter image description here

You can see in this screenshot that it's indicating that it's finding the object correctly.

enter image description here

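Update: Ultimately the issue was that the axes were swapped. image.shape is ordered (height, width, channels), while each vertex is (x, y), so I had to take the first two entries of image.shape and reverse them before scaling: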

IMAGE_SHAPE = image.shape[:2]
IMAGE_SHAPE = (IMAGE_SHAPE[1], IMAGE_SHAPE[0])
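To make the ordering explicit, the conversion can be sketched as a small pure-Python helper that unpacks height and width by name instead of relying on tuple position. The function name `to_pixel_vertices` and the 600x400 example size are assumptions for illustration, not the dimensions of the actual image.

```python
def to_pixel_vertices(normalized_vertices, image_shape):
    """Scale (x, y) vertices in [0, 1] to integer pixel coordinates.

    image_shape is what OpenCV/NumPy report: (height, width[, channels]).
    """
    height, width = image_shape[:2]
    # x runs along the columns (width), y runs along the rows (height).
    return [(round(x * width), round(y * height)) for x, y in normalized_vertices]


points = [
    (0.17716026, 0.04550384),
    (0.8430133, 0.04550384),
    (0.8430133, 0.9376166),
    (0.17716026, 0.9376166),
]

# Hypothetical image that is 600 pixels wide and 400 pixels tall,
# so its shape is (rows, columns, channels) = (400, 600, 3).
example_shape = (400, 600, 3)
pixel_points = to_pixel_vertices(points, example_shape)
print(pixel_points)  # [(106, 18), (506, 18), (506, 375), (106, 375)]
```

To pass the result to cv2.drawContours, it would typically be wrapped as np.array(pixel_points, dtype=np.int32).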

Solution

  • You have to unnormalize the coordinates using the size of the original image to obtain the true pixel coordinates. x is scaled by the image width (number of columns) and y by the image height (number of rows).

    (number_of_rows, number_of_columns) = image.shape[:2]
    
    x_unnormalized = round(x_normalized * number_of_columns)
    y_unnormalized = round(y_normalized * number_of_rows)
    
    ...
    
    cropped_image = image[y_unnormalized:y_unnormalized + h, x_unnormalized:x_unnormalized + w]
    

    This assumes the normalized values were obtained by dividing each pixel coordinate by the corresponding image dimension:

    normalized_x = true_x / image_width
    normalized_y = true_y / image_height
    

    If some other normalization is applied, then you have to apply the inverse of that particular normalization.

    UPDATE:

    Here is the working code, which I have tested. I think your code paired the coordinates with the wrong image dimensions.

    import cv2
    
    
    image = cv2.imread("Path to image.jpg")
    orig = image.copy()
    
    (number_of_rows, number_of_columns) = image.shape[:2]
    points = [
              (
                 0.17716026,
                 0.04550384
              ),
              (
                 0.8430133,
                 0.04550384
              ),
              (
                 0.8430133,
                 0.9376166
              ),
              (
                 0.17716026,
                 0.9376166
              )
            ]
    
    # x is scaled by the image width (columns), y by the image height (rows).
    top_left_x = round(points[0][0] * number_of_columns)
    top_left_y = round(points[0][1] * number_of_rows)
    bottom_right_x = round(points[2][0] * number_of_columns)
    bottom_right_y = round(points[2][1] * number_of_rows)
    
    # cv2.rectangle expects its corner points as (x, y).
    image = cv2.rectangle(image, (top_left_x, top_left_y), (bottom_right_x, bottom_right_y), (0, 255, 0), 1)
    cv2.imshow("Image", image) 
    cv2.waitKey(0)
    

    Here is the output image:

    enter image description here
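Since the question asks about cropping rather than drawing, the same scaling can be turned into a pair of slices for NumPy-style indexing. This is a minimal sketch under a few assumptions: the helper name `crop_slices` is hypothetical, the crop is the axis-aligned bounding box of the polygon, out-of-range vertices are clamped to the image, and the 600x400 size is an example rather than the real image size.

```python
import math


def crop_slices(normalized_vertices, image_shape):
    """Return (row_slice, col_slice) covering the polygon's bounding box.

    image_shape follows the OpenCV/NumPy order: (height, width[, channels]).
    """
    height, width = image_shape[:2]
    # Clamp to [0, 1] so a slightly out-of-range vertex cannot index outside the image.
    xs = [min(max(x, 0.0), 1.0) for x, _ in normalized_vertices]
    ys = [min(max(y, 0.0), 1.0) for _, y in normalized_vertices]
    # Round outwards so the crop never cuts into the detected box.
    left = math.floor(min(xs) * width)
    right = math.ceil(max(xs) * width)
    top = math.floor(min(ys) * height)
    bottom = math.ceil(max(ys) * height)
    # Rows (y) come first when indexing an image array, columns (x) second.
    return slice(top, bottom), slice(left, right)


points = [
    (0.17716026, 0.04550384),
    (0.8430133, 0.04550384),
    (0.8430133, 0.9376166),
    (0.17716026, 0.9376166),
]

# Hypothetical shape for an image 600 pixels wide and 400 pixels tall.
rows, cols = crop_slices(points, (400, 600, 3))
print(rows, cols)  # slice(18, 376, None) slice(106, 506, None)
```

With an image loaded through cv2.imread, the crop would then be rows, cols = crop_slices(points, image.shape) followed by cropped_image = image[rows, cols].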