python, opencv, rubiks-cube

How do I split up thresholds into squares in OpenCV2?


I have a picture of a lovely Rubik's cube:

rubiks cube

I want to split it into squares and identify the colour of each square. I can run a Gaussian blur on it, followed by 'Canny' and then 'Dilate', to get the following:

post-dilation

This looks good visually, but I'm unable to turn it into squares. Any sort of 'findContours' I try brings up only one or two squares, nowhere near the nine I'm aiming for. Does anyone have ideas on what I can do beyond this?

Current best solution:

sides

The code is below and requires NumPy and OpenCV (cv2). It expects a file called './sides/rubiks-side-F.png' and outputs several files to a 'steps' folder.

import numpy as np
import cv2 as cv

def save_image(name, file):
    return cv.imwrite('./steps/' + name + '.png', file)


def angle_cos(p0, p1, p2):
    # Cosine of the angle at vertex p1; 0 means a perfect right angle
    d1, d2 = (p0-p1).astype('float'), (p2-p1).astype('float')
    return abs(np.dot(d1, d2) / np.sqrt(np.dot(d1, d1)*np.dot(d2, d2)))

def find_squares(img):
    img = cv.GaussianBlur(img, (5, 5), 0)
    squares = []
    # Process each colour channel independently
    for gray in cv.split(img):
        # Edge detection, then dilation to close small gaps in the edges
        bin = cv.Canny(gray, 500, 700, apertureSize=5)
        save_image('post_canny', bin)
        bin = cv.dilate(bin, None)
        save_image('post_dilation', bin)
        # At thrs == 0 use the dilated edges; otherwise use a plain binary threshold
        for thrs in range(0, 255, 26):
            if thrs != 0:
                _retval, bin = cv.threshold(gray, thrs, 255, cv.THRESH_BINARY)
                save_image('threshold', bin)
            contours, _hierarchy = cv.findContours(
                bin, cv.RETR_LIST, cv.CHAIN_APPROX_SIMPLE)
            for cnt in contours:
                # Approximate the contour and keep convex quadrilaterals of a minimum area
                cnt_len = cv.arcLength(cnt, True)
                cnt = cv.approxPolyDP(cnt, 0.02*cnt_len, True)
                if len(cnt) == 4 and cv.contourArea(cnt) > 1000 and cv.isContourConvex(cnt):
                    cnt = cnt.reshape(-1, 2)
                    # Reject shapes whose corner angles are far from 90 degrees
                    max_cos = np.max(
                        [angle_cos(cnt[i], cnt[(i+1) % 4], cnt[(i+2) % 4]) for i in range(4)])
                    if max_cos < 0.2:
                        squares.append(cnt)
    return squares

img = cv.imread("./sides/rubiks-side-F.png")
squares = find_squares(img)
cv.drawContours(img, squares, -1, (0, 255, 0), 3)
save_image('squares', img)

You can find other sides here


Solution

  • I know that you might not accept this answer because it is written in C++. That's ok; I just want to show you a possible approach for detecting the squares. I'll try to include as much detail as possible if you wish to port this code to Python.

    The goal is to detect all 9 squares, as accurately as possible. These are the steps:

    1. Get an edge mask where the outline of the complete cube is clear and visible.
    2. Filter these edges to get a binary cube (segmentation) mask.
    3. Use the cube mask to get the cube’s bounding box/rectangle.
    4. Use the bounding rectangle to get the dimensions and location of each square (all the squares have constant dimensions).

    First, I'll try to get an edge mask by applying the steps you described, just to make sure I reach a starting point similar to where you currently are.

    The pipeline is this: read the image > grayscale conversion > Gaussian Blur > Canny Edge detector:

        //read the input image:
        std::string imageName = "C://opencvImages//cube.png";
        cv::Mat testImage =  cv::imread( imageName );
    
        //Convert BGR to Gray:
        cv::Mat grayImage;
        cv::cvtColor( testImage, grayImage, cv::COLOR_BGR2GRAY );

        //Apply Gaussian blur with an X-Y sigma of 50:
        cv::GaussianBlur( grayImage, grayImage, cv::Size(3,3), 50, 50 );
    
        //Prepare edges matrix:
        cv::Mat testEdges;
    
        //Setup lower and upper thresholds for edge detection:
        float lowerThreshold = 20;
        float upperThreshold = 3 * lowerThreshold;
    
        //Get Edges via Canny:
        cv::Canny( grayImage, testEdges, lowerThreshold, upperThreshold );
    

    Alright, this is the starting point. This is the edge mask I get:
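
    If you want to port this first step to Python as you go, a rough cv2 equivalent (an untested sketch; the input path and variable names are just examples) would be:

        import cv2

        # Read the input image (example path):
        test_image = cv2.imread("cube.png")

        # Convert BGR to gray:
        gray_image = cv2.cvtColor(test_image, cv2.COLOR_BGR2GRAY)

        # Gaussian blur with a 3x3 kernel and an X-Y sigma of 50:
        gray_image = cv2.GaussianBlur(gray_image, (3, 3), 50)

        # Canny edges, upper threshold = 3 * lower threshold:
        lower_threshold = 20
        upper_threshold = 3 * lower_threshold
        test_edges = cv2.Canny(gray_image, lower_threshold, upper_threshold)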

    Close to your results. Now, I'll apply a dilation. Here, the number of iterations is important, because I want nice, thick edges. Closing open contours is also desired, so I want a mildly aggressive dilation. I set the number of iterations to 5, using a rectangular structuring element.

        //Prepare a rectangular, 3x3 structuring element:
        cv::Mat SE = cv::getStructuringElement( cv::MORPH_RECT, cv::Size(3, 3) );

        //OP iterations:
        int dilateIterations = 5;

        //Prepare the dilation matrix:
        cv::Mat binDilation;

        //Perform the morph operation:
        cv::morphologyEx( testEdges, binDilation, cv::MORPH_DILATE, SE, cv::Point(-1,-1), dilateIterations );
    

    I get this:

    This is the output so far, with nice, well-defined edges. The most important thing is to clearly define the cube, because I'll rely on its outline to compute the bounding rectangle later.
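
    Continuing the Python sketch from above, the dilation step would look roughly like this:

        # 3x3 rectangular structuring element:
        se = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

        # Dilate 5 times to get thick, well-connected edges:
        bin_dilation = cv2.morphologyEx(test_edges, cv2.MORPH_DILATE, se, iterations=5)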

    What follows is my attempt to clean the cube's edges from everything else as accurately as possible. As you can see, there's a lot of garbage and there are pixels that do not belong to the cube. I'm especially interested in flood-filling the background with a color (white) different from the cube (black) in order to get a nice segmentation.

    Flood-filling has a disadvantage, though. It can also fill the interior of a contour if it is not closed. I try to clean garbage and close contours in one go with a "border mask", which is just white lines along the sides of the dilation mask.

    I implement this mask as four SUPER THICK lines that border the dilation mask. To apply the lines I need starting and ending points, which correspond to the image corners. These are defined in a vector:

        std::vector< std::vector<cv::Point> > imageCorners;
        imageCorners.push_back( { cv::Point(0,0), cv::Point(binDilation.cols,0) } );
        imageCorners.push_back( { cv::Point(binDilation.cols,0), cv::Point(binDilation.cols, binDilation.rows) } );
        imageCorners.push_back( { cv::Point(binDilation.cols, binDilation.rows), cv::Point(0,binDilation.rows) } );
        imageCorners.push_back( { cv::Point(0,binDilation.rows), cv::Point(0, 0) } );
    

    Four starting/ending coordinates in a vector of four entries. I apply the "border mask" by looping through these coordinates and drawing the thick lines:

        //Define the SUPER THICKNESS:
        int lineThickness = 200;

        //Loop through my line coordinates and draw four lines at the borders:
        for ( int c = 0 ; c < 4 ; c++ ){
            //Get current vector of points:
            std::vector<cv::Point> currentVect = imageCorners[c];
            //Get the starting/ending points:
            cv::Point startPoint = currentVect[0];
            cv::Point endPoint = currentVect[1];
            //Draw the line:
            cv::line( binDilation, startPoint, endPoint, cv::Scalar(255,255,255), lineThickness );
        }
    

    Cool. This gets me this output:
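
    The border mask is just as easy in Python (sketch, continuing with the bin_dilation image from the previous step):

        h, w = bin_dilation.shape[:2]

        # Start/end points of the four border lines (the image corners):
        image_corners = [
            ((0, 0), (w, 0)),
            ((w, 0), (w, h)),
            ((w, h), (0, h)),
            ((0, h), (0, 0)),
        ]

        # Draw the SUPER THICK white lines along the borders:
        line_thickness = 200
        for start_point, end_point in image_corners:
            cv2.line(bin_dilation, start_point, end_point, 255, line_thickness)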

    Now, let's apply the floodFill algorithm. This operation fills a closed area of same-colored pixels with a "substitute" color. It needs a seed point and the substitute color (white in this case). Let's flood-fill at the four corners, inside the white border mask we just created.

        //Set the offset of the image corners. Ensure the area to be filled is black:
        int fillOffsetX = 200;
        int fillOffsetY = 200;
        cv::Scalar fillTolerance = 0; //No tolerance
        int fillColor = 255; //Fill color is white
       
        //Get the dimensions of the image:
        int targetCols = binDilation.cols;
        int targetRows = binDilation.rows;
    
        //Flood-fill at the four corners of the image:
        cv::floodFill( binDilation, cv::Point( fillOffsetX, fillOffsetY ), fillColor, (cv::Rect*)0, fillTolerance, fillTolerance);
        cv::floodFill( binDilation, cv::Point( fillOffsetX, targetRows - fillOffsetY ), fillColor, (cv::Rect*)0, fillTolerance, fillTolerance);
        cv::floodFill( binDilation, cv::Point( targetCols - fillOffsetX, fillOffsetY ), fillColor, (cv::Rect*)0, fillTolerance, fillTolerance);
        cv::floodFill( binDilation, cv::Point( targetCols - fillOffsetX, targetRows - fillOffsetY ), fillColor, (cv::Rect*)0, fillTolerance, fillTolerance);
    

    This can also be implemented as a loop, just like the "border mask". After this operation I get this mask:
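
    In Python, that loop could look roughly like this (sketch; note that cv2.floodFill also takes a mask argument, which can simply be None here):

        fill_offset_x, fill_offset_y = 200, 200
        rows, cols = bin_dilation.shape[:2]

        # Seed points near the four corners of the image:
        seeds = [
            (fill_offset_x, fill_offset_y),
            (fill_offset_x, rows - fill_offset_y),
            (cols - fill_offset_x, fill_offset_y),
            (cols - fill_offset_x, rows - fill_offset_y),
        ]

        # Flood-fill with white at every seed point (loDiff/upDiff default to zero tolerance):
        for seed in seeds:
            cv2.floodFill(bin_dilation, None, seed, 255)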

    Getting close, right? Now, depending on your image, some garbage could survive all these "cleaning" operations. I'd suggest applying an area filter. The area filter will remove every blob of pixels that is under a threshold area. This is useful, because the cube's blobs are the biggest blobs on the mask and those surely will survive the area filter.
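
    For reference, an area filter in Python could be built on cv2.connectedComponentsWithStats, something along these lines (sketch; area_filter and min_area are just illustrative names, and the threshold value needs tuning for your image size):

        import numpy as np

        def area_filter(mask, min_area):
            # Label every white blob and measure its area:
            num_labels, labels, stats, _centroids = cv2.connectedComponentsWithStats(mask)
            # Keep only blobs at or above the area threshold (label 0 is the background):
            filtered = np.zeros_like(mask)
            for label in range(1, num_labels):
                if stats[label, cv2.CC_STAT_AREA] >= min_area:
                    filtered[labels == label] = 255
            return filtered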

    Anyway, I'm just interested in the cube's outline; I don't need the lines inside the cube. I'm going to dilate the hell out of the (inverted) blob and then erode it back to its original dimensions to get rid of those inner lines:

        //Get the inverted image:
        cv::Mat cubeMask = 255 - binDilation;
    
        //Set some really high iterations here:
        int closeIterations = 50;
    
        //Dilate
        cv::morphologyEx( cubeMask, cubeMask, cv::MORPH_DILATE, SE, cv::Point(-1,-1), closeIterations );
        //Erode:
        cv::morphologyEx( cubeMask, cubeMask, cv::MORPH_ERODE, SE, cv::Point(-1,-1), closeIterations );
    

    This is a closing operation, and a pretty brutal one. This is the result of applying it (remember I previously inverted the image):
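
    The Python counterpart of this invert + dilate + erode sequence is just as short (sketch, reusing the structuring element from before):

        # Invert the image so the cube becomes the white blob of interest:
        cube_mask = 255 - bin_dilation

        # Dilate a lot, then erode back, to wipe out the lines inside the cube:
        close_iterations = 50
        cube_mask = cv2.morphologyEx(cube_mask, cv2.MORPH_DILATE, se, iterations=close_iterations)
        cube_mask = cv2.morphologyEx(cube_mask, cv2.MORPH_ERODE, se, iterations=close_iterations)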

    Isn't that nice? Check out the cube mask, here overlaid onto the original RGB image:

    Excellent, now let's get the bounding box of this blob. The approach is as follows:

    Get blob contour > Convert contour to bounding box
    

    This is fairly straightforward to implement, and the Python equivalent should be very similar to this. First, get the contours via findContours. As you see, there should be only one contour: the cube outline. Next, convert the contour to a bounding rectangle using boundingRect. In C++ this is the code:

        //Let's get the blob contour:
        std::vector< std::vector<cv::Point> > contours;
        std::vector<cv::Vec4i> hierarchy;

        cv::findContours( cubeMask, contours, hierarchy, cv::RETR_TREE, cv::CHAIN_APPROX_SIMPLE, cv::Point(0, 0) );

        //There should be only one contour, the item number 0:
        cv::Rect boundingRect = cv::boundingRect( contours[0] );
    

    These are the contours found (just one):

    Once you convert this contour to a bounding rectangle, you can get this nice image:
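
    In Python, this step is a couple of lines (sketch; note that in recent OpenCV versions findContours returns two values instead of three):

        # Get the blob contours on the cube mask (there should be only one, the cube outline):
        contours, _hierarchy = cv2.findContours(cube_mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

        # Convert the contour to a bounding rectangle:
        x, y, rect_width, rect_height = cv2.boundingRect(contours[0])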

    Ah, we are very close to the end here. As all the squares have the same dimensions and your image doesn't seem to be very perspective-distorted, we can use the bounding rectangle to estimate the square dimensions. All the squares have the same width and height, and there are 3 squares per cube width and 3 per cube height.

    Divide the bounding rectangle into 9 equal sub-squares (or, as I call them, "grids") and get their dimensions and locations starting from the coordinates of the bounding box, like this:

        //Number of squares or "grids"
        int verticalGrids = 3;
        int horizontalGrids = 3;
    
        //Grid dimensions:
        float gridWidth = (float)boundingRect.width / 3.0;
        float gridHeight = (float)boundingRect.height / 3.0;
    
        //Grid counter:
        int gridCounter = 1;
        
        //Loop thru vertical dimension:
        for ( int j = 0; j < verticalGrids; ++j ) {
    
            //Grid starting Y:
            int yo = j * gridHeight;
    
            //Loop thru horizontal dimension:
            for ( int i = 0; i < horizontalGrids; ++i ) {
    
                //Grid starting X:
                int xo = i * gridWidth;
                
                //Grid dimensions:
                cv::Rect gridBox;
                gridBox.x = boundingRect.x + xo;
                gridBox.y = boundingRect.y + yo;
                gridBox.width = gridWidth;
                gridBox.height = gridHeight;
    
                //Draw a rectangle using the grid dimensions:
                cv::rectangle( testImage, gridBox, cv::Scalar(0,0,255), 5 );
    
                //Int to string:
                std::string gridCounterString = std::to_string( gridCounter );
    
                //String position:
                cv::Point textPosition;
                textPosition.x = gridBox.x + 0.5 * gridBox.width;
                textPosition.y = gridBox.y + 0.5 * gridBox.height;
    
                //Draw string:
                cv::putText( testImage, gridCounterString, textPosition, cv::FONT_HERSHEY_SIMPLEX,
                             1, cv::Scalar(255,0,0), 3, cv::LINE_8, false );
    
                gridCounter++;
    
            }
    
        }
    

    Here, for each grid, I'm drawing its rectangle and a nice number at its center. The rectangle-drawing function needs a defined rectangle: the upper-left starting coordinates plus the rectangle's width and height, which are all set via the gridBox variable of type cv::Rect.
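
    A rough Python port of the grid loop, reusing the bounding rectangle values from the previous sketch, might look like this:

        # Each grid cell is a third of the bounding rectangle in each dimension:
        grid_width = rect_width / 3.0
        grid_height = rect_height / 3.0

        grid_counter = 1
        for j in range(3):        # vertical grids
            for i in range(3):    # horizontal grids
                # Upper-left corner of the current grid cell:
                grid_x = int(x + i * grid_width)
                grid_y = int(y + j * grid_height)

                # Draw the cell rectangle:
                cv2.rectangle(test_image, (grid_x, grid_y),
                              (grid_x + int(grid_width), grid_y + int(grid_height)),
                              (0, 0, 255), 5)

                # Draw the cell number roughly at its center:
                text_position = (grid_x + int(0.5 * grid_width), grid_y + int(0.5 * grid_height))
                cv2.putText(test_image, str(grid_counter), text_position,
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 3)

                grid_counter += 1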

    Here's a cool animation of how the cube gets divided into 9 grids:

    Here’s the final image!

    Some suggestions:

    1. Your source image is way too big. Try resizing it to a smaller size, operating on that, and scaling the results back up (see the sketch after this list).
    2. Implement the area filter. It is very handy for getting rid of small blobs of pixels.
    3. Depending on your images (I just tested the one you posted in your question) and the perspective distortion introduced by the camera, a simple contour to boundingRect might not be enough. In that case, another approach would be to get the four points of the cube outline via Hough line detection.
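
    For the first suggestion, downscaling is a one-liner in Python (sketch; the scale factor and variable names are just examples, and any coordinates found on the small image need to be divided by the scale factor to map them back to the original):

        # Work on a smaller copy of the input:
        scale = 0.25
        small = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)

        # ...run the detection pipeline on `small`, then map results back, e.g.:
        # x, y, w, h = [int(v / scale) for v in (x, y, w, h)]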