image-processinggeometrycomputer-visionreverseprojectionprojective-geometry

proportions of a perspective-deformed rectangle


Given a 2d picture of a rectangle distorted by perspective:

enter image description here

I know that the shape was originally a rectangle, but I do not know its original size.

If I know the pixel coordinates of the corners in this picture, how can I calculate the original proportions, i.e. the quotient ( width / height ) of the rectangle?

(background: the goal is to automatically undistort photos of rectangular documents, edge detection will probably be done with hough transform)

UPDATE:

There has been some discussion on whether it is possible at all to determine the width:height ratio with the information given. My naive thought was that it must be possible, since I can think of no way to project for example a 1:4 rectangle onto the quadrangle depicted above. The ratio appears clearly close to 1:1, so there should be a way to determine it mathematically. I have however no proof for this beyond my intuitive guess.

I have not yet fully understood the arguments presented below, but I think there must be some implicit assumption that we are missing here and that is interpreted differently.

However, after hours of searching, I have finally found some papers relevant to the problem. I am struggling to understand the math used in there, so far without success. Particularly the first paper seems to discuss exactly what I wanted to do, unfortunately without code examples and very dense math.


Solution

  • Here is my attempt at answering my question after reading the paper

    I manipulated the equations for some time in SAGE, and came up with this pseudo-code in c-style:

    
    // in case it matters: licensed under GPLv2 or later
    // legend:
    // sqr(x)  = x*x
    // sqrt(x) = square root of x
    
    // let m1x,m1y ... m4x,m4y be the (x,y) pixel coordinates
    // of the 4 corners of the detected quadrangle
    // i.e. (m1x, m1y) are the cordinates of the first corner, 
    // (m2x, m2y) of the second corner and so on.
    // let u0, v0 be the pixel coordinates of the principal point of the image
    // for a normal camera this will be the center of the image, 
    // i.e. u0=IMAGEWIDTH/2; v0 =IMAGEHEIGHT/2
    // This assumption does not hold if the image has been cropped asymmetrically
    
    // first, transform the image so the principal point is at (0,0)
    // this makes the following equations much easier
    m1x = m1x - u0;
    m1y = m1y - v0;
    m2x = m2x - u0;
    m2y = m2y - v0;
    m3x = m3x - u0;
    m3y = m3y - v0;
    m4x = m4x - u0;
    m4y = m4y - v0;
    
    
    // temporary variables k2, k3
    double k2 = ((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x) /
                ((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) ;
    
    double k3 = ((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x) / 
                ((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) ;
    
    // f_squared is the focal length of the camera, squared
    // if k2==1 OR k3==1 then this equation is not solvable
    // if the focal length is known, then this equation is not needed
    // in that case assign f_squared= sqr(focal_length)
    double f_squared = 
        -((k3*m3y - m1y)*(k2*m2y - m1y) + (k3*m3x - m1x)*(k2*m2x - m1x)) / 
                          ((k3 - 1)*(k2 - 1)) ;
    
    //The width/height ratio of the original rectangle
    double whRatio = sqrt( 
        (sqr(k2 - 1) + sqr(k2*m2y - m1y)/f_squared + sqr(k2*m2x - m1x)/f_squared) /
        (sqr(k3 - 1) + sqr(k3*m3y - m1y)/f_squared + sqr(k3*m3x - m1x)/f_squared) 
    ) ;
    
    // if k2==1 AND k3==1, then the focal length equation is not solvable 
    // but the focal length is not needed to calculate the ratio.
    // I am still trying to figure out under which circumstances k2 and k3 become 1
    // but it seems to be when the rectangle is not distorted by perspective, 
    // i.e. viewed straight on. Then the equation is obvious:
    if (k2==1 && k3==1) whRatio = sqrt( 
        (sqr(m2y-m1y) + sqr(m2x-m1x)) / 
        (sqr(m3y-m1y) + sqr(m3x-m1x))
    
    
    // After testing, I found that the above equations 
    // actually give the height/width ratio of the rectangle, 
    // not the width/height ratio. 
    // If someone can find the error that caused this, 
    // I would be most grateful.
    // until then:
    whRatio = 1/whRatio;
    

    Update: here is how these equations were determined:

    The following is code in SAGE. It can be accessed online at http://www.sagenb.org/home/pub/704/. (Sage is really useful in solving equations, and useable in any browser, check it out)

    # CALCULATING THE ASPECT RATIO OF A RECTANGLE DISTORTED BY PERSPECTIVE
    
    #
    # BIBLIOGRAPHY:
    # [zhang-single]: "Single-View Geometry of A Rectangle 
    #  With Application to Whiteboard Image Rectification"
    #  by Zhenggyou Zhang
    #  http://research.microsoft.com/users/zhang/Papers/WhiteboardRectification.pdf
    
    # pixel coordinates of the 4 corners of the quadrangle (m1, m2, m3, m4)
    # see [zhang-single] figure 1
    m1x = var('m1x')
    m1y = var('m1y')
    m2x = var('m2x')
    m2y = var('m2y')
    m3x = var('m3x')
    m3y = var('m3y')
    m4x = var('m4x')
    m4y = var('m4y')
    
    # pixel coordinates of the principal point of the image
    # for a normal camera this will be the center of the image, 
    # i.e. u0=IMAGEWIDTH/2; v0 =IMAGEHEIGHT/2
    # This assumption does not hold if the image has been cropped asymmetrically
    u0 = var('u0')
    v0 = var('v0')
    
    # pixel aspect ratio; for a normal camera pixels are square, so s=1
    s = var('s')
    
    # homogenous coordinates of the quadrangle
    m1 = vector ([m1x,m1y,1])
    m2 = vector ([m2x,m2y,1])
    m3 = vector ([m3x,m3y,1])
    m4 = vector ([m4x,m4y,1])
    
    
    # the following equations are later used in calculating the the focal length 
    # and the rectangle's aspect ratio.
    # temporary variables: k2, k3, n2, n3
    
    # see [zhang-single] Equation 11, 12
    k2_ = m1.cross_product(m4).dot_product(m3) / m2.cross_product(m4).dot_product(m3)
    k3_ = m1.cross_product(m4).dot_product(m2) / m3.cross_product(m4).dot_product(m2)
    k2 = var('k2')
    k3 = var('k3')
    
    # see [zhang-single] Equation 14,16
    n2 = k2 * m2 - m1
    n3 = k3 * m3 - m1
    
    
    # the focal length of the camera.
    f = var('f')
    # see [zhang-single] Equation 21
    f_ = sqrt(
             -1 / (
              n2[2]*n3[2]*s^2
             ) * (
              (
               n2[0]*n3[0] - (n2[0]*n3[2]+n2[2]*n3[0])*u0 + n2[2]*n3[2]*u0^2
              )*s^2 + (
               n2[1]*n3[1] - (n2[1]*n3[2]+n2[2]*n3[1])*v0 + n2[2]*n3[2]*v0^2
              ) 
             ) 
            )
    
    
    # standard pinhole camera matrix
    # see [zhang-single] Equation 1
    A = matrix([[f,0,u0],[0,s*f,v0],[0,0,1]])
    
    
    #the width/height ratio of the original rectangle
    # see [zhang-single] Equation 20
    whRatio = sqrt (
                   (n2*A.transpose()^(-1) * A^(-1)*n2.transpose()) / 
                   (n3*A.transpose()^(-1) * A^(-1)*n3.transpose())
                  ) 
    

    The simplified equations in the c-code where determined by

    print "simplified equations, assuming u0=0, v0=0, s=1"
    print "k2 := ", k2_
    print "k3 := ", k3_
    print "f  := ", f_(u0=0,v0=0,s=1)
    print "whRatio := ", whRatio(u0=0,v0=0,s=1)
    
        simplified equations, assuming u0=0, v0=0, s=1
        k2 :=  ((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y
        - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
        k3 :=  ((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)/((m3y
        - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x)
        f  :=  sqrt(-((k3*m3y - m1y)*(k2*m2y - m1y) + (k3*m3x - m1x)*(k2*m2x
        - m1x))/((k3 - 1)*(k2 - 1)))
        whRatio :=  sqrt(((k2 - 1)^2 + (k2*m2y - m1y)^2/f^2 + (k2*m2x -
        m1x)^2/f^2)/((k3 - 1)^2 + (k3*m3y - m1y)^2/f^2 + (k3*m3x -
        m1x)^2/f^2))
    
    print "Everything in one equation:"
    print "whRatio := ", whRatio(f=f_)(k2=k2_,k3=k3_)(u0=0,v0=0,s=1)
    
        Everything in one equation:
        whRatio :=  sqrt(((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
        m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
        1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
        m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
        m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x
        - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1y)^2/((((m1y - m4y)*m2x -
        (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
        m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
        m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
        + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
        m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
        - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
        m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
        - m1x)) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
        m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
        1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
        m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
        m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2x/((m2y - m4y)*m3x
        - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1x)^2/((((m1y - m4y)*m2x -
        (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
        m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
        m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
        + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
        m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
        - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
        m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
        - m1x)) - (((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
        m1y*m4x)/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) -
        1)^2)/((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
        m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
        1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
        m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
        m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x
        - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)^2/((((m1y - m4y)*m2x -
        (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
        m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
        m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
        + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
        m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
        - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
        m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
        - m1x)) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
        m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
        1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
        m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
        m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x
        - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1x)^2/((((m1y - m4y)*m2x -
        (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
        m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
        m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
        + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
        m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
        - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
        m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
        - m1x)) - (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
        m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
        1)^2))
    
    
    
    # some testing:
    # - choose a random rectangle, 
    # - project it onto a random plane,
    # - insert the corners in the above equations,
    # - check if the aspect ratio is correct.
    
    from sage.plot.plot3d.transform import rotate_arbitrary
    
    #redundandly random rotation matrix
    rand_rotMatrix = \
               rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5)) *\
               rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5)) *\
               rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5))
    
    #random translation vector
    rand_transVector = vector((uniform(-10,10),uniform(-10,10),uniform(-10,10))).transpose()
    
    #random rectangle parameters
    rand_width =uniform(0.1,10)
    rand_height=uniform(0.1,10)
    rand_left  =uniform(-10,10)
    rand_top   =uniform(-10,10)
    
    #random focal length and principal point
    rand_f  = uniform(0.1,100)
    rand_u0 = uniform(-100,100)
    rand_v0 = uniform(-100,100)
    
    # homogenous standard pinhole projection, see [zhang-single] Equation 1
    hom_projection = A * rand_rotMatrix.augment(rand_transVector)
    
    # construct a random rectangle in the plane z=0, then project it randomly 
    rand_m1hom = hom_projection*vector((rand_left           ,rand_top            ,0,1)).transpose()
    rand_m2hom = hom_projection*vector((rand_left           ,rand_top+rand_height,0,1)).transpose()
    rand_m3hom = hom_projection*vector((rand_left+rand_width,rand_top            ,0,1)).transpose()
    rand_m4hom = hom_projection*vector((rand_left+rand_width,rand_top+rand_height,0,1)).transpose()
    
    #change type from 1x3 matrix to vector
    rand_m1hom = rand_m1hom.column(0)
    rand_m2hom = rand_m2hom.column(0)
    rand_m3hom = rand_m3hom.column(0)
    rand_m4hom = rand_m4hom.column(0)
    
    #normalize
    rand_m1hom = rand_m1hom/rand_m1hom[2]
    rand_m2hom = rand_m2hom/rand_m2hom[2]
    rand_m3hom = rand_m3hom/rand_m3hom[2]
    rand_m4hom = rand_m4hom/rand_m4hom[2]
    
    #substitute random values for f, u0, v0
    rand_m1hom = rand_m1hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)
    rand_m2hom = rand_m2hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)
    rand_m3hom = rand_m3hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)
    rand_m4hom = rand_m4hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)
    
    # printing the randomly choosen values
    print "ground truth: f=", rand_f, "; ratio=", rand_width/rand_height
    
    # substitute all the variables in the equations:
    print "calculated: f= ",\
    f_(k2=k2_,k3=k3_)(s=1,u0=rand_u0,v0=rand_v0)(
      m1x=rand_m1hom[0],m1y=rand_m1hom[1],
      m2x=rand_m2hom[0],m2y=rand_m2hom[1],
      m3x=rand_m3hom[0],m3y=rand_m3hom[1],
      m4x=rand_m4hom[0],m4y=rand_m4hom[1],
    ),"; 1/ratio=", \
    1/whRatio(f=f_)(k2=k2_,k3=k3_)(s=1,u0=rand_u0,v0=rand_v0)(
      m1x=rand_m1hom[0],m1y=rand_m1hom[1],
      m2x=rand_m2hom[0],m2y=rand_m2hom[1],
      m3x=rand_m3hom[0],m3y=rand_m3hom[1],
      m4x=rand_m4hom[0],m4y=rand_m4hom[1],
    )
    
    print "k2 = ", k2_(
      m1x=rand_m1hom[0],m1y=rand_m1hom[1],
      m2x=rand_m2hom[0],m2y=rand_m2hom[1],
      m3x=rand_m3hom[0],m3y=rand_m3hom[1],
      m4x=rand_m4hom[0],m4y=rand_m4hom[1],
    ), "; k3 = ", k3_(
      m1x=rand_m1hom[0],m1y=rand_m1hom[1],
      m2x=rand_m2hom[0],m2y=rand_m2hom[1],
      m3x=rand_m3hom[0],m3y=rand_m3hom[1],
      m4x=rand_m4hom[0],m4y=rand_m4hom[1],
    )
    
    # ATTENTION: testing revealed, that the whRatio 
    # is actually the height/width ratio, 
    # not the width/height ratio
    # This contradicts [zhang-single]
    # if anyone can find the error that caused this, I'd be grateful
    
        ground truth: f= 72.1045134124554 ; ratio= 3.46538779959142
        calculated: f=  72.1045134125 ; 1/ratio= 3.46538779959
        k2 =  0.99114614987 ; k3 =  1.57376280159