Tags: opencv, tracking, dlib, face

How do you transform 2D facial landmarks into 3D world coordinates?


First post here, so I'll do my best to describe the problem. I am trying to animate the face of a 3D virtual character in real time using OpenCV, dlib, and webcam data, similar to fancy applications such as this software.

Following the example here and using a webcam, I can get real-time 2D facial landmark positions in screen space, plus an estimate of the head pose (3D rotation and translation). But what I would really like are the estimated 3D world coordinates of the facial landmarks themselves. The screen-space points alone are ambiguous: for example, when the head is yawed about 30 degrees to the side, the 2D mouth landmarks look much the same as they do when the actor faces the camera and makes a "w" sound with his/her mouth.

Can anyone tell me what the strategy is for turning these 2D facial landmarks into 3D data using the estimated head pose, or is this exactly what makes these real-time face animation applications so expensive?
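My understanding of the geometry (please correct me if this is wrong): each 2D landmark x is the pinhole projection of an unknown 3D point X, roughly x ~ K (R X + t), so the depth along the viewing ray is lost and a rotated neutral mouth can project to nearly the same pixels as a frontal "w"-shaped mouth. A quick standalone illustration, where the intrinsics, pose, and landmark coordinates are all made-up numbers for demonstration only:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    // made-up intrinsics and head pose, purely for illustration
    cv::Mat K = (cv::Mat_<double>(3, 3) << 800, 0, 320,
                                           0, 800, 240,
                                           0, 0, 1);
    cv::Mat rvec = (cv::Mat_<double>(3, 1) << 0, CV_PI / 6, 0); // ~30 degree yaw
    cv::Mat tvec = (cv::Mat_<double>(3, 1) << 0, 0, 60);

    // a single 3D mouth-corner point in the head's local frame (arbitrary units)
    std::vector<cv::Point3d> obj{ cv::Point3d(-2.0, -4.0, 1.0) };
    std::vector<cv::Point2d> img;
    cv::projectPoints(obj, rvec, tvec, K, cv::noArray(), img);

    // any 3D point further along the same camera ray projects to the same pixel,
    // which is why a single 2D landmark cannot simply be inverted to 3D
    std::cout << "projected landmark: " << img[0] << std::endl;
    return 0;
}

My current head-pose code is below.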


#include <dlib/opencv.h>
#include <dlib/image_processing.h>
#include <opencv2/opencv.hpp>

// `frame` (the current webcam image) and `predictor` (the 68-point dlib
// shape_predictor) are created in the main capture loop, not shown here.
extern cv::Mat frame;
extern dlib::shape_predictor predictor;

dlib::full_object_detection shape;       // latest landmark fit

std::vector<cv::Point3d> object_pts;     // 3D reference points of a generic head model (filled at startup)
std::vector<cv::Point2d> image_pts;      // matching 2D landmarks from dlib
std::vector<cv::Point3d> reprojectsrc;   // corners of a 3D box around the head (filled at startup)
std::vector<cv::Point2d> reprojectdst;   // the box corners projected back into the image
cv::Mat rotation_vec;                    // rotation from solvePnP (Rodrigues vector)
cv::Mat rotation_mat;                    // rotation as a 3x3 matrix
cv::Mat translation_vec;                 // translation from solvePnP
cv::Mat pose_mat;                        // [R | t]
cv::Mat euler_angle;                     // head pose as Euler angles (degrees)
cv::Mat cam_matrix;                      // camera intrinsics (filled at startup)
cv::Mat dist_coeffs;                     // lens distortion coefficients (filled at startup)
cv::Mat out_intrinsics;
cv::Mat out_rotation;
cv::Mat out_translation;

int UpdateFace(dlib::cv_image<dlib::bgr_pixel> cimg, dlib::rectangle face)
{
    // get facial landmarks
    shape = predictor(cimg, face);

    // draw facial landmarks
    for (unsigned int i = 0; i < 68; ++i)
    {
        cv::circle(frame, cv::Point(shape.part(i).x(), shape.part(i).y()), 2, cv::Scalar(0, 0, 255), -1);
    }

    // clear data and fill in 2D ref points
    image_pts.clear();
    //#17 left brow left corner
    image_pts.push_back(cv::Point2d(shape.part(17).x(), shape.part(17).y())); 
    //#21 left brow right corner
    image_pts.push_back(cv::Point2d(shape.part(21).x(), shape.part(21).y())); 
    //#22 right brow left corner
    image_pts.push_back(cv::Point2d(shape.part(22).x(), shape.part(22).y())); 
    //#26 right brow right corner
    image_pts.push_back(cv::Point2d(shape.part(26).x(), shape.part(26).y())); 
    //#36 left eye left corner
    image_pts.push_back(cv::Point2d(shape.part(36).x(), shape.part(36).y())); 
    //#39 left eye right corner
    image_pts.push_back(cv::Point2d(shape.part(39).x(), shape.part(39).y())); 
    //#42 right eye left corner
    image_pts.push_back(cv::Point2d(shape.part(42).x(), shape.part(42).y())); 
    //#45 right eye right corner
    image_pts.push_back(cv::Point2d(shape.part(45).x(), shape.part(45).y())); 
    //#31 nose left corner
    image_pts.push_back(cv::Point2d(shape.part(31).x(), shape.part(31).y())); 
    //#35 nose right corner
    image_pts.push_back(cv::Point2d(shape.part(35).x(), shape.part(35).y())); 
    //#48 mouth left corner
    image_pts.push_back(cv::Point2d(shape.part(48).x(), shape.part(48).y())); 
    //#54 mouth right corner
    image_pts.push_back(cv::Point2d(shape.part(54).x(), shape.part(54).y())); 
    //#57 mouth central bottom corner
    image_pts.push_back(cv::Point2d(shape.part(57).x(), shape.part(57).y())); 
    //#8 chin corner
    image_pts.push_back(cv::Point2d(shape.part(8).x(), shape.part(8).y()));   

    // calculate the head pose
    cv::solvePnP(object_pts, image_pts, cam_matrix, dist_coeffs, rotation_vec, translation_vec);

    // reproject the 3D box corners into the image plane
    cv::projectPoints(reprojectsrc, rotation_vec, translation_vec, cam_matrix, dist_coeffs, reprojectdst);

    // draw a 3D box around the actor's head
    cv::line(frame, reprojectdst[0], reprojectdst[1], cv::Scalar(0, 0, 255));
    cv::line(frame, reprojectdst[1], reprojectdst[2], cv::Scalar(0, 0, 255));
    cv::line(frame, reprojectdst[2], reprojectdst[3], cv::Scalar(0, 0, 255));
    cv::line(frame, reprojectdst[3], reprojectdst[0], cv::Scalar(0, 0, 255));
    cv::line(frame, reprojectdst[4], reprojectdst[5], cv::Scalar(0, 0, 255));
    cv::line(frame, reprojectdst[5], reprojectdst[6], cv::Scalar(0, 0, 255));
    cv::line(frame, reprojectdst[6], reprojectdst[7], cv::Scalar(0, 0, 255));
    cv::line(frame, reprojectdst[7], reprojectdst[4], cv::Scalar(0, 0, 255));
    cv::line(frame, reprojectdst[0], reprojectdst[4], cv::Scalar(0, 0, 255));
    cv::line(frame, reprojectdst[1], reprojectdst[5], cv::Scalar(0, 0, 255));
    cv::line(frame, reprojectdst[2], reprojectdst[6], cv::Scalar(0, 0, 255));
    cv::line(frame, reprojectdst[3], reprojectdst[7], cv::Scalar(0, 0, 255));

    // calculate Euler angles to drive the avatar's head rotation
    cv::Rodrigues(rotation_vec, rotation_mat);
    cv::hconcat(rotation_mat, translation_vec, pose_mat);
    cv::decomposeProjectionMatrix(pose_mat, out_intrinsics, out_rotation, 
        out_translation, cv::noArray(), cv::noArray(), cv::noArray(), euler_angle);

    // calculate the 3D transformation of the facial landmarks to animate the avatar's facial features
    // code needed here ...

    return 0;
}
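For clarity on what I already have: as far as I can tell, the estimated pose lets me put the rigid reference points into camera space by applying the rotation and translation from solvePnP. A rough sketch, reusing the variables from the code above:

// Camera-space positions of the static reference landmarks: apply the estimated
// head pose to the generic 3D model points. This only captures head motion,
// not the expression-driven landmark movement I am actually after.
std::vector<cv::Point3d> object_pts_cam;
object_pts_cam.reserve(object_pts.size());
for (const cv::Point3d& p : object_pts)
{
    cv::Mat p_model = (cv::Mat_<double>(3, 1) << p.x, p.y, p.z);
    cv::Mat p_cam = rotation_mat * p_model + translation_vec;
    object_pts_cam.push_back(cv::Point3d(p_cam.at<double>(0), p_cam.at<double>(1), p_cam.at<double>(2)));
}

What I am missing is how to recover the non-rigid, expression-dependent part of each landmark's 3D position.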

Solution

  • It is not an easy problem to solve.

    You can look at 3D morphable models as the starting point; a rough sketch of the core idea is included below.

    https://cvssp.org/faceweb/3dmm/

    You can then search Google Scholar for recent papers that cite that work to find the current state of the art.

    Hope that helps.
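
To make the morphable-model idea concrete, here is a minimal sketch of its core equation, not the answerer's specific method or any particular library's API: a face shape is a mean shape plus a weighted sum of learned basis offsets, and the weights are fitted (together with the head pose) so that the projected 3D landmarks match the detected 2D ones. The function and variable names below are illustrative placeholders; the mean shape and basis would come from a trained model such as the one linked above.

#include <opencv2/opencv.hpp>
#include <vector>

// shape(alpha) = meanShape + sum_k alpha[k] * basis[k]
std::vector<cv::Point3d> synthesizeShape(
    const std::vector<cv::Point3d>& meanShape,              // neutral 3D landmarks
    const std::vector<std::vector<cv::Point3d>>& basis,     // per-coefficient 3D offsets
    const std::vector<double>& alpha)                       // fitted shape/expression coefficients
{
    std::vector<cv::Point3d> shape = meanShape;
    for (size_t k = 0; k < basis.size(); ++k)
        for (size_t i = 0; i < shape.size(); ++i)
            shape[i] += alpha[k] * basis[k][i];
    return shape;
}

Fitting then means searching for the pose (rotation, translation) and the coefficients alpha that minimize the distance between the detected 2D landmarks and cv::projectPoints applied to the synthesized shape; once fitted, the 3D landmark coordinates can be read directly off that shape.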