Tags: java, opencv, face-detection, haar-classifier, eye-detection

Face Features Detection Using OpenCV Haar-cascades


I am using Java with the OpenCV library to detect the face, eyes and mouth using a laptop camera.

What I have done so far:

  1. Capture video frames using a VideoCapture object.
  2. Detect the face using Haar cascades.
  3. Divide the face region into a top region and a bottom region.
  4. Search for the eyes inside the top region.
  5. Search for the mouth inside the bottom region.
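
A minimal sketch of this pipeline, in case it helps (the face and eye cascade files are the stock ones shipped with OpenCV, the mouth cascade comes from the contrib data, and the paths are placeholders for my setup):

```java
import org.opencv.core.*;
import org.opencv.imgproc.Imgproc;
import org.opencv.objdetect.CascadeClassifier;
import org.opencv.videoio.VideoCapture;

public class FaceFeatureDetection {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        // Stock cascade files shipped with OpenCV; adjust the paths to your installation.
        CascadeClassifier faceCascade  = new CascadeClassifier("haarcascade_frontalface_alt.xml");
        CascadeClassifier eyeCascade   = new CascadeClassifier("haarcascade_eye.xml");
        CascadeClassifier mouthCascade = new CascadeClassifier("haarcascade_mcs_mouth.xml");

        VideoCapture camera = new VideoCapture(0);   // laptop camera
        Mat frame = new Mat(), gray = new Mat();

        while (camera.read(frame)) {
            Imgproc.cvtColor(frame, gray, Imgproc.COLOR_BGR2GRAY);

            MatOfRect faces = new MatOfRect();
            faceCascade.detectMultiScale(gray, faces);

            for (Rect face : faces.toArray()) {
                // Split the face into a top half (eyes) and a bottom half (mouth).
                Mat top    = gray.submat(new Rect(face.x, face.y, face.width, face.height / 2));
                Mat bottom = gray.submat(new Rect(face.x, face.y + face.height / 2,
                                                  face.width, face.height / 2));

                MatOfRect eyes = new MatOfRect(), mouths = new MatOfRect();
                eyeCascade.detectMultiScale(top, eyes);
                mouthCascade.detectMultiScale(bottom, mouths);
                // ... draw the rectangles / further processing ...
            }
        }
        camera.release();
    }
}
```

The rectangles returned for the eyes and mouth are relative to the sub-regions, so they need to be offset by the face position before drawing them on the full frame.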

Main questions I am facing:

  1. Do higher camera resolutions work better for Haar cascades?

  2. Do I have to capture video frames at a certain scale, for example 100 px x 100 px?

  3. Do Haar cascades work better on gray-scale images?

  4. Do different lighting conditions make a difference?

  5. What does the method detectMultiScale(params) do exactly?

  6. If I want to go further and analyze eye blinking, eye closure duration, mouth yawning, head nodding and head orientation to detect fatigue (drowsiness) using a support vector machine, any advice?

Your help is appreciated!


Solution

  • The following article will give you an overview of what is going on under the hood; I would highly recommend reading it.

    Do higher camera resolutions work better for Haar cascades?

    Not necessarily. cascade.detectMultiScale has parameters to adjust for various input widths and heights, such as minSize and maxSize. These are optional parameters, but you can tweak them to get robust predictions if you have control over the input image size. If you set minSize to a small value and ignore maxSize, it will work for small and high-resolution images alike, but performance will suffer. And if you are wondering why there is little difference between high-res and low-res input, keep in mind that cascade.detectMultiScale internally scales the image down to lower resolutions for performance, which is why defining minSize and maxSize is important to avoid unnecessary iterations.
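
    For example, a rough sketch of constraining the search range (the numbers here are illustrative assumptions, not recommendations -- tune them for your own camera):

```java
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Size;
import org.opencv.objdetect.CascadeClassifier;

public class SizeBoundedDetection {
    // grayFrame: gray-scale input frame. The size bounds below are illustrative
    // assumptions; tune them to the face sizes your camera actually produces.
    static MatOfRect detectFaces(CascadeClassifier faceCascade, Mat grayFrame) {
        MatOfRect faces = new MatOfRect();
        faceCascade.detectMultiScale(
                grayFrame, faces,
                1.1,                  // scaleFactor: how much the image is shrunk per pyramid step
                3,                    // minNeighbors: overlapping hits required to keep a detection
                0,                    // flags (ignored by the newer cascade format)
                new Size(60, 60),     // minSize: ignore candidate faces smaller than this
                new Size(400, 400));  // maxSize: ignore candidate faces larger than this
        return faces;
    }
}
```

    Tightening minSize and maxSize around the face sizes you actually expect cuts down the number of pyramid levels and windows the detector has to evaluate.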

    Do I have to capture video frames at a certain scale, for example 100 px x 100 px?

    This mainly depends on the parameters you pass to cascade.detectMultiScale. Personally, I suspect 100 x 100 would be too small for detecting smaller faces in the frame, since some features are completely lost when the frame is resized to such small dimensions, and cascade.detectMultiScale is highly dependent on the gradients, i.e. features, in the input image.

    But if the face takes up most of the input frame and there are no other, smaller faces in the background, then you may use 100 x 100. I have tested some sample faces of size 100 x 100 and it worked pretty well. If that is not the case, a width of 300-400 px should work well. However, you will need to tune the parameters to achieve good accuracy.
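
    If your camera delivers large frames, a common approach (sketched below with an assumed working width of 400 px) is to downscale before detection and map the resulting rectangles back to the original frame:

```java
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Rect;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;
import org.opencv.objdetect.CascadeClassifier;

public class DownscaledDetection {
    static Rect[] detectOnSmallerFrame(CascadeClassifier faceCascade, Mat grayFrame) {
        final double targetWidth = 400.0;            // assumed working width
        double scale = targetWidth / grayFrame.cols();

        Mat small = new Mat();
        Imgproc.resize(grayFrame, small, new Size(), scale, scale, Imgproc.INTER_AREA);

        MatOfRect faces = new MatOfRect();
        faceCascade.detectMultiScale(small, faces);

        // Map the rectangles back to the coordinate system of the original frame.
        Rect[] detections = faces.toArray();
        for (Rect r : detections) {
            r.x      = (int) (r.x / scale);
            r.y      = (int) (r.y / scale);
            r.width  = (int) (r.width / scale);
            r.height = (int) (r.height / scale);
        }
        return detections;
    }
}
```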

    Do Haar cascades work better on gray-scale images?

    They work only in gray-scale images.

    If you read the first part of the article, you will see that face detection comes down to detecting many binary (Haar-like) patterns in the image. This basically comes from the Viola-Jones paper, which is the basis of this algorithm.

    Do different lighting conditions make a difference?

    Maybe in some cases, but Haar features are largely lighting invariant.

    If by different lighting conditions you mean taking images under green or red light, then it should not affect detection: Haar features (since they are computed on gray-scale) are independent of the RGB colour of the input image. Detection mainly depends on the gradients/features in the input image, so as long as there are enough gradient differences in the image, such as the eyebrows having lower intensity than the forehead, it will work fine.

    But consider a case where the input image is back-lit or the ambient light is very low. In that case some prominent features may not be found, which may result in the face not being detected.
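
    If poor lighting is a problem in practice, equalizing the contrast of the gray-scale frame before detection often helps recover those gradients. A small sketch (the CLAHE parameters are illustrative assumptions):

```java
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgproc.CLAHE;
import org.opencv.imgproc.Imgproc;

public class ContrastNormalization {
    // Returns a contrast-normalized copy of the gray-scale frame to feed to the cascade.
    static Mat normalize(Mat grayFrame) {
        // Option 1: global histogram equalization.
        Mat equalized = new Mat();
        Imgproc.equalizeHist(grayFrame, equalized);

        // Option 2: CLAHE, which tends to cope better with back-light;
        // the clip limit and tile size here are illustrative values.
        CLAHE clahe = Imgproc.createCLAHE(2.0, new Size(8, 8));
        Mat locallyEqualized = new Mat();
        clahe.apply(grayFrame, locallyEqualized);

        return locallyEqualized;   // pick whichever option works better for your lighting
    }
}
```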

    What does the method detectMultiScale(params) do exactly?

    I guess if you have read the article by this point, you already know it well. In short, it slides a detection window over an image pyramid (the step controlled by scaleFactor, the size range by minSize and maxSize), runs the cascade of classifiers in each window, and groups overlapping hits, with minNeighbors controlling how many overlapping detections are required to keep a result.

    If I want to go further and analyze eye blinking, eye closure duration, mouth yawning, head nodding and head orientation to detect fatigue (drowsiness) using a support vector machine, any advice?

    No, I would not suggest performing this type of gesture detection with an SVM on top of cascades, as it would be extremely slow to run 10 different cascades just to conclude the current facial state. Instead, I would recommend using a facial landmark detection framework such as Dlib. You may look at other frameworks as well, because Dlib's model is nearly 100 MB, which may not suit your needs if you want to port the application to a mobile device. So the key is facial landmark detection: once you have the full face labelled with landmarks, you can draw conclusions such as whether the mouth is open or the eyes are blinking, and it works in real time, so your video processing will not suffer much.
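
    Once you have landmark coordinates (from Dlib, OpenCV's Facemark module, or any other framework), the per-frame measurements are just geometry. As an illustration (not tied to any particular framework), the commonly used eye aspect ratio for blink / eye-closure detection looks like this, assuming you pass in the six eye landmarks of a 68-point, dlib-style model in order:

```java
public class EyeAspectRatio {

    // p: the six eye landmarks as {x, y} pairs, in the usual p1..p6 order
    // of a 68-point landmark model (the indexing is an assumption of that model).
    static double eyeAspectRatio(double[][] p) {
        double vertical1  = distance(p[1], p[5]);   // ||p2 - p6||
        double vertical2  = distance(p[2], p[4]);   // ||p3 - p5||
        double horizontal = distance(p[0], p[3]);   // ||p1 - p4||
        return (vertical1 + vertical2) / (2.0 * horizontal);
    }

    static double distance(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        return Math.sqrt(dx * dx + dy * dy);
    }
}
```

    The same idea gives a mouth aspect ratio for yawning, and per-frame features like these (plus head-pose angles) are what you would feed to an SVM, or simply threshold over consecutive frames.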