Understanding OpenCV LBP implementation


I need some help with LBP-based face detection, which is why I am writing this.

I have the following questions about face detection as implemented in OpenCV:

  1. In lbpCascade_frontal_face.xml (from OpenCV), what are internalNodes, leafValues, tree, features, etc.? I know they are used in the algorithm, but I do not understand what each one means. For example, why is a particular feature chosen for a particular stage and not another? How is it decided which feature/node to choose?
  2. What are the feature values in LBP_frontal_face_classifier.xml? I know each one is a vector of 4 integers, but how should I use these features? I expected stage 0 to access the first feature, but the access does not follow that pattern. What is the access pattern for these features?

  3. All the papers in the literature give only a high-level overview. Their descriptions mainly cover computing the LBP from neighbouring pixels. But how are these LBP values used against the elements in the classifier?

  4. How does the integral image help in calculating the LBP value of a pixel? I know how it is used for Haar features; I need to understand LBP.

I have read some papers and articles, but none clearly describes how LBP-based face detection works or gives the algorithm in detail. If someone wants to write a face detection program on their own, what steps should they follow? No document describes that.

Please help me with these if you can. I would be grateful.


Solution

  • I refer you to an earlier answer of mine, which lightly touches on the topic but does not explain the XML cascade format.

    Let's look at a fake example, modified for clarity, of a cascade with only a single stage and three features.

    <!-- stage 0 -->
    <_>
      <maxWeakCount>3</maxWeakCount>
      <stageThreshold>-0.75</stageThreshold>
      <weakClassifiers>
        <!-- tree 0 -->
        <_>
          <internalNodes>
            0 -1 3 -67130709 -21569 -1426120013 -1275125205 -21585
            -16385 587145899 -24005</internalNodes>
          <leafValues>
            -0.65 0.88</leafValues></_>
        <!-- tree 1 -->
        <_>
          <internalNodes>
            0 -1 0 -163512766 -769593758 -10027009 -262145 -514457854
            -193593353 -524289 -1</internalNodes>
          <leafValues>
            -0.77 0.72</leafValues></_>
        <!-- tree 2 -->
        <_>
          <internalNodes>
            0 -1 2 -363936790 -893203669 -1337948010 -136907894
            1088782736 -134217726 -741544961 -1590337</internalNodes>
          <leafValues>
            -0.71 0.68</leafValues></_></weakClassifiers></_>
    

    Somewhat later....

    <features>
      <_>
        <rect>
          0 0 3 5</rect></_>
      <_>
        <rect>
          0 0 4 2</rect></_>
      <_>
        <rect>
          0 0 6 3</rect></_>
      <_>
        <rect>
          0 1 4 3</rect></_>
      <_>
        <rect>
          0 1 3 3</rect></_>
    

    ...

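    Let us look first at the tags of a stage:

      • <maxWeakCount> is the number of weak classifiers in the stage. Each one is marked <!-- tree --> in the comments above, and is what I call an LBP feature. In this example, stage 0 has 3 of them.
      • <stageThreshold> is what the weights of the features must add up to, at least, for the stage to pass. In this example it is -0.75.

    Turning to the tags describing an LBP feature:

      • <internalNodes> is an array of 11 integers. The first two (0 and -1 throughout this example) can be ignored for this walkthrough. The third is the index into the <features> table of <rect>s at the end of the XML file. The last eight are 32-bit values that together make up the feature's 256-bit LUT. In this example, tree 0 references rectangle 3, described by the four integers 0 1 4 3.
      • <leafValues> holds the two weights associated with a feature. One of them is selected by the bit read from the LUT.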
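
    To make the layout of <internalNodes> concrete, here is a minimal Python sketch that splits the text of tree 0 from the example into a feature index and a 256-bit LUT. The function name `split_internal_nodes` is made up for illustration, and the layout it assumes (two node fields, then the feature index, then eight 32-bit words) is my reading of the format rather than something taken from OpenCV's own parser.

```python
# Text of tree 0's <internalNodes> from the example cascade above.
INTERNAL_NODES = """
0 -1 3 -67130709 -21569 -1426120013 -1275125205 -21585
-16385 587145899 -24005
"""


def split_internal_nodes(text):
    """Split an <internalNodes> string into (feature_index, lut_words).

    Assumed layout (hypothetical helper, not OpenCV API): two node
    fields, the feature index, then eight signed 32-bit words that
    together form the 256-bit lookup table.
    """
    values = [int(tok) for tok in text.split()]
    if len(values) != 11:
        raise ValueError("expected 11 integers, got %d" % len(values))
    feature_index = values[2]
    # Reinterpret each signed word as unsigned so bit tests are unambiguous.
    lut_words = [v & 0xFFFFFFFF for v in values[3:]]
    return feature_index, lut_words


feature_index, lut_words = split_internal_nodes(INTERNAL_NODES)
print(feature_index)        # 3  -> the fourth <rect>, "0 1 4 3"
print(len(lut_words) * 32)  # 256 bits in total
```

    This also answers the access-pattern question: a tree does not use "the next" feature, it names the <rect> it wants by index.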

    Lastly, the <features> tag. It holds an array of <rect> tags, each containing 4 integers that describe the geometry of one feature. Given a processing window (24x24 in your case), the first two integers are the feature's x and y pixel offsets within the processing window, and the next two are the width and height of one subrectangle out of the 9 that are needed to evaluate the LBP feature.

    In essence then, a tag <rect> ft.x ft.y ft.width ft.height </rect>, situated within a processing window of size pW.width x pW.height that checks whether a face is present at (pW.x, pW.y), corresponds to...

    https://i.sstatic.net/NL0XX.png
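
    In case the figure does not load, the same geometry can be sketched in a few lines of Python. The helper names `grid_points` and `subrect_corners` are hypothetical, and I am assuming that the 16 points p[0..15] and the 9 subrectangles R0..R8 are numbered row by row, left to right, which is consistent with R4 being the central one.

```python
def grid_points(ft, pw_x=0, pw_y=0):
    """Return the 16 corner points p[0..15] of an LBP feature as (x, y).

    ft is the <rect> content (ft_x, ft_y, ft_width, ft_height);
    (pw_x, pw_y) is the position of the processing window in the image.
    Row-major numbering of the points is an assumption.
    """
    ft_x, ft_y, ft_w, ft_h = ft
    return [(pw_x + ft_x + col * ft_w, pw_y + ft_y + row * ft_h)
            for row in range(4) for col in range(4)]


def subrect_corners(k):
    """Indices (TL, TR, BL, BR) into p[0..15] for subrectangle Rk."""
    row, col = divmod(k, 3)
    tl = 4 * row + col
    return tl, tl + 1, tl + 4, tl + 5


points = grid_points((0, 1, 4, 3))  # rectangle 3 of the example
print(points[0], points[15])        # (0, 1) (12, 10)
print(subrect_corners(4))           # (5, 6, 9, 10) -> the central R4
```

    So a <rect> of 0 1 4 3 spans a 12x9 pixel area made of nine 4x3 subrectangles, starting one pixel below the top-left corner of the window.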

    To evaluate the LBP, then, it suffices to read the integral image at points p[0..15] and use p[BR]+p[TL]-p[TR]-p[BL] to compute the integral of each of the nine subrectangles. The integral of the central subrectangle, R4, is compared with that of each of the eight others, clockwise starting from R0, to produce an 8-bit LBP (the bits are packed [msb 01258763 lsb]).
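
    A small pure-Python sketch of that computation follows. It is not OpenCV's code: `integral_image` and `lbp_code` are illustrative names, the integral image is padded with a leading row and column of zeros, the points are assumed to be numbered row by row, and I assume a bit is set when a neighbour's sum is greater than or equal to the centre's (the text above only says the two are compared).

```python
def integral_image(img):
    """Integral image with a leading zero row and column.

    ii[y][x] is the sum of img over rows < y and columns < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


# Neighbour order, msb first, as given in the text: R0 R1 R2 R5 R8 R7 R6 R3.
NEIGHBOUR_ORDER = (0, 1, 2, 5, 8, 7, 6, 3)


def lbp_code(ii, ft, pw_x=0, pw_y=0):
    """8-bit LBP of feature ft = (x, y, width, height) in a window at (pw_x, pw_y)."""
    ft_x, ft_y, ft_w, ft_h = ft
    x0, y0 = pw_x + ft_x, pw_y + ft_y
    # p[0..15]: the integral image read at the 4x4 grid of corner points.
    p = [ii[y0 + r * ft_h][x0 + c * ft_w] for r in range(4) for c in range(4)]

    def rect_sum(k):
        row, col = divmod(k, 3)
        tl = 4 * row + col
        return p[tl + 5] + p[tl] - p[tl + 1] - p[tl + 4]  # BR + TL - TR - BL

    centre = rect_sum(4)
    code = 0
    for k in NEIGHBOUR_ORDER:
        # Assumed convention: the bit is 1 when neighbour >= centre.
        code = (code << 1) | (1 if rect_sum(k) >= centre else 0)
    return code


img = [[9, 9, 9],
       [0, 5, 9],
       [0, 0, 0]]
print(lbp_code(integral_image(img), (0, 0, 1, 1)))  # 240 == 0b11110000
```

    The point of the integral image is that each of the nine sums costs four lookups regardless of the subrectangle size, so a 1x1 feature and a 6x3 feature are equally cheap to evaluate.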

    This 8-bit LBP is then used as an index into the feature's (2^8 = 256)-bit LUT (the <internalNodes>), selecting a single bit. If this bit is 1, the feature is inconsistent with a face; if 0, it is consistent with a face. The appropriate weight (from <leafValues>) is then returned and added to the weights of all other features in the stage to produce an overall stage sum. This sum is then compared to <stageThreshold> to determine whether the stage passed or failed.
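
    Putting the lookup and the stage sum together for the example stage, as a hedged sketch only. Three things here are assumptions on my part rather than facts stated above: that bit i of the LUT lives in word i >> 5 at bit position i & 31, that a 1 bit selects the first of the two <leafValues> and a 0 bit the second (which fits the negative/positive pairs in the example), and that the stage passes when the sum is greater than or equal to the threshold. All function names are made up.

```python
STAGE_THRESHOLD = -0.75

# (feature_index, lut_words, leaf_values) for the three trees of the example.
TREES = [
    (3, [-67130709, -21569, -1426120013, -1275125205,
         -21585, -16385, 587145899, -24005], (-0.65, 0.88)),
    (0, [-163512766, -769593758, -10027009, -262145,
         -514457854, -193593353, -524289, -1], (-0.77, 0.72)),
    (2, [-363936790, -893203669, -1337948010, -136907894,
         1088782736, -134217726, -741544961, -1590337], (-0.71, 0.68)),
]


def lut_bit(lut_words, code):
    """Bit `code` (0..255) of the 256-bit LUT; word/bit packing is assumed."""
    # Python's >> on a negative int is arithmetic, so this reads the
    # two's-complement bit correctly for positions 0..31.
    return (lut_words[code >> 5] >> (code & 31)) & 1


def tree_weight(lut_words, leaf_values, code):
    """Assumed mapping: bit 1 -> first leaf value, bit 0 -> second."""
    return leaf_values[0] if lut_bit(lut_words, code) else leaf_values[1]


def stage_passes(trees, threshold, codes):
    """codes maps a feature index to the LBP code computed for that feature."""
    total = sum(tree_weight(lut, leaves, codes[idx])
                for idx, lut, leaves in trees)
    return total >= threshold


print(stage_passes(TREES, STAGE_THRESHOLD, {3: 2, 0: 0, 2: 0}))        # True
print(stage_passes(TREES, STAGE_THRESHOLD, {3: 255, 0: 255, 2: 255}))  # False
```

    A full detector would run this for every stage in order and reject the window at the first stage that fails, which is what makes the cascade fast on non-face regions.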

    If there's something else I didn't explain well enough I can clarify.