I am looking at the generate_anchor_base method, which is a Faster R-CNN utility method in ChainerCV.
What is base_size = 16? The documentation says it is "The width and the height of the reference window." But what does "reference window" mean?
It also says that anchor_scales=[8, 16, 32] are the areas of the anchors, but I thought that the areas are (128, 256, 512).
Another question:
If base_size is 16 and h = 128 and w = 128, does that mean anchor_base[index, 0] = py - h / 2 is a negative value, since py = 8 and h / 2 = 128 / 2 = 64?
The method is a utility function of Faster R-CNN, so I assume you already understand what the "anchor" proposed in Faster R-CNN is.
base_size and anchor_scales determine the size of the anchor. For example, when base_size=16 and anchor_scales=[8, 16, 32] (and ratio=1.0), the height and width of the anchors will be 16 * [8, 16, 32] = (128, 256, 512), as you expected.
ratio determines the aspect ratio between height and width.
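To make the arithmetic concrete, here is a minimal sketch of how the anchor sizes could be computed. It is not the ChainerCV source: the helper name anchor_sizes is made up for illustration, and splitting the ratio as sqrt(ratio) between height and width (so the area stays the same for every ratio) is an assumption on my part.

```python
import math


def anchor_sizes(base_size=16, ratios=(0.5, 1.0, 2.0), anchor_scales=(8, 16, 32)):
    """Return (height, width) for every ratio/scale combination.

    Hypothetical helper for illustration only; the sqrt split of the
    ratio is an assumption, not taken from the library documentation.
    """
    sizes = []
    for ratio in ratios:
        for scale in anchor_scales:
            # The side length before applying the ratio is base_size * scale.
            h = base_size * scale * math.sqrt(ratio)
            w = base_size * scale * math.sqrt(1.0 / ratio)
            sizes.append((h, w))
    return sizes


# With ratio 1.0 the anchors are squares of side 128, 256 and 512.
print(anchor_sizes(ratios=(1.0,)))
# [(128.0, 128.0), (256.0, 256.0), (512.0, 512.0)]
```

Under this reading, (128, 256, 512) are side lengths in pixels rather than areas, and a ratio other than 1.0 changes the shape of the box without changing its area.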
(I might be wrong in the paragraph below; please correct me if so.)
I think base_size needs to be set to the downscaling factor of the hidden layer whose feature map is used. In the chainercv Faster R-CNN implementation, the extractor's feature is fed into rpn (region proposal network), and generate_anchor_base is used in rpn. So you need to pay attention to what the extractor outputs. chainercv uses VGG16 as the feature extractor, and the conv5_3 layer is used as the extracted feature (see here). By the time this layer is reached, max_pooling_2d has been applied 4 times, which results in a feature map 2^4 = 16 times smaller than the input.
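The factor of 16 can be checked with a few lines. This is only a rough sketch: the function names are hypothetical, the 800-pixel input is an arbitrary example, and the integer division ignores whatever padding or rounding the real pooling layers apply.

```python
def feature_stride(num_poolings, pool_stride=2):
    """Total downscaling after applying a stride-2 pooling num_poolings times."""
    return pool_stride ** num_poolings


def feature_map_size(image_size, stride):
    """Approximate feature map side length (ignores padding and rounding)."""
    return image_size // stride


stride = feature_stride(4)
print(stride)                         # 16
print(feature_map_size(800, stride))  # 50
```

So each cell of the feature map corresponds to roughly a 16 x 16 pixel window of the input image, which matches base_size = 16.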
As for the other question, I think your understanding is correct: py - h / 2 will be a negative value. But this anchor_base value is just a relative offset. Once anchor_base is prepared at model initialization (here), the actual (absolute) anchors are created in each forward call (here) in the _enumerate_shifted_anchor method.
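The relative-versus-absolute distinction can be sketched as follows. This is a simplified pure-Python illustration, not the library code: make_anchor_base and enumerate_shifted_anchors are hypothetical names, and the assumption is that every feature map cell shifts the base anchors by a multiple of the feature stride.

```python
def make_anchor_base(base_size, sizes):
    """Relative (y_min, x_min, y_max, x_max) boxes centred on the reference window."""
    py = px = base_size / 2.0
    return [(py - h / 2.0, px - w / 2.0, py + h / 2.0, px + w / 2.0)
            for h, w in sizes]


def enumerate_shifted_anchors(anchor_base, feat_stride, height, width):
    """Copy the base anchors to every cell of a height x width feature map."""
    anchors = []
    for gy in range(height):
        for gx in range(width):
            # Offset of this feature map cell in input-image pixels.
            sy, sx = gy * feat_stride, gx * feat_stride
            for y0, x0, y1, x1 in anchor_base:
                anchors.append((y0 + sy, x0 + sx, y1 + sy, x1 + sx))
    return anchors


base = make_anchor_base(16, [(128.0, 128.0)])
print(base[0])       # (-56.0, -56.0, 72.0, 72.0)

anchors = enumerate_shifted_anchors(base, feat_stride=16, height=2, width=3)
print(len(anchors))  # 6
print(anchors[-1])   # (-40.0, -24.0, 88.0, 104.0)
```

The base anchor has negative coordinates because it is centred on (8, 8), the middle of the 16 x 16 reference window. In this sketch, anchors for cells near the image border still extend outside the image after shifting.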