In my case, I would like to extract and visualize the feature maps output by layers 102, 103, and 104 in the following head definition from cfg/training/yolov7.yaml.
# yolov7 head
head:
  [[-1, 1, SPPCSPC, [512]], # 51
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [37, 1, Conv, [256, 1, 1]], # route backbone P4
   [[-1, -2], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]], # 63
   [-1, 1, Conv, [128, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [24, 1, Conv, [128, 1, 1]], # route backbone P3
   [[-1, -2], 1, Concat, [1]],
   [-1, 1, Conv, [128, 1, 1]],
   [-2, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [128, 1, 1]], # 75
   [-1, 1, MP, []],
   [-1, 1, Conv, [128, 1, 1]],
   [-3, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [128, 3, 2]],
   [[-1, -3, 63], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]], # 88
   [-1, 1, MP, []],
   [-1, 1, Conv, [256, 1, 1]],
   [-3, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, -3, 51], 1, Concat, [1]],
   [-1, 1, Conv, [512, 1, 1]],
   [-2, 1, Conv, [512, 1, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [512, 1, 1]], # 101
   [75, 1, RepConv, [256, 3, 1]], # 102, extract
   [88, 1, RepConv, [512, 3, 1]], # 103, extract
   [101, 1, RepConv, [1024, 3, 1]], # 104, extract
   [[102,103,104], 1, IDetect, [nc, anchors]], # Detect(P3, P4, P5)
  ]
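For reference, the index of each row in this yaml matches the index of the corresponding module in the assembled model, so the mapping can be checked by enumerating model.model after loading the weights. Here is a minimal sketch, assuming the repo's attempt_load helper and the yolov7.pt weights used below:

from models.experimental import attempt_load  # yolov7 repo helper

model = attempt_load('yolov7.pt', map_location='cpu')  # load FP32 model
# model.model is an nn.Sequential whose indices match the yaml rows
for i, m in enumerate(model.model):
    print(i, type(m).__name__)  # e.g. "102 RepConv", "105 IDetect"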
Also, the following is the result of printing the model:
Model(
  (model): Sequential(
    (0): Conv(
      (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
    (1): Conv(
      (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
    (2): Conv(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
    ----------------------------------------------------
    (102): RepConv(
      (act): SiLU(inplace=True)
      (rbr_reparam): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) # extract
    )
    (103): RepConv(
      (act): SiLU(inplace=True)
      (rbr_reparam): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) # extract
    )
    (104): RepConv(
      (act): SiLU(inplace=True)
      (rbr_reparam): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) # extract
    )
    (105): IDetect(
      (m): ModuleList(
        (0): Conv2d(256, 21, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(512, 21, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(1024, 21, kernel_size=(1, 1), stride=(1, 1))
      )
      (ia): ModuleList(
        (0): ImplicitA()
        (1): ImplicitA()
        (2): ImplicitA()
      )
      (im): ModuleList(
        (0): ImplicitM()
        (1): ImplicitM()
        (2): ImplicitM()
      )
    )
  )
)
However, I would like to be able to extract the features of any layer, not just these three, since I may need features from other layers as well. How can I do this?
I tried to do the extraction and visualization from the Model class in models/yolo.py, with reference to https://github.com/ultralytics/yolov5/issues/3089, but could not figure out which code to edit or how. I tried the same with the IDetect class, but could not figure that out either.
Thanks to @DerekG for helping me figure this out!
The following is the code in yolov7/detect.py after applying the fix. A line of ----- indicates omitted code.
-------------------------------------------------------------
from utils.plots import plot_one_box, plot_ts_feature_maps  # add plot_ts_feature_maps
-------------------------------------------------------------
def detect(save_img=False):
    -------------------------------------------------------------
    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    ---------------------------------------------------------------------
    # Set Dataloader
    vid_path, vid_writer = None, None
    if webcam:
        view_img = check_imshow()
        cudnn.benchmark = True  # set True to speed up constant image size inference
        dataset = LoadStreams(source, img_size=imgsz, stride=stride)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride)
    --------------------------------------------------------------------------
    for path, img, im0s, vid_cap in dataset:
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)
        ------------------------------------------------------------------
        # Start of postscript
        def make_hook(key):
            def hook(model, input, output):
                intermediate_output[key] = output.detach()
            return hook

        layer_num = 104  # intermediate layer number
        intermediate_output = {}
        # Note: when looping over many images, register the hook once before the
        # loop (or keep the returned handle and call handle.remove()) so that
        # duplicate hooks do not pile up on the module.
        model.model[layer_num].register_forward_hook(make_hook(layer_num))
        # forward pass (fires the hook)
        model(img)
        # print the feature map shape
        feature_maps = intermediate_output[layer_num]
        print(feature_maps.shape)
        # output a feature map of the intermediate layer
        plot_ts_feature_maps(feature_maps)
        # End of postscript
        t2 = time_synchronized()
    ------------------------------------------------------------------
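Since register_forward_hook works on any nn.Module, the same pattern extracts features from any layer, and from several layers at once. Here is a minimal sketch reusing make_hook from the snippet above; the indices 24, 75, and 104 are just example choices:

layer_nums = [24, 75, 104]  # any indices of model.model
intermediate_output = {}
handles = [model.model[i].register_forward_hook(make_hook(i)) for i in layer_nums]

model(img)  # one forward pass fills intermediate_output for every hooked layer

for i in layer_nums:
    print(i, intermediate_output[i].shape)

for h in handles:
    h.remove()  # detach the hooks so later forward passes run clean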
Also, yolov7/utils/plots.py was modified as follows. Torchshow is a library for visualizing tensors; its official GitHub is https://github.com/xwying/torchshow.
-------------------------------------------------------------------------
# Add module
import torchshow as ts
-------------------------------------------------------------------------
# Add plot_ts_feature_maps method at the bottom
def plot_ts_feature_maps(feature_maps):
    import matplotlib
    matplotlib.use('TkAgg')  # interactive backend so the figure window opens
    feature_maps = feature_maps.to(torch.float32)  # cast fp16 to fp32 before plotting
    ts.show(feature_maps[0])  # drop the batch dimension and visualize the channels
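If you would rather avoid the extra torchshow dependency, a plain matplotlib grid works as well. The following is a sketch, not code from the repo: plot_feature_maps_plt is a hypothetical helper, the 8x8 grid is an arbitrary size, and torch is assumed to be imported at the top of plots.py.

import matplotlib.pyplot as plt

def plot_feature_maps_plt(feature_maps, n_rows=8, n_cols=8):
    # feature_maps: tensor of shape (1, C, H, W); plots the first n_rows*n_cols channels
    fm = feature_maps[0].to(torch.float32).cpu()
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(12, 12))
    for ax in axes.flat:
        ax.axis('off')  # hide axes even for unused grid cells
    for ax, channel in zip(axes.flat, fm):
        ax.imshow(channel.numpy(), cmap='gray')
    plt.tight_layout()
    plt.show()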
As a test, to extract four feature maps from the second layer, I changed layer_num = 1 in detect.py and ts.show(feature_maps[0][:4]) in plots.py, then ran the following command:
python detect.py --weights yolov7.pt --source inference/images/horses.jpg --device 0 --no-trace
The inference results and feature maps were then output as shown below.
[images: inference results and feature map]