computer-vision object-detection image-recognition image-classification image-generation

Which of the following CNN models are used for which computer vision task?


Is my classification correct?

LeNet-5: Image classification,
AlexNet: Image classification,
VGG-16: Image classification,
ResNet: Image classification,
Inception module: Image classification,
MobileNet: Image classification,
EfficientNet: Image classification,
Neural Style Transfer: Image generation,
Sliding Windows Detection algorithm: Object detection,
R-CNN: Object detection,
YOLO: Object detection,
Siamese network: Image recognition,
U-Net: Semantic segmentation

If wrong, please correct me. THANKS!


Solution

  • Your classification is correct if you go by the task each model was originally designed for. However, rather than a task-based taxonomy, CNNs are better studied in terms of what each architecture does differently. CNNs were initially designed for image classification, but the same network can be used for object detection with modifications to the final layers. For example, Faster R-CNN (designed for object detection) can use any architecture designed for classification, such as VGG or ResNet, as its backbone (link). Similarly, Faster R-CNN can be extended to perform segmentation, as in the Mask R-CNN architecture (link).

    Here is a chart of the evolutionary history of deep CNNs, highlighting architectural innovations (source): [image]

    Here is another taxonomy showing different categories based on architecture style: [image]
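To make the "same network, different last layer" point concrete, here is a minimal pure-Python sketch. It is a toy illustration, not Faster R-CNN or any real library's API: `backbone`, `make_layer`, `classify`, and `detect` are hypothetical names, the weights are random and untrained, and the detection head predicts a single box, whereas real detectors predict many boxes per image.

```python
import math
import random


def backbone(image):
    """Toy feature extractor: 2x2 average pooling, then flatten.
    Stands in for a classification backbone such as VGG or ResNet."""
    h, w = len(image), len(image[0])
    features = []
    for r in range(0, h - 1, 2):
        for c in range(0, w - 1, 2):
            block = (image[r][c] + image[r][c + 1]
                     + image[r + 1][c] + image[r + 1][c + 1])
            features.append(block / 4.0)
    return features


def linear(features, weights, biases):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_layer(n_out, n_in, rng):
    """Random (untrained) weights and zero biases for a linear head."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def classify(image, cls_layer):
    """Classification: backbone followed by a class-score head."""
    return softmax(linear(backbone(image), *cls_layer))


def detect(image, cls_layer, box_layer):
    """Single-box detection: the same backbone, plus a 4-value box head."""
    features = backbone(image)
    class_probs = softmax(linear(features, *cls_layer))
    box = linear(features, *box_layer)  # (x, y, width, height), untrained
    return class_probs, box


rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]
n_features = len(backbone(image))  # 4 x 4 pooled cells for an 8x8 input

cls_layer = make_layer(3, n_features, rng)  # 3 hypothetical classes
box_layer = make_layer(4, n_features, rng)  # 4 box coordinates

probs = classify(image, cls_layer)
det_probs, box = detect(image, cls_layer, box_layer)
```

The only difference between `classify` and `detect` is the extra box head on top of the shared features, which is the sense in which a classification architecture can be reused for detection.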