I am a newbie in deep learning and am doing my Final Year Project on it. I know that we use Conv2D in image-related tasks, but my professor asked me why we don't use Conv1D or Conv3D. Why do we specifically use Conv2D here? I've searched all over the internet for a proper answer to this question, but I can't seem to find a solid one. Please help me, because I am very confused.
Thank you!
In a 1-dimensional CNN, the kernel moves in 1 direction. The input and output data of a 1-dimensional CNN are 2-dimensional per sample (steps and channels, not counting the batch axis). It is mostly used on time-series data, since you can only move left or right (x).
In a 2-dimensional CNN, the kernel moves in 2 directions. The input and output data of a 2-dimensional CNN are 3-dimensional per sample (height, width and channels, not counting the batch axis). As you mentioned, it is widely used in image-related tasks, because an image has two spatial axes: apart from left and right, you can also move up and down (x,y).
In a 3-dimensional CNN, the kernel moves in 3 directions. The input and output data of a 3-dimensional CNN are 4-dimensional per sample (depth, height, width and channels, not counting the batch axis). Since the kernel slides in 3 dimensions, you have (x,y,z) possible movements. One example use case is medical imaging, where a scan is a 3-dimensional image acquired as slices and then reconstructed. The slices have to be analysed as a whole, so it makes little sense to take single images and apply a 2-dimensional convolution to each one, because the relationships between neighbouring slices are lost. Instead, you stack all the images into a "3D" representation and analyse it with 3-dimensional convolutions.
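To make the shapes concrete, here is a minimal sketch in plain Python that computes the per-sample output shape of a convolution with stride 1 and no padding. The helper name conv_output_shape, the channels-last layout and all the example sizes are my own assumptions for illustration, not part of any library. It only does shape arithmetic, not an actual convolution.

```python
def conv_output_shape(input_shape, kernel_size, filters):
    """Per-sample output shape of an N-D convolution (stride 1, no padding).

    input_shape: (spatial dims..., channels), channels-last, batch axis omitted.
    kernel_size: one kernel extent per spatial dimension.
    filters:     number of output channels.
    """
    *spatial, _channels = input_shape
    if len(spatial) != len(kernel_size):
        raise ValueError("need one kernel extent per spatial dimension")
    # The kernel slides along every spatial axis. The channel axis is not
    # slid over: it is consumed whole at each position and replaced by `filters`.
    out_spatial = [size - k + 1 for size, k in zip(spatial, kernel_size)]
    return (*out_spatial, filters)


# Hypothetical example sizes, chosen only for illustration.
series = conv_output_shape((100, 3), (5,), filters=8)              # 1D: time series
image = conv_output_shape((64, 64, 3), (3, 3), filters=8)          # 2D: RGB image
volume = conv_output_shape((32, 64, 64, 1), (3, 3, 3), filters=8)  # 3D: stacked slices

print(series)  # (96, 8)
print(image)   # (62, 62, 8)
print(volume)  # (30, 62, 62, 8)
```

The number of entries in kernel_size is the number of directions the kernel slides in, and that is what the 1D/2D/3D in the layer name refers to. An image has two spatial axes plus channels, which is why Conv2D is the natural fit for it.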