I am reading through http://www.scholarpedia.org/article/SIFT, and I need some help with a definition in this segment:
The SIFT descriptor has also been extended from grey-level to colour images and from 2-D spatial images to 2+1-D spatio-temporal video.
What is a 2+1-D spatio-temporal video?
It is simply a video.
They mean that the original technique was applied to grayscale images, which have 2 spatial dimensions: x and y.
It was then extended to colour images and then to temporal series of images, that is, videos. Videos have 3 dimensions: 2 spatial (x, y) and 1 temporal (time). They write "2+1" rather than "3" because a 3-D image usually refers to x/y/z (depth), not x/y/t.
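To make the "2+1" concrete, here is a minimal sketch in plain Python. The helper names (`make_frame`, `make_video`, `shape`, `temporal_difference`) are hypothetical and only for illustration; this is not the SIFT extension itself. A grayscale frame is a 2-D grid indexed by (y, x), and a video stacks such frames along a third axis indexed by time t. A volumetric 3-D image would have the same number of axes, but its third axis would be depth z, not time.

```python
def make_frame(height, width, value):
    """Return a height x width grayscale frame filled with a constant intensity."""
    return [[value for _ in range(width)] for _ in range(height)]


def make_video(num_frames, height, width):
    """Return a toy video indexed as video[t][y][x]; frame t has intensity t."""
    return [make_frame(height, width, t) for t in range(num_frames)]


def shape(array):
    """Return the sizes of a rectangular nested list, outermost axis first."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0] if array else None
    return tuple(dims)


def temporal_difference(video, t, y, x):
    """Forward difference along the time axis (the '+1' dimension) at pixel (x, y)."""
    return video[t + 1][y][x] - video[t][y][x]


frame = make_frame(4, 6, 0)
video = make_video(3, 4, 6)
print(shape(frame))                         # (4, 6): two spatial axes (y, x)
print(shape(video))                         # (3, 4, 6): time, then (y, x)
print(temporal_difference(video, 0, 1, 2))  # 1: intensity change between frames 0 and 1
```

The extra axis is indexed like a spatial one, but moving along it means stepping from one frame to the next in time, which is exactly the distinction the "2+1" notation keeps visible.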