I have a signal 3M samples long and I want to subsample it. I know that to avoid aliasing I need to filter out frequencies above the new Nyquist frequency, and that I can do this by convolving with certain filters (e.g. Butterworth), but that way some of the high-frequency content is still preserved.
I wonder whether I can instead just zero the unwanted frequencies in the frequency domain and use the inverse FFT to go back to the time domain. Is such an approach numerically correct? I know that certain libraries speed up convolution by using the FFT.
It's not really correct. Zeroing out the top bins of the FFT only completely removes frequencies whose periods exactly divide the FFT length. If you were to frequency-shift your signal by half a bin and take another FFT, you'd find that there is spectral leakage and the upper frequencies are not all zero.
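A minimal NumPy sketch of that leakage (the FFT length, cutoff bin, and tone frequencies are arbitrary choices for illustration): a tone sitting halfway between bins leaks into the bins you keep, so zeroing the top of the spectrum does not fully remove it, while a tone exactly on a bin is removed perfectly.

```python
import numpy as np

N = 1024
n = np.arange(N)

# Tone at 400.5 cycles per N samples -- halfway between FFT bins,
# so its energy leaks into every bin, including those below the cutoff.
x = np.cos(2 * np.pi * 400.5 * n / N)
X = np.fft.rfft(x)
X[300:] = 0.0                    # "brick-wall": zero everything from bin 300 up
y = np.fft.irfft(X, N)
residual = np.sqrt(np.mean(y**2))   # noticeably nonzero: the tone survives

# For comparison: a tone exactly on bin 400 divides the FFT length,
# lands entirely in the zeroed bins, and is removed down to round-off.
x2 = np.cos(2 * np.pi * 400 * n / N)
X2 = np.fft.rfft(x2)
X2[300:] = 0.0
y2 = np.fft.irfft(X2, N)
residual_exact = np.sqrt(np.mean(y2**2))
```

Here `residual` comes out on the order of a few percent of the tone's amplitude, while `residual_exact` is at machine-precision level.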
The result will be pretty close, but taking an FFT of the whole signal is a very expensive way to get just pretty close.
You should just use a normal filter. As long as you leave a reasonable amount of room between the filter cut-off frequency and the Nyquist frequency, it's easy to ensure that any aliasing error will be much smaller than quantization error and other noise.
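For the filter-then-downsample route, `scipy.signal.decimate` bundles the anti-aliasing filter (an 8th-order Chebyshev type I low-pass by default) with the downsampling step. A sketch, assuming SciPy is available; the sample rate, tone frequencies, and decimation factor are illustrative only:

```python
import numpy as np
from scipy import signal

fs = 1000.0                      # assumed original sample rate, Hz
q = 4                            # decimation factor -> new Nyquist = fs / (2*q) = 125 Hz
t = np.arange(0, 1.0, 1.0 / fs)

# 50 Hz tone (below the new Nyquist, should survive) plus a 220 Hz tone
# (above it, would alias down to 30 Hz if not filtered out first).
x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 220 * t)

# decimate() low-pass filters before keeping every q-th sample;
# zero_phase=True applies the filter forward and backward (no phase shift).
y = signal.decimate(x, q, zero_phase=True)

Y = np.abs(np.fft.rfft(y))       # spectrum of the decimated signal, 1 Hz per bin
```

In the decimated spectrum the 50 Hz peak remains essentially untouched, while the energy that would have aliased to 30 Hz is attenuated far below it, which is exactly the "leave room between the cut-off and Nyquist" situation described above.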