I am training a neural network on audio data and would like to assess some of its inner representations. One of them is essentially a magnitude spectrogram without phase information, computed with high overlap between Hann windows.
Is there a way I can use tf.contrib.signal.inverse_stft to generate an audio signal from this magnitude-only spectrogram? If not, is there some other straightforward way to do this (e.g. something amounting to a sum of band-pass filters applied to white noise)?
I don't know much about TensorFlow's inverse_stft; it seems to require a complementary window function in order to work.
But to estimate the original waveform from an STFT without phase information, you might want to look at either the Griffin-Lim algorithm or a WaveNet vocoder conditioned on a mel spectrogram (which can be derived from the linear spectrogram produced by the STFT).
Griffin-Lim algorithm: https://github.com/bkvogel/griffin_lim
WaveNet vocoder: https://github.com/r9y9/wavenet_vocoder
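
To give a rough idea of what Griffin-Lim does, here is a minimal NumPy sketch. It is not the code from the repository linked above, and it does not use TensorFlow. The function names and the parameters n_fft, hop, n_iter and seed are my own illustrative choices. It assumes a periodic Hann window and a magnitude spectrogram of shape (frames, n_fft // 2 + 1). The idea is to start from a random phase, then alternate between inverting the STFT and re-analysing the result, keeping the known magnitude and only the newly estimated phase each time.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT, returned with shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1)


def istft(spec, n_fft=512, hop=128):
    """Overlap-add inverse of stft(), normalised by the summed squared window."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(spec) - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        signal[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    # The floor only guards against division by zero at the signal edges.
    return signal / np.maximum(norm, 1e-8)


def griffin_lim(magnitude, n_fft=512, hop=128, n_iter=50, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from a uniformly random phase on the unit circle.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        estimate = stft(signal, n_fft, hop)
        # Keep only the phase of the re-analysed signal.
        phase = estimate / np.maximum(np.abs(estimate), 1e-8)
    return istft(magnitude * phase, n_fft, hop)
```

You would call it as griffin_lim(mag), with n_fft and hop matching whatever was used to produce the spectrogram. With n_iter=0 it simply inverts the magnitude with a random phase, which is roughly the "shaped noise" idea from the question; more iterations usually reduce the phasiness. The division by the summed squared window in istft() plays the role of the complementary synthesis window mentioned above.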