audio, speech-recognition, speech, mfcc, kaldi

Extract MFCC features without the DCT?


I am trying to replicate the results of a paper in which the authors train a CNN on MFCC features without the DCT that is normally performed at the end. In other words, the features are just the log energies of the mel filter banks.

I know that Kaldi can compute MFCC features with the make_mfcc.sh script. Can the script be altered to compute the features without the final DCT? If not, are there other tools that might be able to do so?

MFCCs are commonly derived as follows:

  1. Take the Fourier transform of (a windowed excerpt of) a signal.
  2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
  3. Take the logs of the powers at each of the mel frequencies.
  4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
  5. The MFCCs are the amplitudes of the resulting spectrum.
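To make the steps above concrete, here is a minimal NumPy sketch in which the pipeline stops after the log, so the DCT stays a separate, optional function. It is not Kaldi's implementation and will not match its output numerically. The function names, the Hamming window, the mel formula, and the defaults (400-sample frames, 160-sample hop, 512-point FFT, 40 filters) are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    # One common mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular overlapping filters, shape (n_filters, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    # n_filters + 2 edge points, equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    fbank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        left, centre, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def log_mel_energies(signal, sample_rate, frame_len=400, hop=160,
                     n_fft=512, n_filters=40):
    """Windowed FFT power, mel mapping, log. No DCT is applied."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_power = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(np.maximum(mel_power, 1e-10))  # floor avoids log(0)

def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II along the last axis (the step to skip)."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return x @ basis.T

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)   # one second of a 440 Hz tone
    fbank_feats = log_mel_energies(tone, sr)
    mfcc_feats = dct_ii(fbank_feats, 13)
    print(fbank_feats.shape, mfcc_feats.shape)
```

The array returned by `log_mel_energies` is the kind of feature the question asks for; calling `dct_ii` on it would give MFCC-style coefficients.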

Solution

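  • You can use Kaldi's make_fbank.sh script, which sits alongside make_mfcc.sh, to extract the log mel filter bank energies. These are the MFCC features with the final DCT left out.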