
How to get a spectrogram offline with the right shape as an input to recognize()?


I am trying to perform offline recognition with my own trained model according to this doc: https://github.com/tensorflow/tfjs-models/tree/master/speech-commands

I ran into the same issue described in https://github.com/tensorflow/tfjs/issues/3820 and tried all the solutions suggested there, including the Colab notebook (preprocessing model) https://colab.research.google.com/github/tensorflow/tfjs-models/blob/master/speech-commands/training/browser-fft/training_custom_audio_model_in_python.ipynb#scrollTo=1AjdTru5NnQQ. The notebook works fine with the WAV files it provides, but with my own WAV files it produces an array of NaN values:

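import tensorflow as tf

# TARGET_SAMPLE_RATE, EXPECTED_WAVEFORM_LEN and preproc_model are defined in the Colab notebook.
filepath = '/my/own/file.wav'
file_contents = tf.io.read_file(filepath)
# Decode the WAV file, drop the channel axis (this only works for mono audio) and add a batch axis.
waveform = tf.expand_dims(tf.squeeze(tf.audio.decode_wav(
    file_contents,
    desired_channels=-1,
    desired_samples=TARGET_SAMPLE_RATE).audio, axis=-1), 0)
# Crop to the length the preprocessing model expects.
cropped_waveform = tf.slice(waveform, begin=[0, 0], size=[1, EXPECTED_WAVEFORM_LEN])
spectrogram = tf.squeeze(preproc_model(cropped_waveform), axis=0)
print(spectrogram)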


Output:

tf.Tensor(
[[[nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  ...
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]
  [nan]]], shape=(43, 232, 1), dtype=float32)

Is there a way to solve this problem?

For instance, should I modify my WAV files to match the provided ones, and if so, how? Did I miss an important preprocessing step when handling my own WAV files? Or is there a simpler way to do this in JavaScript instead of Python?


Solution

  • Your problem looks the same as the one in GitHub issue https://github.com/tensorflow/tfjs/issues/3820.

    Can you check whether the input tensor you pass to preproc_model() contains a lot of zero entries? I suspect these zero entries are what cause the NaN output.
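
    One way to run that check is sketched below. It is a minimal sketch, not code from the notebook: zero_fraction and add_dither are hypothetical helper names, and the dither workaround is an assumption that has not been verified against preproc_model. It uses NumPy so it also accepts the result of cropped_waveform.numpy().

```python
import numpy as np


def zero_fraction(waveform):
    """Return the fraction of samples that are exactly zero."""
    samples = np.asarray(waveform, dtype=np.float32)
    if samples.size == 0:
        raise ValueError("waveform is empty")
    return float(np.mean(samples == 0.0))


def add_dither(waveform, scale=1e-6, seed=0):
    """Add very small Gaussian noise so that no stretch of audio is exactly silent.

    Hypothetical workaround: whether it removes the NaN output depends on how
    the preprocessing model handles silent frames.
    """
    samples = np.asarray(waveform, dtype=np.float32)
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, scale, size=samples.shape).astype(np.float32)
    return samples + noise


# Toy stand-in for cropped_waveform: shape (1, 8), second half is zero padding.
clip = np.array([[0.1, -0.2, 0.3, -0.1, 0.0, 0.0, 0.0, 0.0]], dtype=np.float32)
print(zero_fraction(clip))  # 0.5
dithered = add_dither(clip)
print(zero_fraction(dithered))
```

    If zero_fraction(cropped_waveform.numpy()) is large, the clip is probably shorter than the requested length, since tf.audio.decode_wav pads with zeros when desired_samples exceeds the file length. In that case it may be worth trying a longer recording, or passing add_dither(...) output to preproc_model, to see whether the NaN values disappear.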