
Why is 32768 used as a constant to normalize the wav data in VGGish?


I'm trying to follow along with what the code is doing for VGGish and I came across a piece that I don't really understand. In vggish_input.py there is this:

def wavfile_to_examples(wav_file):
  """Convenience wrapper around waveform_to_examples() for a common WAV format.
  Args:
    wav_file: String path to a file, or a file-like object. The file
    is assumed to contain WAV audio data with signed 16-bit PCM samples.
  Returns:
    See waveform_to_examples.
  """
  wav_data, sr = wav_read(wav_file)
  assert wav_data.dtype == np.int16, 'Bad sample type: %r' % wav_data.dtype
  samples = wav_data / 32768.0  # Convert to [-1.0, +1.0]
  return waveform_to_examples(samples, sr)

Where does the constant 32768 come from, and how does dividing by it convert the data to samples?

I found this on normalizing data to between -1 and +1, but I'm not sure how it relates to 32768.

https://stats.stackexchange.com/questions/178626/how-to-normalize-data-between-1-and-1


Solution

  • 32768 is 2^15. A signed 16-bit integer (int16) ranges from -32768 to +32767, so dividing each int16 sample by 2^15 gives a float in the range -1.0 to just under +1.0 (the largest possible value is 32767/32768 ≈ 0.99997).
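
To see this concretely, here is a minimal sketch using NumPy. The sample values in wav_data are made up for illustration (they are not from a real WAV file); only the division mirrors the line in wavfile_to_examples().

```python
import numpy as np

# int16 uses 1 sign bit and 15 magnitude bits, so its range is [-2**15, 2**15 - 1].
info = np.iinfo(np.int16)
print(info.min, info.max)

# Hypothetical samples spanning the int16 range (illustrative values only).
wav_data = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)

# Same operation as in wavfile_to_examples(): dividing by a Python float
# promotes the int16 array to floating point.
samples = wav_data / 32768.0
print(samples.dtype)
print(samples)
```

Note that this is a fixed scaling based on the range of the data type, not on the minimum and maximum of the data itself, so a quiet recording stays quiet after conversion. That is the difference from the min-max formula in the linked stats question.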