python image audio spectrogram

Audio to Spectrogram Image


I want to convert an audio file or waveform to a spectrogram image where:

  1. The X-axis (horizontal) represents time, increasing to the right up to the end of the audio.
  2. The Y-axis (vertical) represents frequency, increasing upward, so the range runs from 20 Hz up to the highest frequency the audio can contain. I also want to be able to set the scale of this axis: linear, logarithmic, or a custom function such as f(p) = 2p, where p is the pixel row from 0 to the image height and f(p) is the frequency.
  3. A black pixel represents zero amplitude.
  4. A white pixel represents the maximum amplitude the audio can reach.
  5. Gray pixels therefore represent amplitudes between those two extremes.
  6. I can specify the image resolution, such as 720*480.
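To make the axis scaling in item 2 concrete, here is a minimal sketch of a function that maps a pixel row to a frequency. The name `row_to_frequency`, its parameters, and the 22050 Hz default upper limit (the Nyquist frequency of 44.1 kHz audio) are assumptions for illustration, not part of any library.

```python
def row_to_frequency(row, height, f_min=20.0, f_max=22050.0,
                     scale="linear", custom=None):
    """Map a pixel row (0 = bottom of the image) to a frequency in Hz.

    Hypothetical helper: f_max=22050.0 assumes 44.1 kHz audio.
    """
    if custom is not None:
        # Custom mapping such as f(p) = 2p from the question.
        return custom(row)
    if height < 2:
        raise ValueError("height must be at least 2")
    t = row / (height - 1)  # 0.0 at the bottom row, 1.0 at the top row
    if scale == "linear":
        return f_min + t * (f_max - f_min)
    if scale == "log":
        # Equal pixel steps correspond to equal frequency ratios.
        return f_min * (f_max / f_min) ** t
    raise ValueError(f"unknown scale: {scale!r}")
```

For example, `row_to_frequency(100, 480, custom=lambda p: 2 * p)` gives the frequency that the custom function f(p) = 2p assigns to row 100.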

Is there a Python library or package I can install for this, or do I have to compute it manually by transforming the time-domain waveform to the frequency domain with the Fast Fourier Transform?
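For reference, the manual route is a short-time Fourier transform: cut the waveform into overlapping windows, take the FFT of each, and treat the magnitudes as pixel brightness. Below is a rough sketch using only NumPy. The function name, the defaults (`n_fft`, `hop`), the nearest-neighbour resampling, and the choice to treat the loudest bin in the clip as white are all assumptions, not a definitive implementation.

```python
import numpy as np

def spectrogram_image(samples, sample_rate, width=720, height=480,
                      n_fft=2048, hop=512, f_min=20.0, row_to_freq=None):
    """Return a (height, width) uint8 array; row 0 is the top of the image."""
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < n_fft:
        samples = np.pad(samples, (0, n_fft - len(samples)))

    # Slice into overlapping frames and apply a Hann window to each.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    frames = np.stack([samples[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])

    # Magnitude spectrum per frame: shape (n_frames, n_fft // 2 + 1).
    mag = np.abs(np.fft.rfft(frames, axis=1))

    # Frequency assigned to each pixel row (row 0 = lowest frequency here).
    f_max = sample_rate / 2  # Nyquist limit
    rows = np.arange(height)
    if row_to_freq is None:
        freqs = f_min + rows / max(height - 1, 1) * (f_max - f_min)  # linear
    else:
        freqs = np.array([row_to_freq(r) for r in rows], dtype=np.float64)
    freqs = np.clip(freqs, 0.0, f_max)

    # Nearest FFT bin per row and nearest frame per column.
    bin_idx = np.round(freqs / f_max * (n_fft // 2)).astype(int)
    col_idx = np.round(np.linspace(0, n_frames - 1, width)).astype(int)
    img = mag[col_idx][:, bin_idx].T  # (height, width)

    img = img[::-1]  # flip so high frequencies are at the top
    peak = img.max()
    if peak > 0:
        img = img / peak  # assumption: loudest bin in this clip maps to white
    return (img * 255).astype(np.uint8)
```

The resulting array can be written out with any image library. Passing `row_to_freq` swaps the linear axis for a logarithmic or custom one.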


Solution

  • Check the librosa library; it should contain everything you need. For instance, see https://librosa.org/doc/main/generated/librosa.stft.html