go wav microphone audio-capture vosk

How to get WAV audio from a microphone in Go


My program uses the Go bindings of the Vosk speech recognition library, which takes in the audio as byte slices of WAV mono audio. My program currently uses the external command arecord to get WAV audio from the microphone but I'd prefer to do it in Go proper, and preferably without any shared library dependencies.

I tried using the malgo package but got stuck on how to convert the raw audio from the microphone to WAV. The WAV encoding packages I've found only write to files (io.WriteSeeker) but I need to convert a continuous stream from the microphone for realtime speech recognition.

It needs to work on Linux at least.


Solution

  • I ended up using malgo too, with malgo.FormatS16.

    That produces bytes in this callback:

        // https://github.com/gen2brain/malgo/blob/master/_examples/capture/capture.go
        onRecvFrames := func(pSample2, pSample []byte, framecount uint32) {
            // Empirically, len(pSample) is 480, so for sample rate 44100 it's triggered about every 10ms.
            // sampleCount := framecount * deviceConfig.Capture.Channels * sizeInBytes
            pSampleData = append(pSampleData, pSample...)
        }
    

    Which I can convert to int (used GPT-4 for this):

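        // twoByteDataToIntSlice converts little-endian signed 16-bit PCM bytes
        // (malgo.FormatS16) into one int per sample.
        func twoByteDataToIntSlice(audioData []byte) []int {
            intData := make([]int, len(audioData)/2)
            // i+1 < len guards against a trailing odd byte.
            for i := 0; i+1 < len(audioData); i += 2 {
                // Reinterpret as int16 first so negative samples keep their sign;
                // a plain int(Uint16(...)) would map -1 to 65535.
                value := int(int16(binary.LittleEndian.Uint16(audioData[i : i+2])))
                intData[i/2] = value
            }
            return intData
        }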
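
    Since FormatS16 samples are signed, it is worth checking the conversion against a few known byte pairs. A minimal sketch, assuming little-endian input; decodeS16LE is a hypothetical helper name used only for illustration:

```go
package main

import "encoding/binary"

// decodeS16LE interprets raw bytes as little-endian signed 16-bit samples.
// (Hypothetical helper name, used here for illustration only.)
func decodeS16LE(raw []byte) []int {
	samples := make([]int, 0, len(raw)/2)
	// i+1 < len(raw) skips a trailing odd byte instead of panicking.
	for i := 0; i+1 < len(raw); i += 2 {
		u := binary.LittleEndian.Uint16(raw[i : i+2])
		// Go through int16 so the sign bit is preserved when widening to int.
		samples = append(samples, int(int16(u)))
	}
	return samples
}
```

    For example, the bytes ff ff decode to -1 here, where converting the uint16 straight to int would give 65535.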
    

    and then use "github.com/go-audio/wav" to produce in-memory WAV bytes (again, GPT-4 came up with the in-memory file system hack to get around the io.WriteSeeker requirement):

        // Create an in-memory file to satisfy the io.WriteSeeker that NewEncoder
        // requires; the encoder needs to seek to finalize the headers.
        inMemoryFilename := "in-memory-output.wav"
        inMemoryFile, err := fs.Create(inMemoryFilename)
        dbg(err)
        // We will call Close ourselves.

        // Convert audio data to IntBuffer
        inputBuffer := &audio.IntBuffer{Data: intData, Format: &audio.Format{SampleRate: iSampleRate, NumChannels: iNumChannels}}

        // Create a new WAV encoder
        bitDepth := 16
        audioFormat := 1 // 1 = PCM
        wavEncoder := wav.NewEncoder(inMemoryFile, iSampleRate, bitDepth, iNumChannels, audioFormat)

        // Write the samples, then close the encoder so it finalizes the headers.
        dbg(wavEncoder.Write(inputBuffer))
        dbg(wavEncoder.Close())
    
    

    I developed these snippets while trying to put together something like what you want: a streaming voice assistant [WIP] https://github.com/Petrzlen/vocode-golang