My program uses the Go bindings of the Vosk speech recognition library, which takes audio as byte slices of mono WAV audio. It currently uses the external command arecord to get WAV audio from the microphone, but I'd prefer to do it in Go proper, preferably without any shared library dependencies.
I tried using the malgo package but got stuck on how to convert the raw audio from the microphone to WAV. The WAV encoding packages I've found only write to files (io.WriteSeeker) but I need to convert a continuous stream from the microphone for realtime speech recognition.
It needs to work on Linux at least.
I ended up using malgo too, with malgo.FormatS16.
That produces bytes in this callback:
    // https://github.com/gen2brain/malgo/blob/master/_examples/capture/capture.go
    onRecvFrames := func(pSample2, pSample []byte, framecount uint32) {
        // Empirically, len(pSample) is 480, so for sample rate 44100 it's triggered about every 10ms.
        // sampleCount := framecount * deviceConfig.Capture.Channels * sizeInBytes
        pSampleData = append(pSampleData, pSample...)
    }
These bytes can then be converted to []int (I used GPT-4 for this):
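    // Requires: import "encoding/binary"
    func twoByteDataToIntSlice(audioData []byte) []int {
        intData := make([]int, len(audioData)/2)
        // Stop before a trailing odd byte so the two-byte slice never goes out of range.
        for i := 0; i+1 < len(audioData); i += 2 {
            // FormatS16 samples are signed little-endian, so go through int16
            // before widening; otherwise negative samples become large positives.
            value := int(int16(binary.LittleEndian.Uint16(audioData[i : i+2])))
            intData[i/2] = value
        }
        return intData
    }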
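Because malgo.FormatS16 samples are signed, each 16-bit value has to pass through int16 before being widened to int; otherwise negative samples come out as large positive numbers. A minimal standalone check of that conversion, with the hypothetical name s16leToInts and made-up input bytes:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// s16leToInts decodes little-endian signed 16-bit PCM into ints.
// A trailing odd byte, if any, is ignored.
func s16leToInts(pcm []byte) []int {
	out := make([]int, len(pcm)/2)
	for i := range out {
		u := binary.LittleEndian.Uint16(pcm[2*i:])
		out[i] = int(int16(u)) // reinterpret as signed before widening
	}
	return out
}

func main() {
	// 0xFFFF is -1 and 0x8000 is -32768 in two's complement.
	pcm := []byte{0x01, 0x00, 0xFF, 0xFF, 0x00, 0x80}
	fmt.Println(s16leToInts(pcm)) // [1 -1 -32768]
}
```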
Then I use github.com/go-audio/wav to produce in-memory WAV bytes (again, GPT-4 created the in-memory file system hack to get around the io.WriteSeeker requirement):
    // fs is the in-memory file system and dbg the error helper from the surrounding program (not shown).
    // Create an in-memory file to satisfy the io.WriteSeeker that NewEncoder needs for finalizing headers.
    inMemoryFilename := "in-memory-output.wav"
    inMemoryFile, err := fs.Create(inMemoryFilename)
    dbg(err)
    // We will call Close ourselves.

    // Convert audio data to an IntBuffer.
    inputBuffer := &audio.IntBuffer{Data: intData, Format: &audio.Format{SampleRate: iSampleRate, NumChannels: iNumChannels}}

    // Create a new WAV encoder: 16-bit samples, audio format 1 (PCM).
    bitDepth := 16
    audioFormat := 1
    wavEncoder := wav.NewEncoder(inMemoryFile, iSampleRate, bitDepth, iNumChannels, audioFormat)

    // Encode the samples, then close the encoder so the header sizes are finalized.
    dbg(wavEncoder.Write(inputBuffer))
    dbg(wavEncoder.Close())
I developed these snippets while trying to put together something like what you want, a streaming voice assistant [WIP]: https://github.com/Petrzlen/vocode-golang