I'm messing around in my app with a custom model for speech commands - I have it working fine recording and processing input audio from an AudioRecord, and I give feedback to the user through text to speech.
One issue I have is that I'd like this to work even when audio is playing - either through my own text to speech or through something else playing in the background (music for instance). I realize this is going to be a non trivial problem, but if I could get access in some way to the audio output data (what the phone is playing) and match that up with my microphone input data, I think I can at least adjust my model for this + improve my results.
However, based on Android - Can I get the audio data for playback from the audio mixer? , it sounds like that is impossible.
Two questions:
1) Is there any way that I'm missing to get access to expected audio output/playback data through the android api, or any options the android api provides for dealing with this issue (the feedback loop between audio output and input)?
2) Outside of stopping all other playback or waiting for other playback to finish - is there any other approach to solve this problem? I would assume some calling apps have a way of dealing with this if the user is on speaker phone, I'm just missing how to do it myself
Thanks
Answers to 1 & 2: You want AcousticEchoCanceler.
A short lecture on why "deleting the speaker audio from the microphone input" input is a non-trivial task that takes substantial signal processing knowledge: It's more complicated than just time-shifting the speaker audio a little bit and subtracting it from the mic input. The fact is, the spectrum of the audio changes drastically even as it leaves the speaker (most tiny speakers have a very peaky response centered around 3-4KHz). The audio may bounce off multiple objects (walls, etc.) before it gets back to the mic (multipath interference). Different frequency components interfere at the microphone in different, impossible to predict ways, vastly changing the spectrum of the audio. And by the way -- if anything in the room moves, say, if you put your hand near the phone -- everything changes. That is why you don't want to try to write your own echo cancellation filter. Android has provided one for you, so you can write cool speakerphone apps and such.