I'm facing a very specific, reproducible bug and I've hit a wall after trying all the standard solutions. I would appreciate any insight.
I am developing a voice assistant setup flow where the app uses SpeechRecognizer (STT) to get the user's name, followed by TextToSpeech (TTS) to confirm the settings.
The entire flow gets stuck because the TextToSpeech engine hangs immediately after the SpeechRecognizer session concludes. This issue has been reliably reproduced on a specific test device.
Device: Samsung SM-A736B
OS: Android 15 (One UI 7)
STT: System android.speech.SpeechRecognizer
TTS: Google TTS Engine (com.google.android.tts), to which I am explicitly binding.
Here is the exact flow leading to the failure, confirmed by logcat analysis:
A SpeechRecognizer instance is created in a foreground service (VoiceSessionService) and starts listening.
The user speaks, and the onResults callback is successfully triggered with the transcribed text.
Immediately after receiving the result, the SpeechRecognizer instance is fully terminated via stopListening() and destroy(), and the service stops itself.
The main thread waits for a 500ms delay using Handler.postDelayed.
After the delay, it successfully requests and is granted audio focus (AUDIOFOCUS_REQUEST_GRANTED) using AudioFocusRequest.Builder.
A call is made to TextToSpeech.speak() to utter the first confirmation phrase.
FAILURE POINT: The TextToSpeech engine accepts the command (the call to .speak() returns without error) but then goes silent. It never plays the audio and, most critically, it never triggers the onDone or onError callbacks in its UtteranceProgressListener.
The application logic is now permanently blocked, waiting for an onDone callback that will never arrive.
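Independent of the root cause, the permanent block in the last step can be bounded with a per-utterance timeout. The following is a minimal sketch in plain Kotlin, assuming a hypothetical UtteranceWatchdog helper (not part of the code in this post) that is armed just before TextToSpeech.speak() and disarmed from onDone/onError. On Android the same idea could equally be built on Handler.postDelayed.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical helper: fires onTimeout if an utterance is not reported
// as finished within timeoutMs. Utterance IDs are assumed to be unique.
class UtteranceWatchdog(private val timeoutMs: Long) {
    private val scheduler = Executors.newSingleThreadScheduledExecutor()
    private val armed = ConcurrentHashMap.newKeySet<String>()

    // Call right before handing the utterance to the TTS engine.
    fun arm(utteranceId: String, onTimeout: (String) -> Unit) {
        armed.add(utteranceId)
        scheduler.schedule(Runnable {
            // Only fire if nobody disarmed this utterance in the meantime.
            if (armed.remove(utteranceId)) onTimeout(utteranceId)
        }, timeoutMs, TimeUnit.MILLISECONDS)
    }

    // Call from onDone/onError; returns false if the timeout already fired.
    fun disarm(utteranceId: String): Boolean = armed.remove(utteranceId)

    // Stops the background thread when the helper is no longer needed.
    fun close() {
        scheduler.shutdownNow()
    }
}
```

In TtsManager, arm() would sit next to the tts?.speak() call in internalSpeak() and disarm() in the onDone/onError overrides. The timeout value is a judgment call and would need to exceed the longest expected utterance.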
ConversationManager.kt - Logic for handling the STT result:
// This method is called after the STT result is received
fun onUserInput(text: String) {
    if (awaitingFirstRunName) {
        awaitingFirstRunName = false
        val nameToSave = if (text.isNotBlank()) text.trim() else "Default Name"
        settings.saveUserName(nameToSave)
        Handler(Looper.getMainLooper()).postDelayed({
            val audioManager = appContext.getSystemService(Context.AUDIO_SERVICE) as AudioManager
            val audioAttributes = AudioAttributes.Builder()
                .setUsage(AudioAttributes.USAGE_ASSISTANCE_ACCESSIBILITY)
                .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                .build()
            val focusRequest = AudioFocusRequest.Builder(AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK)
                .setAudioAttributes(audioAttributes)
                .build()
            val result = audioManager.requestAudioFocus(focusRequest)
            if (result == AudioManager.AUDIOFOCUS_REQUEST_GRANTED) {
                Log.d("MY_APP_LOG", "Audio focus granted. Forcing TTS engine restart.")
                // Forcing a restart of the TTS Engine
                TtsManager.shutdown()
                TtsManager.initialize(appContext)
                val confirmationPhrase = "OK, $nameToSave"
                TtsManager.speak(
                    context = appContext,
                    text = confirmationPhrase,
                    queueMode = TextToSpeech.QUEUE_ADD,
                    onDone = {
                        // This callback is never triggered
                        Log.d("MY_APP_LOG", "Confirmation phrase DONE. This is never logged.")
                        speakFinalSettings(appContext)
                    }
                )
            } else {
                Log.e("MY_APP_LOG", "Could not get audio focus.")
            }
        }, 500)
        return
    }
    // ... logic for other commands follows
}
VoiceSessionService.kt - Ensuring SpeechRecognizer is destroyed:
private fun processCommand(text: String) {
    val intent = Intent(CommandReceiver.ACTION_PROCESS_COMMAND).apply {
        putExtra(CommandReceiver.EXTRA_RECO_TEXT, text)
    }
    sendBroadcast(intent)
    // Immediately destroy the recognizer to release all audio resources
    speechRecognizer?.stopListening()
    speechRecognizer?.destroy()
    speechRecognizer = null
    stopSelf()
}
Logcat Snippet:
D/RescueService: onResults: Andrei
D/RescueService: First run: User name saved as 'Andrei'
// --- 500ms delay occurs here ---
D/RescueService: Audio focus granted for the first time setup confirmation.
D/RescueService: Forcing TTS engine restart.
D/RescueService: TTS speaking: 'OK, Andrei' (queue=1) id=...
// --- SILENCE ---
// --- No "Confirmation phrase DONE" log ever appears ---
Here is the source code for the helper classes used in the snippets above.
TtsManager.kt
package com.babenko.rescueservice.voice
import android.content.Context
import android.os.Handler
import android.os.Looper
import android.speech.tts.TextToSpeech
import android.speech.tts.UtteranceProgressListener
import com.babenko.rescueservice.core.Logger
import java.util.*
object TtsManager : TextToSpeech.OnInitListener {
    private var tts: TextToSpeech? = null
    private var isInitialized = false
    private var appContext: Context? = null
    private val handler = Handler(Looper.getMainLooper())
    private val pending = ArrayDeque<Triple<String, Int, (() -> Unit)?>>()
    private val utteranceCallbacks = mutableMapOf<String, () -> Unit>()

    fun initialize(context: Context) {
        if (tts != null && isInitialized) return
        appContext = context.applicationContext
        try {
            // Binding explicitly to Google TTS Engine
            tts = TextToSpeech(context.applicationContext, this, "com.google.android.tts")
        } catch (e: Exception) {
            Logger.e(e, "Failed to create TextToSpeech")
        }
    }

    override fun onInit(status: Int) {
        if (status != TextToSpeech.SUCCESS) {
            isInitialized = false
            pending.clear()
            return
        }
        isInitialized = true
        tts?.setOnUtteranceProgressListener(object : UtteranceProgressListener() {
            override fun onStart(utteranceId: String?) {}

            override fun onDone(utteranceId: String?) {
                utteranceId?.let {
                    handler.post {
                        utteranceCallbacks[it]?.invoke()
                        utteranceCallbacks.remove(it)
                    }
                }
            }

            @Deprecated("Deprecated in Java")
            override fun onError(utteranceId: String?) {
                utteranceId?.let { utteranceCallbacks.remove(it) }
            }
        })
        // Process any pending speech requests
        while (pending.isNotEmpty()) {
            val (text, mode, onDone) = pending.removeFirst()
            internalSpeak(text, mode, onDone)
        }
    }

    fun speak(context: Context, text: String, queueMode: Int = TextToSpeech.QUEUE_ADD, onDone: (() -> Unit)? = null) {
        if (!isInitialized) {
            pending.addLast(Triple(text, queueMode, onDone))
            initialize(context)
            return
        }
        internalSpeak(text, queueMode, onDone)
    }

    private fun internalSpeak(text: String, queueMode: Int, onDone: (() -> Unit)? = null) {
        val utteranceId = text.hashCode().toString() + System.currentTimeMillis()
        onDone?.let { utteranceCallbacks[utteranceId] = it }
        tts?.speak(text, queueMode, null, utteranceId)
    }

    fun shutdown() {
        tts?.stop()
        tts?.shutdown()
        tts = null
        isInitialized = false
        pending.clear()
        utteranceCallbacks.clear()
    }
}
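In the listing above, onError only removes the stored callback and the result of tts?.speak() is discarded, so an utterance that the engine rejects or fails leaves the caller waiting with no signal. Below is a minimal sketch of a registry that reports both outcomes. The names UtteranceRegistry and ENGINE_ERROR are hypothetical, and the sketch is plain Kotlin so it can be read without the Android SDK. ENGINE_ERROR mirrors the value of TextToSpeech.ERROR (-1), and engineSpeak stands in for the real tts.speak() call.

```kotlin
// Hypothetical registry: every utterance ends in exactly one of onDone / onFailed.
class UtteranceRegistry {
    companion object {
        // Mirrors TextToSpeech.ERROR; defined locally so the sketch is self-contained.
        const val ENGINE_ERROR = -1
    }

    private class Entry(val onDone: () -> Unit, val onFailed: (String) -> Unit)

    private val entries = mutableMapOf<String, Entry>()

    // engineSpeak stands in for tts.speak(text, queueMode, null, utteranceId)
    // and must return the engine's result code.
    @Synchronized
    fun submit(
        id: String,
        onDone: () -> Unit,
        onFailed: (String) -> Unit,
        engineSpeak: (String) -> Int
    ) {
        entries[id] = Entry(onDone, onFailed)
        // A rejected request never produces listener callbacks, so fail it here.
        if (engineSpeak(id) == ENGINE_ERROR) fail(id, "speak() returned ERROR")
    }

    // Call from UtteranceProgressListener.onDone.
    @Synchronized
    fun done(id: String) {
        entries.remove(id)?.onDone?.invoke()
    }

    // Call from UtteranceProgressListener.onError.
    @Synchronized
    fun fail(id: String, reason: String) {
        entries.remove(id)?.onFailed?.invoke(reason)
    }
}
```

With something along these lines, the confirmation flow could log or retry on failure instead of waiting indefinitely for onDone.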
ConversationManager.kt
package com.babenko.rescueservice.voice
import android.content.Context
import android.media.AudioAttributes
import android.media.AudioFocusRequest
import android.media.AudioManager
import android.os.Handler
import android.os.Looper
import android.speech.tts.TextToSpeech
import android.util.Log
import com.babenko.rescueservice.data.SettingsManager // Assuming this is your settings helper
object ConversationManager {
    private lateinit var appContext: Context
    private var awaitingFirstRunName = false
    private val settings: SettingsManager by lazy { SettingsManager.getInstance(appContext) }

    fun init(context: Context) {
        appContext = context.applicationContext
    }

    fun startFirstRunSetup() {
        // ... (Code to speak the welcome message and start STT)
    }

    fun onUserInput(text: String) {
        if (awaitingFirstRunName) {
            awaitingFirstRunName = false
            val nameToSave = if (text.isNotBlank()) text.trim() else "Default Name"
            settings.saveUserName(nameToSave)
            Handler(Looper.getMainLooper()).postDelayed({
                val audioManager = appContext.getSystemService(Context.AUDIO_SERVICE) as AudioManager
                val audioAttributes = AudioAttributes.Builder()
                    .setUsage(AudioAttributes.USAGE_ASSISTANCE_ACCESSIBILITY)
                    .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                    .build()
                val focusRequest = AudioFocusRequest.Builder(AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK)
                    .setAudioAttributes(audioAttributes)
                    .build()
                val result = audioManager.requestAudioFocus(focusRequest)
                if (result == AudioManager.AUDIOFOCUS_REQUEST_GRANTED) {
                    Log.d("MY_APP_LOG", "Audio focus granted. Forcing TTS engine restart.")
                    // Forcing a restart of the TTS Engine
                    TtsManager.shutdown()
                    TtsManager.initialize(appContext)
                    val confirmationPhrase = "OK, $nameToSave"
                    TtsManager.speak(
                        context = appContext,
                        text = confirmationPhrase,
                        queueMode = TextToSpeech.QUEUE_ADD,
                        onDone = {
                            Log.d("MY_APP_LOG", "Confirmation phrase DONE. This is never logged.")
                            speakFinalSettings(appContext)
                        }
                    )
                } else {
                    Log.e("MY_APP_LOG", "Could not get audio focus.")
                }
            }, 500)
            return
        }
    }

    fun speakFinalSettings(context: Context) {
        val finalName = settings.getUserName()
        // ... (builds final string and calls TtsManager.speak)
        val confirmationMessage = "Settings confirmed for $finalName."
        TtsManager.speak(context, confirmationMessage, TextToSpeech.QUEUE_ADD)
    }
}
I have already implemented a series of fixes based on best practices to resolve audio resource conflicts between STT and TTS:
Explicitly bound to the Google TTS Engine (com.google.android.tts).
Ensured immediate and full destruction of SpeechRecognizer using .destroy() right after getting a result.
Introduced a postDelayed barrier of 500ms between destroying SpeechRecognizer and attempting to use TextToSpeech.
Upgraded the audio focus request to the modern AudioFocusRequest.Builder method, specifying AudioAttributes for an accessibility assistant. The request is granted (AUDIOFOCUS_REQUEST_GRANTED), but the TTS engine still hangs.
Forced a full restart of the TTS engine by calling TtsManager.shutdown() and then TtsManager.initialize() immediately before speaking.
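One detail of the forced restart is worth checking against the TtsManager listing: initialize() only returns early when tts != null && isInitialized, and speak() calls initialize() again whenever isInitialized is false. Because onInit arrives asynchronously, the sequence shutdown(), initialize(), speak() appears to construct two TextToSpeech instances back to back. Whether that contributed to the silence here is not established. The sketch below only illustrates guarding creation with an explicit state, using a hypothetical InitGate class in plain Kotlin, where createEngine stands in for the TextToSpeech constructor call.

```kotlin
enum class InitState { IDLE, INITIALIZING, READY }

// Hypothetical guard: createEngine stands in for constructing TextToSpeech.
class InitGate(private val createEngine: () -> Unit) {
    var state = InitState.IDLE
        private set

    // Starts initialization only if none is in flight or already completed.
    // Returns true when this call actually created the engine.
    fun ensureStarted(): Boolean {
        if (state != InitState.IDLE) return false
        state = InitState.INITIALIZING
        createEngine()
        return true
    }

    // Call from onInit(status), passing status == TextToSpeech.SUCCESS.
    fun onInitResult(success: Boolean) {
        state = if (success) InitState.READY else InitState.IDLE
    }

    // Call from shutdown() so that the next ensureStarted() creates a fresh engine.
    fun reset() {
        state = InitState.IDLE
    }
}
```

With a guard like this, a second initialize() issued while the first is still pending becomes a no-op, and queued requests are spoken by the one instance whose onInit eventually fires.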
I expected that by correctly managing the SpeechRecognizer lifecycle, adding a delay, properly requesting and receiving audio focus, and even force-restarting the TTS engine, the TextToSpeech engine would function normally and play the audio.
Despite all these measures and successfully acquiring audio focus, the TextToSpeech engine still enters a non-functional state. It accepts the .speak() command but never executes it or provides a completion callback (onDone).
This leads to my final question:
Why does the TextToSpeech engine hang in this manner after a SpeechRecognizer session, even when all documented best practices are followed and audio focus is successfully granted?
Fixed. The issue was that an implicit broadcast from a foreground service in a separate process was blocked on Android 14/15. We made the broadcast explicit and sent it immediately before stopping the service, restoring reliable delivery and the final voice confirmation.
Additionally, the project already includes proper delay, audio-focus handling, and SR → TTS shutdown order, so the full voice flow is now stable.
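For readers hitting the same symptom, the change described above might look roughly like the following. This is a sketch, not the project's actual code: it reuses the names from the VoiceSessionService.kt snippet in the question and assumes CommandReceiver lives in the same app package, so restricting the intent to that package is enough to target it. Placing the broadcast after the recognizer teardown is likewise an assumption based on the description.

```kotlin
import android.content.Intent

// Fragment of VoiceSessionService (a Service subclass from the question);
// speechRecognizer and CommandReceiver are the members shown there.
private fun processCommand(text: String) {
    val intent = Intent(CommandReceiver.ACTION_PROCESS_COMMAND).apply {
        // Restrict delivery to this app so the broadcast is no longer implicit.
        setPackage(this@VoiceSessionService.packageName)
        putExtra(CommandReceiver.EXTRA_RECO_TEXT, text)
    }
    // Release the recognizer first so its audio resources are free for TTS.
    speechRecognizer?.stopListening()
    speechRecognizer?.destroy()
    speechRecognizer = null
    // Deliver the result immediately before the service stops itself.
    sendBroadcast(intent)
    stopSelf()
}
```

If the receiver is declared in the manifest, naming the component with Intent.setClass() instead of setPackage() would be an equally explicit alternative.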