python python-3.x google-assistant-sdk google-assist-api

Python: Can you directly pass an audio file as user input command to Google Assistant SDK?


Google Assistant SDK: My user input is always constant (the same command every time), so instead of asking the user to record a voice command via a device microphone each time, I want the user to press a button and execute the command by passing a pre-recorded audio file as input. Is that possible with the Google Assistant SDK? Preferably in Python, as I want to build an API endpoint around it.

Any links, blogs, tutorials, samples, etc. would be very helpful.


Solution

  • The Google Assistant SDK accepts either text or audio data as input.

    The pushtotalk sample shows how to do this with a pre-recorded audio file.

    Here are a few code snippets showing how it is done in the sample:

    # Read the user's command from a WAV file instead of the microphone.
    audio_source = audio_helpers.WaveSource(
        open(input_audio_file, 'rb'),
        sample_rate=audio_sample_rate,
        sample_width=audio_sample_width
    )
    # ...
    # Create conversation stream with the 
    # given audio source and sink.
    conversation_stream = audio_helpers.ConversationStream(
        source=audio_source,
        sink=audio_sink,
        iter_size=audio_iter_size,
        sample_width=audio_sample_width,
    )
    # ...
    with SampleAssistant(lang, device_model_id, device_id,
                         conversation_stream,
                         grpc_channel, grpc_deadline,
                         device_handler) as assistant:
        # If file arguments are supplied:
        # exit after the first turn of the conversation.
        if input_audio_file or output_audio_file:
            assistant.assist()
            return
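
Before handing a pre-recorded file to the sample, it may help to confirm that the recording matches the audio format the sample is configured for. The sketch below uses only the standard library `wave` module. The expected values (16000 Hz, 16-bit samples, mono) are assumptions based on what I understand the pushtotalk defaults to be, and `check_wav_format` is a hypothetical helper, not part of the SDK.

```python
import wave

# Assumed defaults of the pushtotalk sample; adjust them to match the
# audio_sample_rate / audio_sample_width values you actually pass in.
EXPECTED_SAMPLE_RATE = 16000  # Hz
EXPECTED_SAMPLE_WIDTH = 2     # bytes per sample (16-bit linear PCM)
EXPECTED_CHANNELS = 1         # mono


def check_wav_format(path):
    """Return a list of format mismatches; an empty list means the file fits."""
    problems = []
    with wave.open(path, 'rb') as wav:
        if wav.getframerate() != EXPECTED_SAMPLE_RATE:
            problems.append(
                'sample rate is %d Hz, expected %d Hz'
                % (wav.getframerate(), EXPECTED_SAMPLE_RATE))
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(
                'sample width is %d bytes, expected %d bytes'
                % (wav.getsampwidth(), EXPECTED_SAMPLE_WIDTH))
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(
                'channel count is %d, expected %d'
                % (wav.getnchannels(), EXPECTED_CHANNELS))
    return problems
```

An API endpoint could call `check_wav_format('command.wav')` once at startup (the file name here is only an illustration) and refuse to serve requests if the returned list is not empty, so that a badly encoded recording is caught before it reaches the Assistant.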