stream twilio text-to-speech speech-to-text twilio-twiml

Bidirectional Stream Connect not working with Gather


I am using a phone number's voice webhook and returning a TwiML response like this:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Connect>
<Stream url="wss://..."/>
</Connect>
<Gather speechTimeout="auto" speechModel="phone_call" enhanced="true" input="speech" action="/respond"/>
</Response>

It starts the bidirectional voice stream properly, no issues there. It is able to connect, send data and disconnect. But it's not making any request to '/respond' from the Gather part. If I remove the Stream connect part and update the TwiML to this:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Gather speechTimeout="auto" speechModel="phone_call" enhanced="true" input="speech" action="/respond"/>
</Response>

then the Gather action is called. But why is it not called when the bidirectional stream is present?

Q. What do we want?

Either:

Q. Why using Gather?

Currently, we are using its already trained speechTimeout and speech model to get a cue on when the user has stopped speaking. In the Gather step, we make a request to another API endpoint, where with the help of 'StreamId', 'ConnectionId' and 'CallId' we send the voice response as streaming output.


Solution

  • The behavior you describe, with the Stream working but not the Gather, follows from the TwiML you are using. Twilio processes TwiML verbs in order and doesn't move on to the next verb until the current one finishes. The verbs here are Connect and Gather, and the Gather comes after the Connect/Stream, so it is not reached while the stream is still connected:

    <?xml version="1.0" encoding="UTF-8"?>
    <Response>
        <Connect>
            <Stream url="wss://..."/>
        </Connect>
        <Gather speechTimeout="auto" speechModel="phone_call" enhanced="true" input="speech" action="/respond"/>
    </Response>
    

    An alternative is to use just the Gather TwiML and then start the Media Stream through the Twilio REST API:

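    using System;
    using Twilio;
    using Twilio.Rest.Api.V2010.Account.Call;

    // Read credentials from the environment rather than hard-coding them.
    string accountSid = Environment.GetEnvironmentVariable("TWILIO_ACCOUNT_SID");
    string authToken = Environment.GetEnvironmentVariable("TWILIO_AUTH_TOKEN");

    TwilioClient.Init(accountSid, authToken);

    // Start a media stream on the call that is already in progress.
    var stream = StreamResource.Create(
        url: new Uri("wss://example.com/"),
        pathCallSid: "CAXXXXXXXXXXXXXXXXXXXXXXXXXXX"  // SID of the live call
    );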
    

    Your application would need to obtain the pathCallSid from the call that was in "Gather" mode, then use the Twilio REST API to start the media stream for that call. One problem with this approach is that Gather seems best suited for short segments of the call.
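
    If the stream only needs to receive the caller's audio, the same non-blocking behavior can probably be expressed directly in TwiML. This is a sketch, not something taken from your setup: as far as I know, Start/Stream forks the audio and lets Twilio continue to the next verb, so the Gather that follows still runs. The catch is that a stream started this way is one-way (Twilio to your server), so it only fits if you do not need to send audio back over the WebSocket.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <Response>
        <!-- Start is assumed here in place of Connect: it does not block the verbs that follow -->
        <Start>
            <Stream url="wss://..."/>
        </Start>
        <Gather speechTimeout="auto" speechModel="phone_call" enhanced="true" input="speech" action="/respond"/>
    </Response>
    ```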

    To address another question you asked:

    To do it completely via streams?

    Here I am getting issues in getting StreamId, ConnectionId, CallId at one place.

    Check out the statusCallback attribute that you can set when you create the stream:

    The statusCallback attribute takes an absolute or relative URL as value. Whenever a stream is started or stopped, Twilio will make a request to this URL.

    For example:

    <Stream url="wss://..." statusCallback="http://yourapi.com..." />
    

    The parameters sent to the statusCallback URL include the StreamSid and the CallSid, so both identifiers arrive together in one request.
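
    A minimal sketch of reading those values on your side, assuming Twilio posts them as a form-encoded body (the usual shape of its webhooks). The StreamStatusCallback class and its Parse method are hypothetical helpers, not part of the Twilio SDK, and the StreamEvent field name is my assumption from the documented callback parameters; only StreamSid and CallSid are confirmed above.

    ```csharp
    using System;
    using System.Web;

    // Hypothetical helper, not part of the Twilio SDK.
    public static class StreamStatusCallback
    {
        // Parses the form-encoded body sent to the statusCallback URL.
        // StreamEvent is an assumed field name; a missing field comes back as null.
        public static (string CallSid, string StreamSid, string StreamEvent) Parse(string formBody)
        {
            var fields = HttpUtility.ParseQueryString(formBody);
            return (fields["CallSid"], fields["StreamSid"], fields["StreamEvent"]);
        }
    }

    public static class Program
    {
        public static void Main()
        {
            // Made-up sample body for illustration; real SIDs are longer.
            var body = "CallSid=CA123&StreamSid=MZ456&StreamEvent=stream-started";
            var (callSid, streamSid, streamEvent) = StreamStatusCallback.Parse(body);
            Console.WriteLine($"{streamEvent}: call {callSid}, stream {streamSid}");
        }
    }
    ```

    In a real endpoint you would call Parse on the request body and store the CallSid/StreamSid pair somewhere your WebSocket handler can look it up.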