python amazon-web-services flask text-to-speech amazon-polly

website downloading Amazon Polly's voice as mp3 file

I'm a beginner in coding. I would like to make a simple website using Amazon Polly.

a web site with a text box
you write a text in the text box and click a button "Read"
you can download a mp3 file which is made by Amazon Polly

I'm an English teacher in Japan, so I would like my students to use this website to improve their English pronunciation.

I have realized 1 and 2, but not 3. I can't find the way to download AudioStream as a mp3 file. I use flask, Amazon Polly and AWS Elastic Beanstalk.

This is application.py

from argparse import ArgumentParser
from flask import Flask, jsonify, Response, render_template, request, send_file
import os
import sys

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError

# Mapping the output format used in the client to the content type for the
# response
AUDIO_FORMATS = {"ogg_vorbis": "audio/ogg",
                 "mp3": "audio/mpeg",
                 "pcm": "audio/wave; codecs=1"}

# Create a client using the credentials and region defined in the adminuser
# section of the AWS credentials and configuration files
session = Session(aws_access_key_id='???', aws_secret_access_key='???', region_name='us-east-1')
polly = session.client("polly")

# Create a flask app
application = Flask(__name__)


# Simple exception class
class InvalidUsage(Exception):
    status_code = 400

    def __init__(self, message, status_code=None, payload=None):
        Exception.__init__(self)
        self.message = message
        if status_code is not None:
            self.status_code = status_code
        self.payload = payload

    def to_dict(self):
        rv = dict(self.payload or ())
        rv['message'] = self.message
        return rv


# Register error handler
@application.errorhandler(InvalidUsage)
def handle_invalid_usage(error):
    response = jsonify(error.to_dict())
    response.status_code = error.status_code
    return response


@application.route('/', methods=['GET'])
def index():
    return render_template('index.html')


@application.route('/read', methods=['GET'])
def read():
    """Handles routing for reading text (speech synthesis)"""
    # Get the parameters from the query string
    try:
        outputFormat = request.args.get('outputFormat')
        text = request.args.get('text')
        voiceId = request.args.get('voiceId')
    except TypeError:
        raise InvalidUsage("Wrong parameters", status_code=400)

    # Validate the parameters, set error flag in case of unexpected
    # values
    if len(text) == 0 or len(voiceId) == 0 or \
            outputFormat not in AUDIO_FORMATS:
        raise InvalidUsage("Wrong parameters", status_code=400)
    else:
        try:
            # Request speech synthesis
            response = polly.synthesize_speech(Text='<speak><amazon:domain name="conversational"><prosody rate="slow">' + text + '</prosody></amazon:domain></speak>',
                                               VoiceId=voiceId, Engine='neural', TextType='ssml',
                                               OutputFormat=outputFormat)
        except (BotoCoreError, ClientError) as err:
            # The service returned an error
            raise InvalidUsage(str(err), status_code=500)

        return send_file(response.get("AudioStream"),
                         AUDIO_FORMATS[outputFormat])


# Define and parse the command line arguments
cli = ArgumentParser(description='Example Flask Application')
cli.add_argument(
    "-p", "--port", type=int, metavar="PORT", dest="port", default=8000)
cli.add_argument(
    "--host", type=str, metavar="HOST", dest="host", default="localhost")
arguments = cli.parse_args()


# If the module is invoked directly, initialize the application
if __name__ == '__main__':
    # Configure and run flask app
    application.secret_key = os.urandom(24)
    application.debug = True
    application.run(arguments.host, arguments.port)

This is index.html

<html>

<head>
    <title>Text-to-Speech Example Application</title>
    <script>
        /*
         * This sample code requires a web browser with support for both the
         * HTML5 and ECMAScript 5 standards; the following is a non-comprehensive
         * list of compliant browsers and their minimum version:
         *
         * - Chrome 23.0+
         * - Firefox 21.0+
         * - Internet Explorer 9.0+
         * - Edge 12.0+
         * - Opera 15.0+
         * - Safari 6.1+
         * - Android (stock web browser) 4.4+
         * - Chrome for Android 51.0+
         * - Firefox for Android 48.0+
         * - Opera Mobile 37.0+
         * - iOS (Safari Mobile and Chrome) 3.2+
         * - Internet Explorer Mobile 10.0+
         * - Blackberry Browser 10.0+
         */

        // Mapping of the OutputFormat parameter of the SynthesizeSpeech API
        // and the audio format strings understood by the browser
        var AUDIO_FORMATS = {
            'ogg_vorbis': 'audio/ogg',
            'mp3': 'audio/mpeg',
            'pcm': 'audio/wave; codecs=1'
        };

        /**
         * Handles fetching JSON over HTTP
         */
        function fetchJSON(method, url, onSuccess, onError) {
            var request = new XMLHttpRequest();
            request.open(method, url, true);
            request.onload = function () {
                // If loading is complete
                if (request.readyState === 4) {
                    // if the request was successful
                    if (request.status === 200) {
                        var data;

                        // Parse the JSON in the response
                        try {
                            data = JSON.parse(request.responseText);
                        } catch (error) {
                            onError(request.status, error.toString());
                        }

                        onSuccess(data);
                    } else {
                        onError(request.status, request.responseText)
                    }
                }
            };

            request.send();
        }

        /**
         * Returns a list of audio formats supported by the browser
         */
        function getSupportedAudioFormats(player) {
            return Object.keys(AUDIO_FORMATS)
                .filter(function (format) {
                    var supported = player.canPlayType(AUDIO_FORMATS[format]);
                    return supported === 'probably' || supported === 'maybe';
                });
        }

        // Initialize the application when the DOM is loaded and ready to be
        // manipulated
        document.addEventListener("DOMContentLoaded", function () {
            var input = document.getElementById('input'),
                voiceMenu = document.getElementById('voice'),
                text = document.getElementById('text'),
                player = document.getElementById('player'),
                submit = document.getElementById('submit'),
                supportedFormats = getSupportedAudioFormats(player);

            // Display a message and don't allow submitting the form if the
            // browser doesn't support any of the available audio formats
            if (supportedFormats.length === 0) {
                submit.disabled = true;
                alert('The web browser in use does not support any of the' +
                      ' available audio formats. Please try with a different' +
                      ' one.');
            }

            // Play the audio stream when the form is submitted successfully
            input.addEventListener('submit', function (event) {
                // Validate the fields in the form, display a message if
                // unexpected values are encountered
                if (text.value.length === 0) {
                    alert('Please fill in all the fields.');
                } else {
                    var selectedVoice = voiceMenu
                                            .options[voiceMenu.selectedIndex]
                                            .value;

                    // Point the player to the streaming server
                    player.src = '/read?voiceId=' +
                        encodeURIComponent(selectedVoice) +
                        '&text=' + encodeURIComponent(text.value) +
                        '&outputFormat=' + supportedFormats[0];
                    player.play();
                }

                // Stop the form from submitting,
                // Submitting the form is allowed only if the browser doesn't
                // support Javascript to ensure functionality in such a case
                event.preventDefault();
            });
        });

    </script>
    <style>
        #input {
            min-width: 100px;
            max-width: 600px;
            margin: 0 auto;
            padding: 50px;
        }

        #input div {
            margin-bottom: 20px;
        }

        #text {
            width: 100%;
            height: 200px;
            display: block;
        }

        #submit {
            width: 100%;
        }
    </style>
</head>

<body>
    <form id="input" method="GET" action="/read">
        <div>
            <label for="voice">Select a voice:</label>
            <select id="voice" name="voiceId">
                <option value="Joanna">Joanna</option>
                <option value="Matthew">Matthew</option>
            </select>
        </div>
        <div>
            <label for="text">Text to read:</label>
            <textarea id="text" maxlength="1000" minlength="1" name="text"
                    placeholder="Type some text here..."></textarea>
        </div>
        <input type="submit" value="Read" id="submit" />
    </form>
    <audio id="player"></audio>
</body>

</html>

I think I should add a javascript code like this. However, I don't know what to put in "var content" and I don't know where to put this javascript code in index.html.

<body>
    <script type='text/javascript'>
        function handleDownload() {
            var content = ??????;
            var blob = new Blob([ content ], { "type" : "audio/mp3" });                
            document.getElementById("download").href = window.URL.createObjectURL(blob);  
        }
    </script>
    <a id="download" href="#" download="test.mp3" onclick="handleDownload()">downloadMP3</a>
</body>

Could you give me any information or suggestion?

Thank you in advance.

Sincerely, Kazu

Solution

In Python, you can call start_speech_synthesis_task():

This operation requires all the standard information needed for speech synthesis, plus the name of an Amazon S3 bucket for the service to store the output of the synthesis task and two optional parameters (OutputS3KeyPrefix and SnsTopicArn). Once the synthesis task is created, this operation will return a SpeechSynthesisTask object, which will include an identifier of this task as well as the current status.

Therefore, you can request Amazon Polly to store the output of the speech to an mp3 file in Amazon S3. The application can then provide a URL to the mp3 file. This could either be a public file with a random name, or the application could generate a pre-signed URL that grants temporary access to the mp3 file.