google-cloud-platform text-to-speech google-cloud-speech google-text-to-speech

Make a website converting text to audio [Google Cloud Text to Speech API]


I'm a beginner at coding. I would like to make a simple website using the Google Cloud Text-to-Speech API.

  1. A website with a text box.
  2. You write some text in the text box and click a "convert to audio" button.
  3. You can download an MP3 file generated by the Google Cloud Text-to-Speech API.

I have read the official Google Cloud Text-to-Speech API documentation, but couldn't find a solution.

I searched for phrases like "develop a website converting text to audio" and found the page "Creating an HTML Application to Convert Text Files to Audio Files". However, it didn't meet my needs.

Could you give me any information on how to develop a website that converts text to audio?

Thank you in advance.

Sincerely, Kazu

I have written a Python program on Google Colaboratory. I would like to do the same thing on a website.

# Mount Google Drive so the service-account key can be copied into the Colab session.
from google.colab import drive
drive.mount('/content/drive')

!cp ./drive/'My Drive'/credential.json ./credential.json
!pip install google-cloud-texttospeech

import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credential.json"

# Write the SSML input to a file.
with open("text.ssml", "w") as f:
    f.write('<speak><prosody rate="slow">hello world</prosody></speak>')

from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

with open("text.ssml", "r") as f:
    ssml = f.read()

# google-cloud-texttospeech 2.0+ exposes these types on the module itself
# (texttospeech.types / texttospeech.enums were removed) and expects keyword arguments.
input_text = texttospeech.SynthesisInput(ssml=ssml)
voice = texttospeech.VoiceSelectionParams(language_code="en-US", name="en-US-Wavenet-A")
audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)

response = client.synthesize_speech(input=input_text, voice=voice, audio_config=audio_config)

# Save the returned MP3 bytes and download the file from Colab.
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
print('Audio content written to file "output.mp3"')

from google.colab import files
files.download("output.mp3")

Solution

  • Since you say you are new to coding, the first thing to do is get familiar with the GCP Text-to-Speech API. A good first step is to follow the quickstart tutorial, "Using client libraries" (Text-to-Speech).

    As for your requirement of an input box that converts text to audio, you need to follow the general guidelines for deploying an application on GCP; see, for example, "Serve Machine Learning Model on App Engine Flexible Environment".

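    So basically your options are to train a model and serve it via an App Engine deployment, or to deploy an application that sends requests with a JSON payload to the Text-to-Speech API. For your use case the second option should be enough, since the API already provides ready-made voices (as in your Colab code) and no model training is needed. Either way, you will need to do quite a bit of reading. Hope this helps.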
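
To make the second option more concrete, here is a minimal sketch of the three steps from the question (text box, "convert to audio" button, MP3 download) as a small WSGI app. It is only one possible layout, not an official sample. The names `make_app`, `synthesize_with_google`, `run_local`, the `/convert` path and the `output.mp3` filename are made up for illustration. It assumes google-cloud-texttospeech 2.0 or later is installed and that `GOOGLE_APPLICATION_CREDENTIALS` points at a service-account key, as in the Colab code.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Step 1: a page with a text box and a "convert to audio" button.
FORM_PAGE = b"""<!doctype html>
<title>Text to audio</title>
<form method="post" action="/convert">
  <textarea name="text" rows="6" cols="60"></textarea><br>
  <button type="submit">convert to audio</button>
</form>"""


def synthesize_with_google(text):
    """Return MP3 bytes for `text` using Cloud Text-to-Speech.

    Assumption: google-cloud-texttospeech >= 2.0 is installed and
    GOOGLE_APPLICATION_CREDENTIALS is set.
    """
    # Imported here so the form page still loads if the library is missing.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US", name="en-US-Wavenet-A"
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    return response.audio_content


def make_app(synthesize=synthesize_with_google):
    """Build a WSGI app; `synthesize` is any callable mapping text to MP3 bytes."""

    def app(environ, start_response):
        is_convert = (
            environ["REQUEST_METHOD"] == "POST"
            and environ.get("PATH_INFO") == "/convert"
        )
        if not is_convert:
            # Any other request just shows the form.
            start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
            return [FORM_PAGE]

        # Step 2: read the submitted form field.
        size = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        text = form.get("text", [""])[0].strip()
        if not text:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"Please enter some text."]

        # Step 3: return the audio as a file download.
        audio = synthesize(text)
        start_response("200 OK", [
            ("Content-Type", "audio/mpeg"),
            # "attachment" asks the browser to download rather than play the file.
            ("Content-Disposition", 'attachment; filename="output.mp3"'),
            ("Content-Length", str(len(audio))),
        ])
        return [audio]

    return app


def run_local(port=8000):
    """Serve the app with the standard library's development server."""
    with make_server("", port, make_app()) as server:
        server.serve_forever()
```

Calling `run_local()` and opening http://localhost:8000 should show the form. The synthesis function is passed in as a parameter so the page logic can be tried with a stand-in function before any credentials are set up. For a real deployment (for example on App Engine) you would likely use a framework such as Flask and a production server instead of `wsgiref`.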