python flask google-cloud-storage pydub audio-converter

Convert an audio file from MP3 to FLAC in Flask and save it to Google Cloud Storage


I'm trying to create a Flask app that transcribes MP3 files using GCP's Speech-to-Text and saves the results to Cloud Storage (GCS). The MP3 file is uploaded from Vue.js to Flask. As part of this, I want to convert the MP3 file selected by the user to FLAC and then either send it to Speech-to-Text or save it to GCS, without saving it locally first.

Any module is fine, but I want to save the converted audio file to GCS without writing it to local disk.

I tried pydub, which is commonly used to convert audio files, but the only argument I could pass for both the input file and the output file was a relative path. As a result, I cannot convert the MP3 files received by Flask to FLAC and save them to GCS.

Even if saving directly to GCS is not possible, holding the converted file in a variable would be enough. I couldn't do that either.

from pydub import AudioSegment
# (1) I CAN convert an audio file locally.
sound = AudioSegment.from_mp3("example.mp3")
sound.export("example.flac", format="flac")

# (2) I CANNOT pass a GCS URL as an argument.
sound = AudioSegment.from_mp3("https://storage.googleapis.com/<bucket-name>/example.mp3")
sound.export("https://storage.googleapis.com/<bucket-name>/example.flac", format="flac")

# (3) I CANNOT write the result to a variable.
sound = sound.export(format="flac")
sound.export("example.flac")

In the code above:

(1) works as expected: the converted example.flac is saved in the current directory.

(2) raises FileNotFoundError: [Errno 2] No such file or directory: 'https://storage.googleapis.com//npl_speech_2.mp3'

(3) raises AttributeError: '_io.BufferedRandom' object has no attribute 'export'

Ultimately I want to run this on AWS Lambda, so I want to convert files without using local storage.


Solution

  • Your assumption is correct: you cannot pass a GCS URL as an argument. You first need to download the desired object and then run the conversion. The download goes to a temporary folder.

    You can achieve that using the GCS client library for Python. Your code might look like this:

    from google.cloud import storage
    from pydub import AudioSegment

    # Download the source object from GCS to a temporary file
    storage_client = storage.Client()
    bucket = storage_client.get_bucket("<BUCKET_NAME>")
    blob = bucket.blob("<OBJECT_NAME>")
    blob.download_to_filename("/tmp/<TMP_OBJECT_NAME>")

    # Convert the downloaded object and save the export to a tmp file
    sound = AudioSegment.from_mp3("/tmp/<TMP_OBJECT_NAME>")
    sound.export("/tmp/<TMP_OBJECT_NAME_CONVERTED>", format="flac")

    # Set the name of the object that will be uploaded to GCS
    destination_object_name = "<storage-object-name>"

    # Create the destination blob and upload the exported file
    blob_to_upload = bucket.blob(destination_object_name)
    blob_to_upload.upload_from_filename("/tmp/<TMP_OBJECT_NAME_CONVERTED>")
    
    

    The GCS client library documentation has more examples of how to use it.
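If the goal is to keep your own code free of file paths, one possible variation on the approach above is to pass in-memory buffers instead. The AttributeError in (3) happens because export() returns a file handle rather than an AudioSegment, and both from_file() and export() accept file-like objects. The sketch below is a rough illustration, not a tested production setup: the function names convert_mp3_to_flac and transcode_blob are made up for this example, the bucket and object names are placeholders, and pydub still calls ffmpeg and may create its own temporary files internally while exporting, so a writable temp directory is probably still needed.

```python
import io


def convert_mp3_to_flac(mp3_bytes: bytes) -> bytes:
    """Decode MP3 bytes with pydub and return FLAC-encoded bytes."""
    # Imported lazily so the GCS helper below can be used and tested
    # on machines where pydub/ffmpeg are not installed.
    from pydub import AudioSegment

    # from_file() accepts a file-like object, not only a path.
    segment = AudioSegment.from_file(io.BytesIO(mp3_bytes), format="mp3")

    # export() also accepts a file-like object as its target.
    out = io.BytesIO()
    segment.export(out, format="flac")
    return out.getvalue()


def transcode_blob(bucket, source_name, dest_name, convert=convert_mp3_to_flac):
    """Download source_name, convert it, upload the result as dest_name.

    `bucket` is expected to behave like google.cloud.storage.Bucket.
    `convert` is injectable (hypothetical design choice) so the GCS part
    can be exercised without a real audio file.
    Returns the size of the uploaded payload in bytes.
    """
    mp3_bytes = bucket.blob(source_name).download_as_bytes()
    flac_bytes = convert(mp3_bytes)
    bucket.blob(dest_name).upload_from_string(
        flac_bytes, content_type="audio/flac"
    )
    return len(flac_bytes)


# Example wiring (assumes google-cloud-storage is installed and
# credentials are configured; names are placeholders):
#
#   from google.cloud import storage
#   bucket = storage.Client().bucket("<BUCKET_NAME>")
#   transcode_blob(bucket, "<OBJECT_NAME>", "<storage-object-name>")
```

In a Flask view, the uploaded file's bytes (for example request.files["file"].read(), assuming the form field is named "file") could be handed to convert_mp3_to_flac directly, and the result uploaded with upload_from_string in the same way.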