[SOLVED] How to create a dataset for Azure custom speech using spx (speechCLI)

I am using the following command for creating a dataset in my Azure Speech service:

spx csr dataset create --api-version v3.1 --kind "Acoustic" --name "My Custom Speech" --description "My Acoustic Dataset Description" --project $project_id --content https://xyz.blob.core.windows.net/test-and-train-data --language "en-US"

The --content flag points to a container in my storage account where the data is stored. I tried each of the following layouts:

test-and-train-data
├── train.wav
└── trans.txt

and

test-and-train-data
└── wav_n_txt.zip

and:

test-and-train-data
└── en-US
    ├── train.wav
    └── trans.txt

and:

test-and-train-data
└── en-US
    └── wav_n_txt.zip

I tried the en-US subfolder variants because the output of the spx csr dataset create command includes "locale": "en-US".

The command creates the dataset successfully, but when I inspect it in the service I see an error with no details, and I cannot find a single example of this online. I have read everything under the custom speech overview. Downloading the upload process report does not work either. What am I doing wrong?

Solution

Azure's documentation on this topic is poor and incomplete. I am just leaving a link here to something that actually works; I adapted it for my own needs.
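
For anyone landing here without the link: as far as I understand the custom speech data requirements, audio plus human-labeled transcript data is expected as a single flat zip archive that holds the WAV files together with one transcript file, where each line is the audio file name, a tab, and the transcription. The sketch below packs files that way. The function name build_dataset_zip and the file names are hypothetical and only mirror the ones in the question; encoding details and audio format checks are not handled, so treat this as a starting point rather than a verified recipe.

```python
import zipfile
from pathlib import Path


def build_dataset_zip(pairs, zip_path, transcript_name="trans.txt"):
    """Pack audio files and a tab-separated transcript into one flat zip.

    pairs: iterable of (path_to_wav, transcript_text) tuples.
    Assumption: the service wants every file at the archive root.
    """
    lines = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for wav_path, text in pairs:
            wav_path = Path(wav_path)
            # Store each audio file at the archive root (no sub-folders).
            archive.write(wav_path, arcname=wav_path.name)
            # One line per audio file: <file name><TAB><transcript>.
            lines.append(f"{wav_path.name}\t{text.strip()}")
        # Write the transcript file next to the audio files.
        archive.writestr(transcript_name, "\n".join(lines) + "\n")
    return Path(zip_path)
```

For example, build_dataset_zip([("train.wav", "hello world")], "wav_n_txt.zip") produces an archive containing train.wav and trans.txt. My assumption, which I have not confirmed against the linked solution, is that --content should then point at the URL of that zip blob itself (readable by the service, for instance via a SAS token) rather than at the container.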