I am using the following command to create a dataset in my Azure Speech service:
spx csr dataset create --api-version v3.1 --kind "Acoustic" --name "My Custom Speech" --description "My Acoustic Dataset Description" --project $project_id --content https://xyz.blob.core.windows.net/test-and-train-data --language "en-US"
The --content flag points to a specific container in my storage account where the data is stored. I tried the following layouts:
test-and-train-data
├── train.wav
└── trans.txt
and:
test-and-train-data
└── wav_n_txt.zip
and:
test-and-train-data
└── en-US
    ├── train.wav
    └── trans.txt
and:
test-and-train-data
└── en-US
    └── wav_n_txt.zip
I tried the en-US subfolder because running the spx csr dataset create command shows "locale": "en-US" in its output.
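For comparison, here is a minimal Python sketch that packages the data as one flat .zip archive, assuming the format the Custom Speech documentation describes for audio + human-labeled transcript data: the audio files at the root of the archive and a trans.txt in which each line is the audio file name, a tab, and the transcription. The function name build_acoustic_zip and its parameters are hypothetical.

```python
import zipfile
from pathlib import Path


def build_acoustic_zip(transcripts: dict[str, str], audio_dir: Path, out_path: Path) -> None:
    """Write a flat .zip holding the audio files plus one trans.txt.

    Assumed layout (verify against the Custom Speech data-format docs):
    audio files at the archive root, and trans.txt with one
    "<file name><TAB><transcription>" line per audio file.
    """
    lines = []
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in transcripts.items():
            # arcname=name stores the file at the archive root, with no en-US/ folder.
            zf.write(audio_dir / name, arcname=name)
            # File name and transcription separated by a single tab character.
            lines.append(f"{name}\t{text}")
        # One transcript line per audio file, UTF-8 encoded, trailing newline.
        zf.writestr("trans.txt", ("\n".join(lines) + "\n").encode("utf-8"))
```

For the single file in the question this would be called as build_acoustic_zip({"train.wav": "<transcription>"}, Path("."), Path("wav_n_txt.zip")). Keeping the archive flat follows the assumed format; the locale is presumably taken from --language rather than from a folder name.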
The command creates the dataset successfully, but when I inspect it in the service it shows an error with no details, and I cannot find a single example of this online. I have read everything under the Custom Speech overview. Downloading the report for the upload process does not work either. What am I doing wrong?
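Because the service reports an error without details, checking the archive locally can at least rule out format problems before uploading. A minimal sketch, assuming the same trans.txt format plus mono, 16-bit PCM WAV audio at 8 kHz or 16 kHz; check_acoustic_zip is a hypothetical helper, and the audio requirements are assumptions to verify against the current documentation.

```python
import io
import wave
import zipfile

# Assumed audio requirements; verify against the current Custom Speech docs.
ALLOWED_RATES = {8000, 16000}
REQUIRED_CHANNELS = 1
REQUIRED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit PCM


def check_acoustic_zip(path: str) -> list[str]:
    """Return a list of problems found in the archive (empty if none)."""
    problems: list[str] = []
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
        if "trans.txt" not in names:
            return ["trans.txt is missing from the archive root"]
        # utf-8-sig decodes plain UTF-8 and also strips a leading BOM if present.
        text = zf.read("trans.txt").decode("utf-8-sig")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if not line.strip():
                continue  # ignore blank lines
            parts = line.split("\t")
            if len(parts) != 2 or not parts[1].strip():
                problems.append(f"trans.txt line {lineno}: expected '<file><TAB><text>'")
                continue
            audio = parts[0]
            if audio not in names:
                problems.append(f"trans.txt line {lineno}: {audio} is not in the archive")
                continue
            try:
                with wave.open(io.BytesIO(zf.read(audio))) as w:
                    if w.getnchannels() != REQUIRED_CHANNELS:
                        problems.append(f"{audio}: {w.getnchannels()} channels, expected mono")
                    if w.getsampwidth() != REQUIRED_SAMPLE_WIDTH:
                        problems.append(f"{audio}: {8 * w.getsampwidth()}-bit samples, expected 16-bit")
                    if w.getframerate() not in ALLOWED_RATES:
                        problems.append(f"{audio}: {w.getframerate()} Hz, expected 8000 or 16000")
            except (wave.Error, EOFError) as exc:
                # The wave module only reads uncompressed PCM WAV files.
                problems.append(f"{audio}: not a readable PCM WAV file ({exc})")
    return problems
```

An empty list only means the archive passed these local checks; it does not guarantee that the service will accept it.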
Azure's documentation on this topic is poor and incomplete. I'm just leaving a link here to something that actually works; I adapted it for my own needs.
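One likely culprit is the --content value itself: in the question it points at the container, while the service presumably needs a URL to a single file it can read, such as the .zip blob with a SAS token appended. A sketch that builds such a URL and the matching argument list, assuming that requirement; the account, blob name, and SAS token are placeholders, and both helpers are hypothetical.

```python
import shlex
from urllib.parse import quote


def blob_url(account: str, container: str, blob: str, sas_token: str) -> str:
    """URL of one blob (the .zip), with a SAS token so the service can read it."""
    # quote() escapes characters such as spaces in the blob name; "/" is kept.
    return (f"https://{account}.blob.core.windows.net/"
            f"{container}/{quote(blob)}?{sas_token.lstrip('?')}")


def spx_dataset_create_args(project_id: str, content_url: str) -> list[str]:
    """Same flags as the command in the question; only the --content value differs."""
    return [
        "spx", "csr", "dataset", "create",
        "--api-version", "v3.1",
        "--kind", "Acoustic",
        "--name", "My Custom Speech",
        "--description", "My Acoustic Dataset Description",
        "--project", project_id,
        "--content", content_url,
        "--language", "en-US",
    ]


if __name__ == "__main__":
    # Placeholder values: substitute your own project ID and SAS token.
    url = blob_url("xyz", "test-and-train-data", "wav_n_txt.zip", "<sas-token>")
    print(shlex.join(spx_dataset_create_args("<project-id>", url)))
```

The printed command can be pasted into a terminal, or the list can be passed directly to subprocess.run(args, check=True), which avoids shell quoting issues with the & characters in a SAS token.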