I am trying to submit a job to AzureML using Python SDK V2. I am developing locally in VSCode. As my repository has grown over time, I now get the following message when submitting the job:
Your file exceeds 100 MB. If you experience low speeds, latency, or broken connections, we recommend using the AzCopyv10 tool for this file transfer.
A similar question was posted a few months ago, but it has received no responses.
I added an .amlignore file, but this didn't make much difference.
I couldn't find anything on how to use azcopy to submit the job to AzureML. My code is hosted in an Azure DevOps repository, in case that helps with finding a workaround.
Any help is highly appreciated.
The warning appears only the first time you submit the job. You won't get it again until you change your source code, because the code is already present from the previous job submission.
In that case, you can run azcopy manually and provide the storage account associated with your machine learning workspace when submitting the job.
Here are the steps:
Log in using azcopy:
azcopy login
Alternatively, use a SAS URL as the destination path. See the AzCopy documentation for more information about authentication.
I have used a SAS URL with read, write, list, and create permissions.
Here is the command to copy:
azcopy copy '<source_code_path>/*' 'https://<storage_account>.blob.core.windows.net/<container_name>/code/<sas_token>' --recursive
Fill in the storage account associated with your ML workspace and the SAS token. Then use the same storage path (without the SAS token) as the code location when submitting the job.
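If you prefer to script the copy step, the command above could be assembled in Python along these lines. This is only a sketch: the helper names build_destination_url and build_azcopy_command are hypothetical, the placeholders are the same as in the command above, and it assumes azcopy is on your PATH and that the SAS token goes into the URL's query string (a leading "?" on the token is tolerated).

```python
import shlex


def build_destination_url(account, container, prefix, sas_token):
    """Build the blob URL to copy into, with the SAS token as the query string.

    Hypothetical helper: strips a leading '?' so tokens copied with or
    without it produce the same URL.
    """
    token = sas_token.lstrip("?")
    folder = prefix.strip("/")
    return f"https://{account}.blob.core.windows.net/{container}/{folder}?{token}"


def build_azcopy_command(source_dir, destination_url):
    """Return the azcopy argument list for copying a folder's contents."""
    # The '/*' wildcard is interpreted by azcopy itself (no shell expansion is
    # needed); --recursive makes it descend into subdirectories as well.
    return ["azcopy", "copy", f"{source_dir.rstrip('/')}/*", destination_url, "--recursive"]


if __name__ == "__main__":
    url = build_destination_url("<storage_account>", "<container_name>", "code", "<sas_token>")
    cmd = build_azcopy_command("<source_code_path>", url)
    # Print the command for review; to execute it, pass `cmd` to
    # subprocess.run(cmd, check=True) once the placeholders are filled in.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids quoting problems with the "&" characters that SAS tokens contain.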
Job submission:
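from azure.ai.ml import command

# Point `code` at the blob folder populated by azcopy instead of a local path.
command_job = command(
    code="https://jgsml0114685361.blob.core.windows.net/job1-data/code/",
    command="echo hi from command",
    environment="azureml:docker-context-example-10:1",
    timeout=10800,  # seconds
)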
With this approach, the SDK does not upload the code from your machine; the job reads it directly from the storage location.
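Separately, since the .amlignore file did not help much, it may be worth checking which files actually push the code folder past 100 MB. The following is a minimal, standard-library-only sketch; the function names are hypothetical, and it simply walks the folder without applying .amlignore or .gitignore rules, so it reports everything on disk (including folders such as .git).

```python
import os


def file_sizes(root):
    """Return (size_in_bytes, relative_path) for every file under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), os.path.relpath(path, root)))
            except OSError:
                # Skip files that disappear or cannot be read.
                continue
    return sizes


def largest_files(root, top_n=10):
    """Return the top_n largest files under root, biggest first."""
    return sorted(file_sizes(root), reverse=True)[:top_n]


def total_size_mb(root):
    """Return the combined size of all files under root in MB (10**6 bytes)."""
    return sum(size for size, _ in file_sizes(root)) / 1_000_000


if __name__ == "__main__":
    root = "."  # assumption: run from the folder you pass as `code`
    print(f"Total: {total_size_mb(root):.1f} MB")
    for size, rel_path in largest_files(root):
        print(f"{size / 1_000_000:8.1f} MB  {rel_path}")
```

Large entries that the job does not need (datasets, model checkpoints, virtual environments) are candidates for .amlignore.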