I am training a computer vision model in AzureML and am trying to set the outputs directory to match the location of my logs/artifacts, i.e. my execution working directory 'exe/wd'. I code/prototype locally and submit jobs via MLClient. The training script, environment, compute etc. all work. Writing files with:
with open(filepath, 'wb') as f:
    f.write(obj)
writes the files to the wrong location.
I would like to store/see/access my checkpoints here for convenience:
Job definition:
from azure.ai.ml import Input, Output, command
from azure.ai.ml.constants import InputOutputModes

job = command(
    inputs={
        "data": Input(
            type="uri_folder",
            path=f"{data_asset.id}",
            mode=InputOutputModes.RO_MOUNT,
        )
    },
    outputs={
        "outputs": Output(
            type="uri_folder",
            path="./outputs",
            mode=InputOutputModes.RW_MOUNT,
        )
    },
    code="<path>",
    command=az_cfg.COMMAND_TRAIN,
    environment=f"{az_cfg.ENV_NAME}{az_cfg.ENV_VERSION}",
    display_name=az_cfg.DISPLAY_NAME,
    compute=ci_name,
    environment_variables={
        "DATASET_MOUNT_BLOCK_BASED_CACHE_ENABLED": True
    },
    experiment_name=az_cfg.EXPERIMENT_NAME,
)
I expected to see my outputs folder in the "Outputs+Logs" tab in the studio.
After some research I found that outputs generated during execution are stored in the default workspaceblobstore datastore. How can I change the outputs so that they end up in the workspace working directory?
When saving a model with mlflow, the logged and saved files land in the right place. However, I cannot use mlflow here. I am not sure whether this is expected behaviour and simply not possible, or whether I am missing something. Any help is highly appreciated.
According to this documentation, a local path such as ./home/username/data/my_data is supported only for Inputs and not for Outputs.
So, as you said, you need to write your models and other details to the default workspaceblobstore or to your own custom blob store, and then download them to your current workspace directory.
Below is the command job code.
from azure.ai.ml import Input, Output, command
from azure.ai.ml.constants import AssetTypes, InputOutputModes

job = command(
    code="./src",  # local path where the code is stored
    command="python main.py --diabetes-csv ${{inputs.diabetes}} --model_out ${{outputs.output}}",
    inputs={
        "diabetes": Input(
            type="uri_file",
            path="https://azuremlexamples.blob.core.windows.net/datasets/diabetes.csv",
        )
    },
    outputs={
        "output": Output(
            type=AssetTypes.CUSTOM_MODEL,
            path="azureml://subscriptions/<subscriptions_id>/resourcegroups/<resourcegroup>/workspaces/jgsml/datastores/workspaceblobstore/paths/ML_output/",
            mode=InputOutputModes.RW_MOUNT,
        )
    },
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    display_name="sklearn-diabetes-example",
)
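The output path in the job above follows the long-form azureml:// datastore URI pattern. As a minimal sketch, a small helper can assemble it from its parts so the string is not typed by hand. The function name `datastore_uri` and its parameters are hypothetical and not part of the Azure ML SDK; only the URI layout is taken from the job definition above.

```python
def datastore_uri(subscription_id, resource_group, workspace, datastore, relative_path):
    """Build a long-form azureml:// datastore URI (hypothetical helper, not an SDK call)."""
    # Normalise the folder path so the result always ends with exactly one slash.
    folder = relative_path.strip("/")
    return (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}"
        f"/datastores/{datastore}"
        f"/paths/{folder}/"
    )
```

The returned string could then be passed as the `path` argument of `Output(...)`, with your own subscription, resource group and workspace names filled in.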
Code snippet to save the model in main.py:
import joblib
from pathlib import Path
model = train_model(params, X_train, X_test, y_train, y_test)
joblib.dump(model, (Path(args.model_out) / "model_1.pkl"))
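The same mounted output folder can also receive checkpoints written with plain `open(..., 'wb')`, as in the question, as long as the file path is built from the `--model_out` argument. The following is only a sketch: the helper names `parse_args` and `save_checkpoint` are assumptions introduced for illustration and do not appear in the original main.py.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Read the output folder that AzureML substitutes for ${{outputs.output}}."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_out", type=str, required=True)
    # parse_known_args ignores other flags (e.g. --diabetes-csv) handled elsewhere.
    args, _unknown = parser.parse_known_args(argv)
    return args


def save_checkpoint(out_dir, name, payload):
    """Write raw bytes into the output folder and return the file path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create subfolders on demand
    target = out_dir / name
    with open(target, "wb") as f:
        f.write(payload)
    return target
```

Inside the training loop this might be called as `save_checkpoint(args.model_out, "epoch_3.bin", obj)`, where `obj` is the already serialised checkpoint.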
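After the job has completed successfully, download its outputs to the current directory using the code below.
returned_job = ml_client.create_or_update(job)
# Block until the job finishes; outputs can only be downloaded afterwards.
ml_client.jobs.stream(returned_job.name)
ml_client.jobs.download(returned_job.name, download_path="./job_output", all=True)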
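To check what actually arrived in the download folder, a short listing helper can be useful. This is a sketch using only the standard library; `list_downloaded_files` is a hypothetical name, and the exact folder layout below `./job_output` depends on the SDK version, so no particular structure is assumed here.

```python
from pathlib import Path


def list_downloaded_files(download_path):
    """Return all files below download_path as sorted, POSIX-style relative paths."""
    root = Path(download_path)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    )
```

Calling `list_downloaded_files("./job_output")` after the download should show where the saved model file (here `model_1.pkl`) ended up.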