My model worked in notebooks, where I just pip-installed the required packages. But now I'm trying to run it in a pipeline, and when I submit the pipeline as a job it gets stuck in the Preparing phase. I believe this is because the environment isn't resolving correctly. I've attached my .yml files, and I'm wondering where the issue might be.
conda_dependencies.yml:

```yaml
name: pytorch-env-with-optuna
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy
  - pandas
  - scikit-learn
  - pip
  - pip:
      - torch
      - optuna
      - mltable
      - azure-ai-ml
      - azure-identity
```
component.yml:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name:
display_name:
version: 6.0.0
type: command
description: >
inputs:
outputs:
code: .
environment:
  image: mcr.microsoft.com/azureml/curated/acpt-pytorch-1.11-py38-cuda11.3-gpu:9
  conda_file: conda_dependencies.yml
command: >
```
I've also tried removing optuna from the dependencies list just so I can get past the preparing phase, and that didn't work either. I'm not sure what I'm doing wrong.
The issue you're facing is a common one when working with Azure Machine Learning's curated environments. Your job is getting stuck in the "Preparing" phase because of dependency conflicts between your `conda_dependencies.yml` file and the packages that are already pre-installed in the Docker image.

The curated image you've selected, `mcr.microsoft.com/azureml/curated/acpt-pytorch-1.11-py38-cuda11.3-gpu:9`, is designed to be a ready-to-use environment. It already includes:
- Python 3.8
- PyTorch 1.11 (built for CUDA 11.3)
- NumPy
- Pandas
- Scikit-learn

...and many other common packages.
When you provide a `conda_file` that lists these same packages again, you are asking the `conda` package manager to resolve a complex and conflicting set of requirements. The resolver tries to find a version of every package that satisfies both the pre-built environment and your new requests, which often leads to an endless loop or a timeout, causing your job to appear "stuck."
To fix this, you should modify your Conda dependencies file to only include packages that are not already in the base image. Your goal is to add to the environment, not redefine it.
Here is a corrected version of your `conda_dependencies.yml` that should resolve correctly:

```yaml
name: pytorch-env-with-optuna
channels:
  - conda-forge
dependencies:
  - pip
  - pip:
      - optuna
      - mltable
      - azure-ai-ml
      - azure-identity
```
By removing `python`, `torch`, `numpy`, `pandas`, and `scikit-learn` from the file, you eliminate the conflicts. Azure ML will use the versions already present in the curated image and simply `pip install` the additional libraries you need (`optuna` and the specific Azure SDKs).
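Once the job gets past the Preparing phase, a quick way to confirm which packages came from the base image and which were added by pip is to print their versions at the start of your job. This is a hypothetical sanity-check snippet (the package names are just the ones from your conda file), not part of the Azure ML API:

```python
from importlib import metadata


def report_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Expected from the curated image vs. added via the conda file's pip section
    from_image = ["torch", "numpy", "pandas", "scikit-learn"]
    from_pip = ["optuna", "mltable", "azure-ai-ml", "azure-identity"]
    for name, ver in report_versions(from_image + from_pip).items():
        print(f"{name}: {ver or 'MISSING'}")
```

Adding something like this as the first line of your component's `command` makes it obvious in the job logs whether the environment actually contains what you expect.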
Before customizing a curated environment, it's always a good idea to check what's already included. You can find the full package list for every Azure ML curated environment in the official documentation.
This will help you create minimal dependency files that build quickly and reliably.