I am trying to create Sagemaker pipeline with multiple steps. I have some code which I would like to share across different steps. Next example is not exact but simplified version for illustration.
I have folder structure which looks next:
source_scripts/
├── utils
│ ├── logger.py
├── models/
│ ├── ground_truth.py
│ ├── document.py
├── processing/
│ ├── processing.py
│ └── main.py
└── training/
├── training.py
└── main.py
I would like to use code from models
and utils
inside training.py
as I don't know where exactly code mounted on Sagemaker instance I am using:
from ..common.ground_truth import GroundTruthRow
As I am building pipeline I create processing and training step:
script_processor = FrameworkProcessor()
args = script_processor.get_run_args(
source_dir="source_scripts"
code="processing/main.py"
)
step_process = ProcessingStep(
code=args.code
)
estimator = Estimator(
source_dir="source_scripts"
code="training/main.py"
)
step_train = TrainingStep(
estimator=estimator
)
But during pipeline execution it results in error:
ImportError: attempted relative import with no known parent package
Any suggestions how to share code across several SageMaker jobs in single pipeline without building custom docker image?
Quickest solution for this issue is to move the entry point scripts (processing/main.py, training/main.py) to the upper level directly under "source_scripts/" like this:
source_scripts/
├── utils
│ └── logger.py
├── models/
│ ├── ground_truth.py
│ └── document.py
├── processing_main.py
└── training_main.py
and avoid using relative import with ..
in the entry point scripts.
from models.ground_truth import GroundTruthRow
The reason why you see the ImportError is because the entry point script's __name__
variable doesn't have package structure information unlike other modules.
If you print __name__
in models.ground_truth.py
, you would see something like models.ground_truth
and it contains package structure. But if you print __name__
in training/main.py
, you see __main__
, so Python cannot understand how to handle ..
.
If you need to keep the current directory structure, there may be more complicated solutions, but moving entry point files to upper level would be simplest.
Also, I recommend testing the solution in your handy local Python environment before running your SageMaker Pipeline.