amazon-sagemaker

Get SageMaker pipeline execution ID within pipeline steps


Hello, I thought my problem was simple, but searching for an answer showed otherwise. Within different SageMaker Pipeline steps (e.g., ClarifyCheckStep), I want to get the pipeline execution ID so I can save the output of each step in a clear, well-organized structure. Does anyone have an idea? It seems that pipeline execution variables cannot be used in string formatting: https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#execution-variables


Solution

  • To save outputs in a structure that ties them to a single pipeline execution, the most robust method currently available is to use the code_location and output_path parameters of the various steps. Build the path in advance from the pipeline_name, any other details you need, and a timestamp that keeps it unique.

    Then, when you build your pipeline definition (e.g., with a get_pipeline() function), pass in the pipeline_name and any other variables. For example:

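    import time
    
    # UTC timestamp such as "20240131142530", used to make the output path unique
    timestamp = time.strftime("%Y%m%d%H%M%S", time.gmtime())
    
    # pipeline_detail is a custom argument that your own get_pipeline() must accept
    pipeline = your_pipeline_script.get_pipeline(
        region=region,
        role=role,
        pipeline_name=your_pipeline_name,
        pipeline_detail=f"{some_details}-{timestamp}",
    )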
    

    Your output destination can then look something like this:

    outputs_destination = f"s3://{pipeline_session.default_bucket()}/pipeline/{pipeline_name}/{pipeline_detail}"
    

    This way, your path is generated before the pipeline is executed, and you can control it with whatever parameters you choose.

    One option is to create subfolders named after particular parameters. What matters is that the layout follows a well-defined, easily recognizable structure.
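The path construction described above can be sketched in plain Python. This is a minimal sketch and not part of the SageMaker SDK: build_output_prefix, step_output_uri and the example names (my-bucket, my-pipeline, baseline) are hypothetical. In a real pipeline the bucket would come from pipeline_session.default_bucket(), and the per-step URI would be passed to a step's output_path. The layout matches the outputs_destination string, with one subfolder per step.

```python
import time
from typing import Optional


# Hypothetical helper (not part of the SageMaker SDK)
def build_output_prefix(
    bucket: str,
    pipeline_name: str,
    details: str,
    now: Optional[time.struct_time] = None,
) -> str:
    """Return s3://<bucket>/pipeline/<pipeline_name>/<details>-<UTC timestamp>."""
    # Use the current UTC time unless a fixed time is supplied (handy for tests)
    if now is None:
        now = time.gmtime()
    timestamp = time.strftime("%Y%m%d%H%M%S", now)
    pipeline_detail = f"{details}-{timestamp}"
    return f"s3://{bucket}/pipeline/{pipeline_name}/{pipeline_detail}"


# Hypothetical helper: one subfolder per step under the shared prefix
def step_output_uri(prefix: str, step_name: str) -> str:
    return f"{prefix.rstrip('/')}/{step_name}"


if __name__ == "__main__":
    fixed = time.strptime("2024-01-31 14:25:30", "%Y-%m-%d %H:%M:%S")
    prefix = build_output_prefix("my-bucket", "my-pipeline", "baseline", now=fixed)
    print(prefix)  # s3://my-bucket/pipeline/my-pipeline/baseline-20240131142530
    print(step_output_uri(prefix, "ClarifyCheckStep"))
```

One caveat of this design: the timestamp is evaluated when the pipeline definition is built, not when it runs. If you upsert the definition once and start it several times without calling get_pipeline() again, those executions share the same prefix.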
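The idea of subfolders named after parameters could look like this, assuming you want one folder level per parameter. The helper names and example values are hypothetical, and the key=value folder naming is a common convention (Hive-style partitioning) borrowed here for illustration; SageMaker does not require it. Sorting the keys keeps the order stable, so the same parameters always map to the same path.

```python
from typing import Mapping


# Hypothetical helper: turn parameters into "key=value" path segments
def parameter_subfolders(params: Mapping[str, object]) -> str:
    segments = []
    for key in sorted(params):  # stable order regardless of dict insertion order
        # Replace "/" so a value cannot accidentally create extra folder levels
        value = str(params[key]).replace("/", "-")
        segments.append(f"{key}={value}")
    return "/".join(segments)


# Hypothetical helper: append the parameter folders to a base S3 prefix
def parameterised_prefix(base: str, params: Mapping[str, object]) -> str:
    base = base.rstrip("/")
    sub = parameter_subfolders(params)
    return f"{base}/{sub}" if sub else base


if __name__ == "__main__":
    print(parameterised_prefix(
        "s3://my-bucket/pipeline/my-pipeline",
        {"region": "eu-west-1", "model": "xgboost"},
    ))  # s3://my-bucket/pipeline/my-pipeline/model=xgboost/region=eu-west-1
```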