Tags: python, azure, azure-cognitive-services, azure-sdk-python

How do I set up the default LLM_RAG_CRACK_AND_CHUNK_AND_EMBED configuration for my existing services from a Python script?


I see that there is a default setup for this here. How can I configure it for my existing services? Could anybody point me to the right tutorial or template? This is what I have so far:

from msrest import Configuration
from azure.identity import DefaultAzureCredential


# create configuration for LLM_RAG_CRACK_AND_CHUNK_AND_EMBED
conf = Configuration("azureml://registries/azureml/components/llm_rag_crack_and_chunk_and_embed/labels/default")
endpoint = "https://xyz.search.windows.net"
credential = DefaultAzureCredential()
# how do I proceed?

Solution

  • Load the LLM_RAG_CRACK_AND_CHUNK_AND_EMBED component from the azureml registry and call it as a step in your pipeline, as in the code below.

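    from azure.ai.ml import MLClient, Input, Output
    from azure.ai.ml.dsl import pipeline
    from azure.identity import DefaultAzureCredential
    
    # Registry-scoped client: used only to fetch the shared component definition.
    ml_client_registry = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")
    chunk_data = ml_client_registry.components.get("LLM_RAG_CRACK_AND_CHUNK_AND_EMBED")
    
    @pipeline()
    def pipeline_with_registered_components(input, chunk):
        # Call the component as a pipeline step, passing two of its inputs.
        train_job = chunk_data(
            input_data=input,
            chunk_size=chunk
        )
        # Write the step's 'embeddings' output to a folder of your choice.
        train_job.outputs['embeddings'] = Output(type="uri_folder", path="****/chunk_pdf/")
    
    # Build the pipeline job; nothing is submitted yet.
    pipeline_job = pipeline_with_registered_components(
        input=Input(type="uri_folder", path="****/pdf/"),
        chunk=256
    )
    # Compute cluster used by every step that does not set its own.
    pipeline_job.settings.default_compute = "jgs-cluster"
    print(pipeline_job)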
    

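    To submit the pipeline job to your workspace:

    # Assumption: ml_client is a workspace-scoped client; the <...> values are placeholders.
    ml_client = MLClient(DefaultAzureCredential(), "<subscription-id>",
                         "<resource-group>", "<workspace-name>")
    pipeline_job = ml_client.jobs.create_or_update(
        pipeline_job, experiment_name="pipeline_samples"
    )
    pipeline_job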
    

    The code above is only an example: it passes just two of the component's inputs. Check the component definition in the Azure ML registry and pass the embedding model, the embeddings container, and every other required parameter.

    Refer to this GitHub code for more information about building a pipeline.
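
One way to find out which parameters the component expects is to load it from the registry and list its declared inputs. The sketch below assumes that `component.inputs` is a mapping from input name to an object exposing `type` and `optional` attributes, which is how the `azure-ai-ml` v2 SDK appears to represent them. `describe_inputs`, `load_registry_component`, and `print_component_inputs` are hypothetical helper names, not part of the SDK.

```python
from typing import Any, List, Mapping, Tuple


def describe_inputs(inputs: Mapping[str, Any]) -> List[Tuple[str, str, bool]]:
    """Return (name, type, required) for each declared input, sorted by name."""
    rows = []
    for name, spec in inputs.items():
        input_type = str(getattr(spec, "type", "unknown"))
        # Assumption: an input counts as required unless it is explicitly marked optional.
        required = not bool(getattr(spec, "optional", False))
        rows.append((name, input_type, required))
    return sorted(rows)


def load_registry_component(name: str, registry_name: str = "azureml"):
    """Fetch a component from an Azure ML registry (needs azure-ai-ml, azure-identity, and a login)."""
    # Imported lazily so describe_inputs can be used without the Azure SDK installed.
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    client = MLClient(credential=DefaultAzureCredential(), registry_name=registry_name)
    return client.components.get(name)


def print_component_inputs(name: str) -> None:
    """Print one line per input of the named registry component."""
    component = load_registry_component(name)
    for input_name, input_type, required in describe_inputs(component.inputs):
        marker = "required" if required else "optional"
        print(f"{input_name}: {input_type} ({marker})")
```

With valid Azure credentials, calling `print_component_inputs("LLM_RAG_CRACK_AND_CHUNK_AND_EMBED")` should print one line per input. An input that is not marked optional but carries a default value can typically still be left out of the component call.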