I see that there is a default setup for this here. How can I set this up for my existing services? Could anybody point me to the right tutorial or template? I have the following piece of code:
```python
from msrest import Configuration
from azure.identity import DefaultAzureCredential
# create configuration for LLM_RAG_CRACK_AND_CHUNK_AND_EMBED
conf = Configuration("azureml://registries/azureml/components/llm_rag_crack_and_chunk_and_embed/labels/default")
endpoint = "https://xyz.search.windows.net"
credential = DefaultAzureCredential()
# how do I proceed?
```
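You don't need msrest's `Configuration` for this. Instead, fetch the LLM_RAG_CRACK_AND_CHUNK_AND_EMBED component from the `azureml` registry with the Azure ML Python SDK v2 and use it as a step in your pipeline: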
```python
from azure.ai.ml import MLClient, Input, Output
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

# Client scoped to the shared "azureml" registry, used only to fetch the component
ml_client_registry = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")

# Component names are lowercase; label="latest" selects the newest version
chunk_data = ml_client_registry.components.get("llm_rag_crack_and_chunk_and_embed", label="latest")

@pipeline()
def pipeline_with_registered_components(input, chunk):
    train_job = chunk_data(
        input_data=input,
        chunk_size=chunk,
    )
    # Write the generated embeddings to a folder of your choice
    train_job.outputs.embeddings = Output(type="uri_folder", path="****/chunk_pdf/")

pipeline_job = pipeline_with_registered_components(
    input=Input(type="uri_folder", path="****/pdf/"),
    chunk=256,
)
pipeline_job.settings.default_compute = "jgs-cluster"
print(pipeline_job)
```
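To execute, submit the job through a workspace-scoped client (the registry client above can only fetch assets, not run jobs):
```python
# Replace the placeholders with your subscription, resource group, and workspace
ml_client = MLClient(DefaultAzureCredential(), subscription_id="<subscription-id>",
                     resource_group_name="<resource-group>", workspace_name="<workspace-name>")
pipeline_job = ml_client.jobs.create_or_update(pipeline_job, experiment_name="pipeline_samples")
pipeline_job
```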
The code above is only an example. Check the component definition in the Azure ML registry for its full list of inputs, and pass the embedding model, the embedding container, and every other required parameter.
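Because the component takes more inputs than the two set in the example, it can help to check which required inputs are still unset before submitting. The following is a minimal sketch with a hypothetical helper, `missing_required_inputs`. It assumes the fetched component's `inputs` behaves like a mapping from input name to an object with `optional` and `default` attributes, so verify that against your SDK version.

```python
from typing import Any, Iterable, List, Mapping


def missing_required_inputs(component_inputs: Mapping[str, Any], provided: Iterable[str]) -> List[str]:
    """Hypothetical helper: names of inputs that are not optional, have no default, and were not passed."""
    provided_names = set(provided)
    missing = []
    for name, spec in component_inputs.items():
        # Assumption: each input spec exposes `optional` and `default` attributes
        optional = bool(getattr(spec, "optional", False))
        has_default = getattr(spec, "default", None) is not None
        if not optional and not has_default and name not in provided_names:
            missing.append(name)
    return sorted(missing)
```

Under that assumption, `missing_required_inputs(chunk_data.inputs, ["input_data", "chunk_size"])` would list whatever the registry definition marks as required but the example pipeline leaves unset.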
Refer to this GitHub code for more information about building a pipeline.
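The question passes a registry URI to msrest's `Configuration`; with the SDK v2 client, that URI instead splits into the `registry_name` for `MLClient` and the component name and label for `components.get`. Below is a minimal sketch of that split, using a hypothetical helper `parse_registry_component_uri` and assuming registry component URIs have the shape `azureml://registries/<registry>/components/<name>/labels/<label>` or `.../versions/<version>`.

```python
import re
from typing import NamedTuple, Optional


class RegistryComponentRef(NamedTuple):
    registry: str
    name: str
    version: Optional[str]
    label: Optional[str]


# Assumed URI shapes: .../components/<name>/labels/<label> or .../components/<name>/versions/<version>
_REGISTRY_COMPONENT_URI = re.compile(
    r"^azureml://registries/(?P<registry>[^/]+)/components/(?P<name>[^/]+)"
    r"/(?:versions/(?P<version>[^/]+)|labels/(?P<label>[^/]+))$"
)


def parse_registry_component_uri(uri: str) -> RegistryComponentRef:
    """Hypothetical helper: split a registry component URI into registry, name, version, label."""
    match = _REGISTRY_COMPONENT_URI.match(uri)
    if match is None:
        raise ValueError(f"Not a registry component URI: {uri}")
    # Exactly one of version/label is set; the other group is None
    return RegistryComponentRef(match["registry"], match["name"], match["version"], match["label"])
```

With the URI from the question, this yields `registry="azureml"`, `name="llm_rag_crack_and_chunk_and_embed"`, and `label="default"`, which could then be passed as `ml_client_registry.components.get(ref.name, label=ref.label)`, assuming that label exists in the registry.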