I’m facing an issue while trying to create an MLTable YAML file for a dataset in Azure ML.
I have a default datastore in my workspace containing two folders (OK and NOK) with images. My goal is to read all images and use the folder name as the label for each image.
Here’s what I’ve tried so far:
mltable_yaml = """
type: mltable
paths:
  - file: ./OK
  - file: ./NOK
transformations:
  - read_from_directory:
      image_column: image_url
      folder_column: label
      recursive: true
"""
# Create directory and save MLTable
mltable_dir = "image_data"
os.makedirs(mltable_dir, exist_ok=True)
with open(os.path.join(mltable_dir, "MLTable"), "w") as f:
    f.write(mltable_yaml)
training_data = Input(
    type="mltable",
    path=mltable_dir
)
However, when I run the experiment, I encounter the following error:
MLTable input is invalid. UserErrorException:
Message: Encountered user error while fetching data from Dataset. Error: UserErrorException:
Message: MLTable yaml schema is invalid:
Error Code: ScriptExecution.Validation
Validation Error Code: Invalid
Validation Target: Script
Native error: Dataflow script error: InvalidScriptElement("read_from_directory")
ScriptError(InvalidScriptElement("read_from_directory"))
=> Invalid script element "read_from_directory"
InvalidScriptElement("read_from_directory")
Error Message: Yaml script is invalid: InvalidScriptElement("read_from_directory").| session_id=1a30b15a-7e85-498b-b735-2348bfe0625b
InnerException None
ErrorResponse
{
"error": {
"code": "UserError",
"message": "MLTable yaml schema is invalid: \nError Code: ScriptExecution.Validation\nValidation Error Code: Invalid\nValidation Target: Script\nNative error: Dataflow script error: InvalidScriptElement(\"read_from_directory\")\n\tScriptError(InvalidScriptElement(\"read_from_directory\"))\n=> Invalid script element \"read_from_directory\"\n\tInvalidScriptElement(\"read_from_directory\")\nError Message: Yaml script is invalid: InvalidScriptElement(\"read_from_directory\").| session_id=1a30b15a-7e85-498b-b735-2348bfe0625b"
}
}
ErrorResponse
{
"error": {
"code": "UserError",
"message": "Encountered user error while fetching data from Dataset. Error: UserErrorException:\n\tMessage: MLTable yaml schema is invalid: \nError Code: ScriptExecution.Validation\nValidation Error Code: Invalid\nValidation Target: Script\nNative error: Dataflow script error: InvalidScriptElement(\"read_from_directory\")\n\tScriptError(InvalidScriptElement(\"read_from_directory\"))\n=> Invalid script element \"read_from_directory\"\n\tInvalidScriptElement(\"read_from_directory\")\nError M
From the error details, it seems like the read_from_directory element is not recognized, but I’m unsure how to structure the YAML to correctly map the folder name to the label.
How can I resolve this?
There is no read_from_directory transformation in the MLTable schema; check this documentation.
For AutoML image classification, you need the data in a .jsonl file with the fields below; check this documentation.
{
    "image_url": "azureml://subscriptions/<my-subscription-id>/resourcegroups/<my-resource-group>/workspaces/<my-workspace>/datastores/<my-datastore>/paths/<path_to_image>",
    "image_details": {
        "format": "image_format",
        "width": "image_width",
        "height": "image_height"
    },
    "label": "class_name"
}
image_url and label are required fields, and image_url must be given as the complete datastore path.
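As a minimal sketch of that requirement, the check below scans JSONL lines and reports any record that is missing image_url or label. The function name find_invalid_records is hypothetical, and the azureml:// prefix test is an assumption taken from the example record above, not a rule confirmed by the service.

```python
import json

REQUIRED_FIELDS = ("image_url", "label")


def find_invalid_records(lines):
    """Return (line_number, problem) tuples for lines that look unusable."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped, not reported
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((line_number, f"not valid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((line_number, "not a JSON object"))
            continue
        for field in REQUIRED_FIELDS:
            if not record.get(field):
                problems.append((line_number, f"missing field: {field}"))
        url = record.get("image_url")
        # Assumption: a complete datastore path starts with azureml://
        if url and not str(url).startswith("azureml://"):
            problems.append((line_number, "image_url is not an azureml:// datastore path"))
    return problems
```

An empty result means every line carries both required fields and an azureml:// style URL.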
Follow the steps below to create the .jsonl file.
First, you need the datastore path to each image, so create a new data asset and take its path.
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes, InputOutputModes
from azure.ai.ml import Input
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
credential = DefaultAzureCredential()
ml_client = MLClient.from_config(credential)
my_data = Data(
    path="./images",
    type=AssetTypes.URI_FOLDER,
    description="Fridge-items images",
    name="items-images",
)
uri_folder_data_asset = ml_client.data.create_or_update(my_data)
Here, I have the OK and NOK folders inside images.
The datastore path is available in uri_folder_data_asset.path.
Next, create the .jsonl file using the code below.
import os
import json
# Map each label to the local folder that holds its images
folders = {
    "OK": "./images/OK",
    "NOK": "./images/NOK"
}
mltable_dir = "image_data_mltable"
os.makedirs(mltable_dir, exist_ok=True)
output_file = "./image_data_mltable/image_data.jsonl"
with open(output_file, "w") as jsonl_file:
    for label, folder_path in folders.items():
        for file_name in os.listdir(folder_path):
            # Keep image files only
            if file_name.lower().endswith((".jpg", ".jpeg", ".png", ".bmp", ".gif")):
                record = {
                    # Swap the local ./images/ prefix for the data asset path
                    "image_url": os.path.join(folder_path.replace('./images/', uri_folder_data_asset.path), file_name).replace("\\", "/"),
                    "label": label
                }
                jsonl_file.write(json.dumps(record) + "\n")
print(f"JSONL file created: {output_file}")
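If you would rather not hard-code the folder names, here is a rough sketch of a variant that takes the label from each subfolder name, which is what the question asks for. The helper build_records and its parameters are hypothetical names. It assumes every immediate subfolder of the local root is one class, and that the data asset path mirrors the local folder layout.

```python
import os
import posixpath

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".gif")


def build_records(local_root, base_uri):
    """Return one {"image_url", "label"} dict per image, labeled by subfolder name."""
    base = base_uri.rstrip("/")  # tolerate a trailing slash on the asset path
    records = []
    for label in sorted(os.listdir(local_root)):
        folder = os.path.join(local_root, label)
        if not os.path.isdir(folder):
            continue  # only subfolders define classes
        for file_name in sorted(os.listdir(folder)):
            if file_name.lower().endswith(IMAGE_EXTENSIONS):
                records.append({
                    # posixpath keeps forward slashes in the URL on every OS
                    "image_url": posixpath.join(base, label, file_name),
                    "label": label,
                })
    return records
```

With the data asset from the previous step, this would be called as build_records("./images", uri_folder_data_asset.path), and each record written with json.dumps(record) + "\n" as in the code above.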
Then create the MLTable file.
mltable_yaml = """
paths:
  - file: ./image_data.jsonl
transformations:
  - read_json_lines:
      encoding: utf8
      invalid_lines: error
      include_path_column: false
  - convert_column_types:
      - columns: image_url
        column_type: stream_info
"""
with open(os.path.join(mltable_dir, "MLTable"), "w") as f:
    f.write(mltable_yaml)
Use read_json_lines in the transformations; check this for how to prepare image data.
Now use it as input.
import mltable
training_data = Input(type=AssetTypes.MLTABLE, path="./image_data_mltable")
tbl = mltable.load(uri="./image_data_mltable")
tbl.to_pandas_dataframe()
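Before submitting the job, it can help to confirm that the JSONL file really contains both classes. This is a small sketch using only the standard library; count_labels is a hypothetical helper name, and it assumes every non-blank line is a JSON object with a label field.

```python
import json
from collections import Counter


def count_labels(jsonl_lines):
    """Count how many records carry each label."""
    counts = Counter()
    for line in jsonl_lines:
        if line.strip():  # skip blank lines
            counts[json.loads(line)["label"]] += 1
    return counts
```

For example, open ./image_data_mltable/image_data.jsonl and pass the file object to count_labels; a missing OK or NOK key would mean that folder contributed no images.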
Refer to this sample GitHub documentation for AutoML classification to learn more.