I've defined a custom LLM metric with a couple of format-string template variables using the make_genai_metric_from_prompt() function, and when I pass this custom metric to mlflow.evaluate() with a pandas DataFrame, I keep getting this error:

mlflow.exceptions.MlflowException: Missing variable inputs to eval_fn: {'input', 'output'}

I'm unsure how to actually pass these variable inputs; my pandas DataFrame has the column names "input" and "output".
Some example code:
import mlflow
from mlflow.metrics.genai import make_genai_metric_from_prompt

# Custom judge metric; {input} and {output} are template variables filled from the eval data.
modified_x_metric = make_genai_metric_from_prompt(
    name="x_bool",
    judge_prompt="""...examples could be included below for reference. Make sure to use them as references and to
understand them before completing the task.
Input:
{input}
Output:
{output}
Metric and Grading Directions:
...""",
    model="openai:/gpt-4o",
    aggregations=[
        "mean",
        "median",
    ],
    greater_is_better=True,
)
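For context, eval_data is just a small pandas DataFrame whose column names match the prompt variables; the rows below are illustrative placeholders, not my real data:

import pandas as pd

# Illustrative rows only -- the real data has the same "input"/"output" columns.
eval_data = pd.DataFrame(
    {
        "input": ["What is MLflow?", "What is Spark?"],
        "output": [
            "MLflow is an open-source platform for managing the ML lifecycle.",
            "Apache Spark is a distributed engine for large-scale data processing.",
        ],
    }
)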
with mlflow.start_run() as run:
    results = mlflow.evaluate(
        data=eval_data,
        predictions="output",
        extra_metrics=[modified_x_metric],
        # evaluator_config={"col_mapping": {"output": "output", "input": "input"}},
    )
I've tried both passing and not passing the evaluator_config argument (it does seem to make a difference for metrics created with the make_genai_metric() function), but it makes no difference here. I would expect metrics created with make_genai_metric() and make_genai_metric_from_prompt() to behave the same way, where you can map input columns via the evaluator_config argument, but no dice.
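For comparison, this is roughly the pattern that works for me with make_genai_metric(), where grading_context_columns plus col_mapping in evaluator_config wire the dataset columns into the metric (the metric name, definition, and grading prompt below are stand-ins, just to show the shape):

from mlflow.metrics.genai import make_genai_metric

# Sketch only -- definition/grading_prompt are placeholders for the real metric text.
context_metric = make_genai_metric(
    name="x_bool_with_context",
    definition="Judge whether the output satisfies the criterion given the input.",
    grading_prompt="...",
    grading_context_columns=["input"],  # metric expects an "input" context value per row
    model="openai:/gpt-4o",
    aggregations=["mean", "median"],
    greater_is_better=True,
)

with mlflow.start_run():
    mlflow.evaluate(
        data=eval_data,
        predictions="output",
        extra_metrics=[context_metric],
        # col_mapping maps the metric's expected inputs to dataset column names
        # (redundant when the names already match, but it shows where the wiring happens)
        evaluator_config={"col_mapping": {"input": "input"}},
    )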
Anyone else had any luck with this function?
There is a bug where evaluation metrics created by make_genai_metric_from_prompt cannot be used with mlflow.evaluate. A PR (https://github.com/mlflow/mlflow/pull/16960) has been filed to fix the issue, and the fix should be included in the next minor release. Sorry for the inconvenience, and thank you for the report. Going forward, we would appreciate it if you could file an issue at https://github.com/mlflow/mlflow/issues when you run into a problem. Thank you!
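Once the release containing the fix is published, upgrading should be enough to pick it up:

pip install --upgrade mlflow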