Model Evaluation
Evaluate a model against ground truth, surface mAP and per-class metrics, and explore failure modes interactively in the Model Evaluation panel, all from a single natural language prompt.

Requirements
Usage
Ask your AI assistant:
"Evaluate my model using ground_truth as ground truth and predictions as predictions"
"Show me where the model fails on the person class"
"Find high-confidence false positives"
"Compare my two model evaluations side by side"
"How does the model perform on small objects?"
The skill first checks for existing evaluation results and skips computation when they are already present. For a new evaluation, it runs a delegated background job, then opens the Model Evaluation panel and navigates directly to the results. For failure mode analysis, it discovers the dataset structure dynamically (label field names, eval result fields, and class labels) before applying any filters.
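The flow described above can be illustrated with a small, self-contained Python sketch. It is not the skill's implementation and uses no real library: `Detection`, `Sample`, `needs_evaluation`, `discover_classes`, and `high_confidence_false_positives` are hypothetical names, and the `eval_result` field with `"tp"`/`"fp"`/`"fn"` values is an assumed way of storing per-object evaluation outcomes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Detection:
    label: str
    confidence: float
    eval_result: str  # hypothetical per-object eval field: "tp", "fp", or "fn"


@dataclass
class Sample:
    filepath: str
    predictions: list[Detection] = field(default_factory=list)


def needs_evaluation(existing_eval_keys: list[str], eval_key: str) -> bool:
    """Return True only when no stored results exist under eval_key."""
    return eval_key not in existing_eval_keys


def discover_classes(samples: list[Sample]) -> list[str]:
    """Collect class labels from the data instead of hard-coding them."""
    return sorted({d.label for s in samples for d in s.predictions})


def high_confidence_false_positives(
    samples: list[Sample], min_confidence: float = 0.8, label: str | None = None
) -> list[tuple[str, Detection]]:
    """Return (filepath, detection) pairs for confident false positives."""
    hits = []
    for s in samples:
        for d in s.predictions:
            if d.eval_result != "fp" or d.confidence < min_confidence:
                continue
            if label is not None and d.label != label:
                continue
            hits.append((s.filepath, d))
    return hits


# Toy data standing in for an evaluated dataset (values are illustrative only).
samples = [
    Sample("img_001.jpg", [Detection("person", 0.95, "tp"),
                           Detection("person", 0.91, "fp")]),
    Sample("img_002.jpg", [Detection("dog", 0.85, "fp"),
                           Detection("person", 0.40, "fp")]),
]
existing_eval_keys = ["eval_v1"]  # hypothetical stored evaluation keys

if needs_evaluation(existing_eval_keys, "eval_v1"):
    print("would launch a background evaluation job")
else:
    print("reusing stored results for eval_v1")

print("classes:", discover_classes(samples))
for path, det in high_confidence_false_positives(samples, label="person"):
    print(path, det.label, det.confidence)
```

Discovering the class labels from the data before filtering means a prompt such as "Find high-confidence false positives" can be answered without assuming field or class names in advance. The 0.8 confidence threshold is an arbitrary choice for the sketch.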