RAG evaluation metrics for monitoring agents in AI Agent Studio

Agent evaluations now include Retrieval-Augmented Generation (RAG) metrics to assess the quality of answers generated using the document tool. These metrics provide deeper visibility into how effectively agents use retrieved context to generate accurate and grounded responses. 

Key updates:

  • New evaluation metrics:
    • Groundedness (Faithfulness) – Measures how well the answer aligns with retrieved content.
    • Answer Relevance – Assesses how directly the answer addresses the user’s question.
    • Context Relevance – Evaluates the quality and appropriateness of the retrieved context.
  • Uses the LLM-as-a-Judge methodology with fine-tuned, industry-standard evaluation templates (an illustrative sketch of the general approach follows this list).
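
For illustration only, the sketch below shows the general idea behind LLM-as-a-Judge scoring of these three metrics: a judge model is prompted to rate groundedness, answer relevance, and context relevance for a single question, retrieved context, and answer. The prompts, function names, and the `judge_llm` callable are hypothetical placeholders and do not reflect AI Agent Studio's actual evaluation templates or internal implementation.

```python
# Illustrative sketch only. AI Agent Studio computes these metrics internally;
# the prompts and the judge_llm callable here are hypothetical placeholders.
from typing import Callable

GROUNDEDNESS_PROMPT = (
    "Rate from 1 (not supported) to 5 (fully supported) how well the ANSWER "
    "is supported by the CONTEXT. Reply with the number only.\n\n"
    "CONTEXT:\n{context}\n\nANSWER:\n{answer}"
)

ANSWER_RELEVANCE_PROMPT = (
    "Rate from 1 to 5 how directly the ANSWER addresses the QUESTION. "
    "Reply with the number only.\n\nQUESTION:\n{question}\n\nANSWER:\n{answer}"
)

CONTEXT_RELEVANCE_PROMPT = (
    "Rate from 1 to 5 how relevant the retrieved CONTEXT is to the QUESTION. "
    "Reply with the number only.\n\nQUESTION:\n{question}\n\nCONTEXT:\n{context}"
)

def rag_metrics(question: str, context: str, answer: str,
                judge_llm: Callable[[str], str]) -> dict:
    """Score one RAG interaction with any prompt-in, text-out judge callable."""
    def score(prompt: str) -> float:
        reply = judge_llm(prompt).strip()
        return float(reply.split()[0])  # expects the judge to reply with a number

    return {
        "groundedness": score(GROUNDEDNESS_PROMPT.format(context=context, answer=answer)),
        "answer_relevance": score(ANSWER_RELEVANCE_PROMPT.format(question=question, answer=answer)),
        "context_relevance": score(CONTEXT_RELEVANCE_PROMPT.format(question=question, context=context)),
    }
```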

You can view the RAG metrics on the Monitoring and Evaluation tab in AI Agent Studio.

The new evaluation metrics enhance transparency and quality measurement for RAG-based responses, enabling continuous improvement of agent retrieval and reasoning performance.

Steps to Enable and Configure

You must have access to use AI Agent Studio. 

Access Requirements

See Access Requirements for AI Agent Studio.