Metric Details in Generative AI

You can monitor OCI Generative AI resources through the metrics provided in this service. You can also use the OCI Monitoring service to create custom queries and alarms that notify you when these metrics meet the triggers that you specify.

Hosting Dedicated AI Cluster Metrics

This section lists the metrics for hosting dedicated AI clusters. Fine-tuning dedicated AI clusters don't display metrics.

Metric Display Name | Description
Utilization | The available capacity of a dedicated AI cluster, displayed as a percentage over time
Total number of input | Number of input tokens that the models on this hosting dedicated AI cluster have processed
Total number of output | Number of output tokens that the models on this hosting dedicated AI cluster have processed

You can get the preceding metrics from a hosting dedicated AI cluster's detail page.

Endpoint Metrics

This section lists the metrics for model endpoints in Generative AI.

Metric Display Name | Description
Total processing time | Total processing time for a call to finish
Number of calls | Number of calls made to the model that's hosted on this endpoint
Service Errors Count | Number of calls with a service internal error
Client Errors Count | Number of calls with a client-side error
Total number of input | Number of input tokens that the model that's hosted on this endpoint has processed
Total number of output | Number of output tokens that the model that's hosted on this endpoint has processed
Success rate of calls | Successful calls divided by the total number of calls

You can get the preceding metrics from an endpoint's detail page.
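As a rough illustration of how the success-rate metric relates to the other endpoint counts, the following Python sketch derives it from the total call count and the two error counts. The function name and the sample numbers are illustrative only, not real service output or part of any SDK:

```python
# Sketch: deriving the "Success rate of calls" endpoint metric from raw
# counts. The inputs mirror the custom-query parameters listed later in
# this topic (TotalInvocationCount, ServerErrorCount, ClientErrorCount).

def success_rate(total_calls: int, server_errors: int, client_errors: int) -> float:
    """Successful calls divided by the total number of calls."""
    if total_calls == 0:
        return 0.0  # avoid division by zero when no calls were made
    successful = total_calls - server_errors - client_errors
    return successful / total_calls

# Hypothetical counts: 200 calls, 4 service errors, 6 client errors
rate = success_rate(total_calls=200, server_errors=4, client_errors=6)
print(f"{rate:.2%}")  # 95.00%
```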

Metrics for Custom Queries

You can create custom queries and alarms for the Generative AI cluster and endpoint metrics through the Monitoring service.

This section lists the parameters that you can use to create custom queries for Generative AI metrics by using the Monitoring service.

Metric Parameter | Display Name | Description
ClientErrorCount | Client Errors Count | Number of calls with a client-side error
InputTokenCount | Total number of input | Number of input tokens that the models hosted on this resource have processed
InvocationLatency | Total processing time | Total processing time for a call to finish on this resource
OutputTokenCount | Total number of output | Number of output tokens that the models hosted on this resource have processed
ServerErrorCount | Service Errors Count | Number of calls with a service internal error
TotalInvocationCount | Number of calls | Number of calls made to the models hosted on this resource
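A Monitoring custom query combines one of the preceding metric parameters with an interval and a statistic, in the general Monitoring Query Language shape MetricName[interval].statistic(); when you run the query you also select the Generative AI metric namespace and a compartment in the Monitoring service. The helper below is an illustrative sketch for assembling such query strings, not part of any OCI SDK:

```python
# Sketch: assembling Monitoring Query Language (MQL) expressions for the
# Generative AI metric parameters listed above. build_query is a
# hypothetical helper; only the MetricName[interval].statistic() shape
# comes from the Monitoring service's query syntax.

def build_query(metric: str, interval: str = "1m", statistic: str = "mean") -> str:
    """Return an MQL expression such as InvocationLatency[1m].mean()."""
    return f"{metric}[{interval}].{statistic}()"

# Average processing time per call, over 1-minute windows
print(build_query("InvocationLatency"))                  # InvocationLatency[1m].mean()
# Total calls per 5-minute window
print(build_query("TotalInvocationCount", "5m", "sum"))  # TotalInvocationCount[5m].sum()
```

You could pair an expression like the second one with an alarm threshold, for example to be notified when ServerErrorCount[5m].sum() exceeds a count that you choose.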

For the steps on how to create these custom queries, see Creating a Query for Generative AI Metrics.