Metric Details in Generative AI

You can monitor OCI Generative AI resources through the metrics provided in this service. You can also use the OCI Monitoring service to create custom queries and alarms that notify you when these metrics meet the triggers that you specify.

Hosting Dedicated AI Cluster Metrics

This section lists the metrics for hosting dedicated AI clusters. Fine-tuning dedicated AI clusters don't display metrics.

Metric Display Name | Description
Utilization | The available capacity of a dedicated AI cluster, displayed as a percentage over time
Total number of input | Number of input tokens that the models on this hosting dedicated AI cluster have processed
Total number of output | Number of output tokens that the models on this hosting dedicated AI cluster have processed

You can get the preceding metrics from a hosting dedicated AI cluster's detail page.

Endpoint Metrics

This section lists the metrics for model endpoints in Generative AI.

Metric Display Name | Description
Total processing time | Total processing time for a call to finish
Number of calls | Number of calls made to the model that's hosted on this endpoint
Service Errors Count | Number of calls with a service internal error
Client Errors Count | Number of calls with a client-side error
Total number of input | Number of input tokens that the model that's hosted on this endpoint has processed
Total number of output | Number of output tokens that the model that's hosted on this endpoint has processed
Success rate of calls | Successful calls divided by the total number of calls

You can get the preceding metrics from an endpoint's detail page.
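As a rough illustration of how the success-rate metric relates to the other endpoint counts, the following Python sketch derives it from the total call count and the two error counts. The function name and the sample numbers are illustrative only, not real service output or part of any SDK:

```python
# Sketch: deriving the "Success rate of calls" endpoint metric from raw
# counts. The inputs mirror the custom-query parameters listed later in
# this topic (TotalInvocationCount, ServerErrorCount, ClientErrorCount).

def success_rate(total_calls: int, server_errors: int, client_errors: int) -> float:
    """Successful calls divided by the total number of calls."""
    if total_calls == 0:
        return 0.0  # avoid division by zero when no calls were made
    successful = total_calls - server_errors - client_errors
    return successful / total_calls

# Hypothetical counts: 200 calls, 4 service errors, 6 client errors
rate = success_rate(total_calls=200, server_errors=4, client_errors=6)
print(f"{rate:.2%}")  # 95.00%
```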

Metrics for Custom Queries

You can create custom queries and alarms for the Generative AI cluster and endpoint metrics through the Monitoring service.

This section lists the parameters that you can use to create custom queries for Generative AI metrics by using the Monitoring service.

Metric Parameter | Display Name | Description
ClientErrorCount | Client Errors Count | Number of calls with a client-side error
InputTokenCount | Total number of input | Number of input tokens that the models hosted on this resource have processed
InvocationLatency | Total processing time | Total processing time for a call to finish on this resource
OutputTokenCount | Total number of output | Number of output tokens that the models hosted on this resource have processed
ServerErrorCount | Service Errors Count | Number of calls with a service internal error
TotalInvocationCount | Number of calls | Number of calls made to the models hosted on this resource
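A Monitoring custom query combines one of the preceding metric parameters with an interval and a statistic, in the general Monitoring Query Language shape MetricName[interval].statistic(); when you run the query you also select the Generative AI metric namespace and a compartment in the Monitoring service. The helper below is an illustrative sketch for assembling such query strings, not part of any OCI SDK:

```python
# Sketch: assembling Monitoring Query Language (MQL) expressions for the
# Generative AI metric parameters listed above. build_query is a
# hypothetical helper; only the MetricName[interval].statistic() shape
# comes from the Monitoring service's query syntax.

def build_query(metric: str, interval: str = "1m", statistic: str = "mean") -> str:
    """Return an MQL expression such as InvocationLatency[1m].mean()."""
    return f"{metric}[{interval}].{statistic}()"

# Average processing time per call, over 1-minute windows
print(build_query("InvocationLatency"))                  # InvocationLatency[1m].mean()
# Total calls per 5-minute window
print(build_query("TotalInvocationCount", "5m", "sum"))  # TotalInvocationCount[5m].sum()
```

You could pair an expression like the second one with an alarm threshold, for example to be notified when ServerErrorCount[5m].sum() exceeds a count that you choose.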

For the steps on how to create these custom queries, see Creating a Query for Generative AI Metrics.