Compute Management in Autonomous AI Database on Dedicated Exadata Infrastructure

Autonomous AI Database on Dedicated Exadata Infrastructure offers two compute models for configuring your Autonomous AI Database resources:

ECPU: An ECPU is an abstracted measure of compute resources. ECPUs are based on the number of cores elastically allocated from a pool of compute and storage servers. You need at least 2 ECPUs to provision an Autonomous AI Database.

While provisioning a new database, cloning an existing database, and scaling up or down the CPU resources of an existing database, the CPU count defaults to 2 ECPUs and can be changed in increments of 1. For example, the next available number of ECPUs above 2 is 3.

You can create Autonomous AI Database for Developers instances on ECPU-based container databases. They are free Autonomous AI Databases that developers can use to build and test new applications. See Autonomous AI Database for Developers for more details.

OCPU: An OCPU is a physical measure of compute resources. OCPUs are based on the physical core of a processor with hyper-threading enabled.

Note: OCPU is a legacy billing metric and has been retired for Autonomous AI Database on Dedicated Exadata Infrastructure. Oracle recommends using ECPUs for all new and existing Autonomous AI Database deployments. See Oracle Support Document 2998755.1 for more information.

While provisioning a new database, cloning an existing database, and scaling up or down the CPU resources of an existing database:
- The CPU count defaults to 1 OCPU and can be changed in increments of 1. For example, the next available number of OCPUs above 1 is 2.
- For databases that do not need an entire OCPU, you can assign OCPUs from 0.1 to 0.9 in increments of 0.1 OCPUs. This allows you to overprovision CPU and run more databases on each infrastructure instance. Refer to CPU Overprovisioning for more details.
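
The sizing rules above for both compute models can be condensed into a small check. This is only a sketch of the rules as stated in this article, not an Oracle API: the function name `is_valid_cpu_count` and its string-based input are assumptions made for illustration.

```python
from decimal import Decimal

def is_valid_cpu_count(model: str, count: str) -> bool:
    """Return True if `count` matches the sizing rules described above.

    Hypothetical helper: it only encodes the rules from this article.
    """
    value = Decimal(count)
    is_whole = value == value.to_integral_value()
    if model == "ECPU":
        # ECPUs: whole numbers only, with a minimum of 2.
        return is_whole and value >= 2
    if model == "OCPU":
        if value >= 1:
            # One OCPU or more: whole numbers only.
            return is_whole
        # Below one OCPU: 0.1 to 0.9 in steps of 0.1 (CPU overprovisioning).
        tenths = value * 10
        return tenths == tenths.to_integral_value() and 1 <= tenths <= 9
    raise ValueError(f"unknown compute model: {model!r}")

print(is_valid_cpu_count("ECPU", "3"))    # True
print(is_valid_cpu_count("ECPU", "1"))    # False
print(is_valid_cpu_count("OCPU", "0.3"))  # True
```

Decimal arithmetic is used so that values such as 0.3 are compared exactly rather than as binary floating-point approximations.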

The Autonomous Exadata VM Cluster’s compute type applies to all its Autonomous Container Databases and Autonomous AI Database instances.

Compute Management

Autonomous AI Database instances are deployed into an Autonomous Exadata VM Cluster (AVMC) and into one of its child Autonomous Container Databases (ACD). Exadata Infrastructures are capable of running multiple AVMCs. The CPUs that you allocate while provisioning an AVMC resource will be the total CPUs available for its Autonomous AI Databases. When you create multiple AVMCs, each AVMC can have its own value for total CPUs.

The Multiple VM Autonomous Exadata VM Cluster feature is not available on any Oracle Public Cloud deployment of Exadata Infrastructure (EI) resources created before the launch of the Multiple VM Autonomous AI Database feature. For Exadata Infrastructure resources of the X8M generation and above created after the Multiple AVMC feature launch, each AVMC is created with one cluster node for each of the servers of the Exadata system shape you choose. For information about constraining these total CPUs across different groups of users, see How Compartment Quotas Affect CPU Management.

Note: The maximum number of AVMC and ACD resources you can create on a given Exadata Infrastructure varies based on the generation of hardware. Please refer to Resource Limits and Characteristics of Infrastructure Shapes for details on constraints for each generation.

At an AVMC or ACD level, the total number of CPUs available for creating databases is called available CPUs. At the AVMC resource level, available CPUs will be equal to the total CPUs until you create the first ACD. Once you create an ACD, 8 ECPUs or 2 OCPUs per node are allocated to the new ACD from the AVMC’s available CPUs, so the available CPUs at the AVMC resource level are reduced accordingly. When you create the first Autonomous AI Database in that ACD, the new database consumes the initially allocated CPUs (8 ECPUs or 2 OCPUs per node). If the new database needs more than 8 ECPUs or 2 OCPUs, the additional CPUs are assigned from the parent AVMC’s available CPUs, thereby reducing the available CPUs at the parent AVMC level. As you create more ACDs and provision Autonomous AI Databases within each ACD, the available CPU value changes accordingly.
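
The accounting described above can be followed with a few lines of arithmetic. The sketch below is a simplified reading of this paragraph, not Oracle's implementation: the function name is hypothetical, and it assumes that the first database may use the ACD's whole initial allocation (8 ECPUs multiplied by the node count) before drawing anything more from the AVMC.

```python
ACD_INITIAL_ECPUS_PER_NODE = 8  # from the text; 2 per node on OCPU-based clusters

def avmc_available_after_first_database(total_cpus: int, nodes: int, db_cpus: int) -> int:
    """Available CPUs at the AVMC level after creating one ACD and its first database.

    Assumption: the initial ACD allocation is treated as a single lump
    of `8 * nodes` ECPUs that the first database consumes first.
    """
    initial_allocation = ACD_INITIAL_ECPUS_PER_NODE * nodes
    # Creating the ACD moves its initial allocation out of the AVMC's available CPUs.
    available = total_cpus - initial_allocation
    # Anything the database needs beyond that allocation also comes from the AVMC.
    extra_needed = max(0, db_cpus - initial_allocation)
    if extra_needed > available:
        raise ValueError("not enough available CPUs in the AVMC")
    return available - extra_needed

# A 2-node AVMC with 100 ECPUs in total:
print(avmc_available_after_first_database(100, 2, 10))  # 84
print(avmc_available_after_first_database(100, 2, 20))  # 80
```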

Available CPUs at the Autonomous Exadata VM Cluster level applies to all its Autonomous Container Databases. This count of CPUs available to the container database becomes important if you are using the auto-scaling feature, as described in CPU Allocation When Auto-Scaling.

Similarly, when you manually scale the CPUs of an Autonomous AI Database up, CPUs are consumed from the available CPUs at its parent AVMC level and its value changes accordingly.

When you create an Autonomous AI Database, by default Oracle reserves additional CPUs to ensure that the database can run with at least 50% capacity even in case of any node failures. You can change the percentage of CPUs reserved across nodes to 0% or 25% while provisioning an ACD. See Node failover reservation in Create an Autonomous Container Database for instructions. These additional CPUs are not included in the billing.

When an Autonomous AI Database is running, you are billed for the number of CPUs currently allocated to the database, whether specified at initial creation or later by a manual scaling operation. Additionally, if auto-scaling is enabled for the database, you are billed for each second for any additional CPUs the database is using as the result of being automatically scaled up. See CPU Billing Details for more information about how billing is measured and computed.

When an Autonomous AI Database is stopped, you are not billed. However, the CPUs allocated to it are not returned to the available CPUs at its parent AVMC level for the overall deployment.

When an Autonomous AI Database is terminated or scaled down, the CPUs allocated to it are not immediately returned to the available CPUs at its parent AVMC level for the overall deployment. They continue to be included in the count of CPUs available to its parent container database until that parent container database is restarted. These CPUs are called reclaimable CPUs. The reclaimable CPUs at the parent AVMC level are the sum of the reclaimable CPUs of all its ACDs. When an ACD is restarted, it returns all its reclaimable CPUs to the available CPUs at its parent AVMC level.
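
The life cycle of reclaimable CPUs can be traced with a toy model. This is a sketch under stated assumptions: the class `AcdCpuLedger` and its method names are hypothetical, and the model tracks only the two counters discussed above.

```python
class AcdCpuLedger:
    """Toy bookkeeping for one ACD and its parent AVMC (hypothetical model)."""

    def __init__(self, avmc_available: int) -> None:
        self.avmc_available = avmc_available  # available CPUs at the AVMC level
        self.reclaimable = 0                  # CPUs freed but still held by the ACD

    def terminate_database(self, cpus: int) -> None:
        # Freed CPUs stay with the ACD as reclaimable CPUs.
        self.reclaimable += cpus

    def scale_down_database(self, old_cpus: int, new_cpus: int) -> None:
        if new_cpus > old_cpus:
            raise ValueError("not a scale-down")
        self.reclaimable += old_cpus - new_cpus

    def restart_acd(self) -> None:
        # Restarting the ACD returns all reclaimable CPUs to the AVMC.
        self.avmc_available += self.reclaimable
        self.reclaimable = 0

ledger = AcdCpuLedger(avmc_available=50)
ledger.terminate_database(8)
ledger.scale_down_database(16, 10)
print(ledger.avmc_available, ledger.reclaimable)  # 50 14
ledger.restart_acd()
print(ledger.avmc_available, ledger.reclaimable)  # 64 0
```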

Restarting an Autonomous Container Database (ACD) is an online operation, done in a rolling manner across the cluster, and will not result in application downtime if configured according to best practices to use Transparent Application Continuity.

Tip: You can track the different compute (CPU) attributes discussed in this article from the Details page of an Autonomous Exadata VM Cluster (AVMC) or Autonomous Container Database (ACD). For guidance, refer to Resource Usage Tracking.

CPU Allocation When Auto-Scaling

The auto-scaling feature enables an Autonomous AI Database to use up to three times more CPU and IO resources than its allocated CPU count. In case of CPU overprovisioning, if three times the CPU count results in a value less than 1, it will be rounded to the next whole number. CPU overprovisioning is supported with OCPUs only. See CPU Overprovisioning for more details.
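
As a worked illustration of the three-times rule and the rounding for overprovisioned databases, consider the following sketch. The function name is hypothetical. The text does not say how fractional results above 1 are handled, so the sketch leaves them unrounded.

```python
from decimal import Decimal

AUTO_SCALE_FACTOR = 3  # "up to three times" the allocated CPU count

def auto_scale_ceiling(allocated_cpus: str) -> Decimal:
    """Upper bound on CPUs an auto-scaling database may use (illustrative only)."""
    ceiling = Decimal(allocated_cpus) * AUTO_SCALE_FACTOR
    # With CPU overprovisioning, a result below 1 is rounded up to 1.
    if ceiling < 1:
        return Decimal(1)
    return ceiling

print(auto_scale_ceiling("4"))    # 12
print(auto_scale_ceiling("0.2"))  # 1
print(auto_scale_ceiling("0.5"))  # 1.5
```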

To ensure that no single Autonomous AI Database can auto-scale up to consume all CPUs available in the pool for the overall deployment, Oracle Autonomous AI Database on Dedicated Exadata Infrastructure uses the Autonomous Container Database as a limiting control.

While provisioning an auto-scaling enabled Autonomous AI Database in an ACD, if the available CPUs in that ACD are fewer than three times the CPU count of the new database, then additional CPUs will be reserved in that ACD. These CPUs are called reserved CPUs. Reserved CPUs ensure that the available CPUs at an ACD level are always greater than or equal to three times the CPU count of the largest auto-scaling enabled database in that ACD. These reserved CPUs can still be used to create or manually scale Autonomous AI Databases in this ACD.
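
One way to read the reservation rule is that the ACD reserves exactly the shortfall between three times the new database's CPU count and the ACD's available CPUs. The text does not give the formula, so the sketch below is an assumption, and the function name is hypothetical.

```python
def reserved_cpus_needed(acd_available_cpus: int, new_db_cpus: int, factor: int = 3) -> int:
    """CPUs to reserve so the ACD has at least `factor` times the database's CPUs.

    Assumption: the reservation equals the shortfall and is never negative.
    """
    return max(0, factor * new_db_cpus - acd_available_cpus)

print(reserved_cpus_needed(acd_available_cpus=10, new_db_cpus=4))  # 2
print(reserved_cpus_needed(acd_available_cpus=20, new_db_cpus=4))  # 0
```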

When automatically scaling up an Autonomous AI Database, Oracle Autonomous AI Database on Dedicated Exadata Infrastructure looks for idle CPUs in its parent container database. If idle CPUs are available, the Autonomous AI Database is scaled up; otherwise, it is not. Databases inherently have a lot of idle time, so auto-scaling is a way to maximize resource usage while controlling costs and preserving good isolation from databases in other Autonomous Container Databases.

If the CPU used to auto-scale an Autonomous AI Database came from another running Autonomous AI Database that is lightly loaded and so is not using all of its allocated CPUs, Oracle Autonomous AI Database on Dedicated Exadata Infrastructure automatically scales the auto-scaled database down if the load increases on the other database and it needs its allocated CPU back.

Consider the example of an Autonomous Container Database hosting four running 4-CPU Autonomous AI Databases, all with auto-scaling enabled. The count of CPUs available to the container database for auto-scaling purposes is 16. Should one of these databases need to be auto-scaled past 4 CPUs due to load increase, Oracle Autonomous AI Database on Dedicated Exadata Infrastructure will only perform the auto-scaling operation if one or more of the other databases are lightly loaded and not using all allocated CPUs. The billing cost of this example is 16 CPUs at a minimum because all four 4-CPU databases are always running.

By contrast, consider the example of an Autonomous Container Database hosting four running 2-CPU Autonomous AI Databases, all with auto-scaling enabled, and one stopped 8-CPU Autonomous AI Database. The count of CPUs available to the container database for auto-scaling purposes is again 16. Should one of the running databases need to be auto-scaled due to load increase past 2 CPUs, Oracle Autonomous AI Database on Dedicated Exadata Infrastructure can perform the operation using CPUs allocated to the stopped 8-CPU database. In this example, the four running databases can consume up to a total of 8 additional CPUs simultaneously without consuming each other’s allocated CPUs. The billing cost of this example is only 8 CPUs at a minimum because only the four 2-CPU databases are always running.
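
The arithmetic behind the two examples above can be reproduced as follows. This is a minimal sketch: the `Db` class and the two helper functions are hypothetical names, and the model assumes that the auto-scaling pool is the sum of the CPUs allocated to every database in the container (running or stopped), while the minimum bill covers only running databases.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Db:
    cpus: int
    running: bool = True

def auto_scale_pool(dbs: list[Db]) -> int:
    # Stopped databases keep their CPUs, so they still count toward the pool.
    return sum(db.cpus for db in dbs)

def minimum_billed_cpus(dbs: list[Db]) -> int:
    # Only running databases are billed (before any auto-scaling usage).
    return sum(db.cpus for db in dbs if db.running)

first = [Db(4)] * 4
second = [Db(2)] * 4 + [Db(8, running=False)]

print(auto_scale_pool(first), minimum_billed_cpus(first))    # 16 16
print(auto_scale_pool(second), minimum_billed_cpus(second))  # 16 8
# Headroom that does not borrow from another running database:
print(auto_scale_pool(second) - minimum_billed_cpus(second))  # 8
```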

For any Autonomous Data Guard service instance, local or cross-region, the additional pricing will be the number of ECPUs or OCPUs you reserved when you created or explicitly scaled your primary service instance, regardless of whether auto-scaling is enabled or not. ECPU or OCPU consumption related to auto-scaling on primary service instances does not occur on Autonomous Data Guard Standby service instances.

How Compartment Quotas Affect CPU Management

Normally, when you create or scale up an Autonomous AI Database, the ability of Oracle Autonomous AI Database on Dedicated Exadata Infrastructure to satisfy your request depends only on the availability of unallocated CPUs in the single pool of CPUs across the entire deployment.

However, you can use the compartment quotas feature of Oracle Cloud Infrastructure to further restrict, on a compartment-by-compartment basis, the number of CPUs available to create, manually scale, and auto-scale Autonomous AI Databases of each workload type (Autonomous AI Lakehouse or Autonomous AI Transaction Processing) individually.

In brief, you use the compartment quotas feature by creating set, unset and zero policy statements to limit the availability of a given resource in a given compartment. For detailed information and instructions, see Compartment Quotas.
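
To give a feel for the three statement kinds, the sketch below assembles them as strings. Treat it as an illustration only: the helper `quota_statement` is hypothetical, the statement layout follows the general quota policy form as understood here, and the quota name `example-dedicated-cpu-quota` and compartment name `DevCompartment` are placeholders, not real identifiers. Take the actual quota names from Compartment Quotas.

```python
from typing import Optional

def quota_statement(action: str, quota_name: str, compartment: str,
                    value: Optional[int] = None, family: str = "database") -> str:
    """Build one quota policy statement of the set, unset, or zero kind."""
    if action == "set":
        if value is None:
            raise ValueError("a 'set' statement needs a value")
        return f"set {family} quota {quota_name} to {value} in compartment {compartment}"
    if action in ("unset", "zero"):
        return f"{action} {family} quota {quota_name} in compartment {compartment}"
    raise ValueError(f"unknown action: {action!r}")

# Placeholder names, for illustration only.
print(quota_statement("set", "example-dedicated-cpu-quota", "DevCompartment", value=16))
print(quota_statement("zero", "example-dedicated-cpu-quota", "DevCompartment"))
```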

How VM Cluster Nodes Affect CPU Management

The preceding discussion of CPU management and allocation states that you can create multiple Autonomous Exadata VM Cluster (AVMC) resources by choosing the CPU count per node while provisioning the AVMC resource.

This section describes in detail how Oracle Cloud Infrastructure places Autonomous AI Databases in the VM cluster nodes, and the consequences of such placement for auto-scaling and parallel processing.

The following attributes determine when and how an Autonomous AI Database is placed across multiple nodes:

Split Threshold: The CPU value beyond which Oracle Cloud Infrastructure opens an Autonomous AI Database across multiple nodes. The default split threshold is 64 for ECPUs and 16 for OCPUs, but if VM Clusters are created with CPU node counts below the default value, then the default is overridden to the VM Cluster node count size. You can also set the split value explicitly using the Split Threshold attribute while provisioning an Autonomous Container Database (ACD).

Autonomous AI Databases created with a CPU value that is smaller than the split value will open on one node in the cluster and those created with a CPU value larger than the split threshold value will open on multiple nodes.
- Suppose you create an ACD with a default split threshold (64 ECPUs) in an AVMC with two nodes and 40 ECPUs per node. As 40 is smaller than 64, any Autonomous AI Database with a CPU requirement greater than 40 will be split and opened across multiple nodes, allowing DML requests across those nodes. However, if the AVMC was created with two nodes and 80 ECPUs per node, any database with an ECPU requirement greater than 64 will be split and opened across multiple nodes.
- Suppose you create an ACD in a VM Cluster with two nodes and 40 ECPUs per node and explicitly set the split threshold value to a much smaller value, say 20 ECPUs. Any Autonomous AI Database with a CPU requirement greater than 20 will be split and opened across multiple nodes, and databases with a CPU requirement of less than 20 will be opened on a single node.
  
  Setting the split threshold to a much smaller number than the default value increases the chances of databases with smaller CPU counts opening on multiple nodes, as long as their CPU count is more than the set split value. Whenever a database is created or scaled to a size greater than this split value, it gets opened on multiple nodes. This is useful when you want databases to open on multiple nodes to control performance degradation in case of a node failure or planned maintenance. With databases split across multiple nodes in larger RAC clusters, if any one node fails or when scheduled maintenance occurs, you can continue to have higher performance rather than degrading to a 50% performance profile.
- Suppose you explicitly set the split threshold to a value much higher than the default, say 80 ECPUs, in an AVMC with two nodes and 40 ECPUs per node. Because the per-node CPU count (40) is lower than the threshold you set, it overrides the threshold: any Autonomous AI Database with a CPU requirement greater than 40 will be split and opened across multiple nodes, and databases with a CPU requirement of less than 40 will be opened on a single node.
  
  Setting the split threshold to a value much higher than the default causes your database DML to stay on a single RAC node and eliminates the chance of cluster wait contention.
- When you manually scale an Autonomous AI Database, the new CPU value will be applied to the existing Split model. That is, if the new value is smaller than the split threshold, it will open on one node, and if the value is greater than the split threshold, it will open on multiple nodes.
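
The split threshold scenarios above all follow one rule: the effective threshold is the lower of the configured threshold and the per-node CPU count. The sketch below encodes that rule with hypothetical function names. The text does not say what happens when a database's CPU count equals the threshold exactly, so the sketch assumes such a database stays on one node.

```python
def effective_split_threshold(configured_threshold: int, cpus_per_node: int) -> int:
    # A node smaller than the configured threshold overrides it.
    return min(configured_threshold, cpus_per_node)

def opens_on_multiple_nodes(db_cpus: int, configured_threshold: int, cpus_per_node: int) -> bool:
    # Assumption: a CPU count exactly equal to the threshold is not split.
    return db_cpus > effective_split_threshold(configured_threshold, cpus_per_node)

print(effective_split_threshold(64, 40))  # 40 (default threshold, 40 ECPUs per node)
print(effective_split_threshold(64, 80))  # 64 (default threshold, 80 ECPUs per node)
print(effective_split_threshold(20, 40))  # 20 (explicit low threshold)
print(effective_split_threshold(80, 40))  # 40 (explicit high threshold, capped by node size)
print(opens_on_multiple_nodes(50, 64, 40))  # True
print(opens_on_multiple_nodes(50, 64, 80))  # False
```
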
Distribution Affinity: Determines the number of nodes on which an Autonomous AI Database will be opened once it crosses the Split Threshold.

For example, suppose you created an AVMC resource with 4 nodes and 80 ECPUs per node, and you created an ACD in this AVMC with the database Split Threshold set to 64. Creating an Autonomous AI Database with an ECPU requirement of 120 will split and open the database across multiple nodes, as 120 is greater than 64 (the Split Threshold).
- If your distribution affinity is set to Minimum nodes, Oracle Cloud Infrastructure tries to create the database on 2 nodes with 60 ECPUs on each node. If this is not possible, it will be split across 3 nodes with 40 ECPUs each. If that is also not possible, then Oracle Cloud Infrastructure will try to open the database across 4 nodes with 30 ECPUs each.
- If your distribution affinity is set to Maximum nodes, Oracle Cloud Infrastructure tries to create the database split across all 4 nodes with 30 ECPUs each. If this is not possible, it will be split across 3 nodes with 40 ECPUs each. If that is also not possible, then Oracle Cloud Infrastructure will try to open the database across 2 nodes with 60 ECPUs each.
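
The order in which placements are attempted in the two cases above can be listed programmatically. This is a sketch with hypothetical names. It assumes an even split of the CPU count across the chosen nodes, as in the 120-ECPU example, and does not model the checks that decide whether a given placement is possible.

```python
def placement_candidates(db_cpus: int, cluster_nodes: int, affinity: str) -> list[tuple[int, float]]:
    """Return (node count, CPUs per node) pairs in the order they would be tried."""
    if affinity not in ("minimum", "maximum"):
        raise ValueError(f"unknown distribution affinity: {affinity!r}")
    # A split database opens on at least 2 nodes and at most all cluster nodes.
    node_counts = list(range(2, cluster_nodes + 1))
    if affinity == "maximum":
        node_counts.reverse()
    return [(n, db_cpus / n) for n in node_counts]

print(placement_candidates(120, 4, "minimum"))  # [(2, 60.0), (3, 40.0), (4, 30.0)]
print(placement_candidates(120, 4, "maximum"))  # [(4, 30.0), (3, 40.0), (2, 60.0)]
```
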
Node Failover Reservation (%): The number of CPUs set aside on adjacent nodes (nodes where your database software is present but not open) in your AVMC for localized failure and maintenance events. Node Failover Reservation applies to non-Split database deployments. By default, there is a 50% reserve, meaning during a failure event or maintenance, you will continue to run but at 50% of the allocated CPU.
- For non-critical databases or databases with very light utilization, you can set Node Failover Reservation to a smaller value so that, in the end, you can create and consolidate a larger number of databases on your Dedicated Exadata Infrastructure.
- You can set this value to zero for development environments and for databases where downtime during maintenance is acceptable.
- To some extent, Node Failover Reservation can also be controlled by ensuring a database is split across more than 2 nodes, using split threshold and distribution affinity. Consider a scenario where an Autonomous AI Database is split across 4 nodes. By removing one node at a time in a rolling fashion while a maintenance activity is in progress, you always have 3 nodes still up and taking traffic, keeping your performance reserve effectively 75%, rather than the usual 50%. With larger clusters, you can drive this reserve up even further, say to an 87.5% reserve on an 8-node cluster.
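
The percentages quoted above for split databases follow from simple arithmetic: with one of n nodes offline, (n - 1) / n of the capacity remains. A minimal sketch, with a hypothetical function name and assuming the database's CPUs are spread evenly across its nodes:

```python
def capacity_with_one_node_down(nodes: int) -> float:
    """Percentage of a split database's capacity left while one node is offline."""
    if nodes < 2:
        raise ValueError("a split database spans at least 2 nodes")
    return 100 * (nodes - 1) / nodes

for n in (2, 4, 8):
    print(n, capacity_with_one_node_down(n))
# 2 50.0
# 4 75.0
# 8 87.5
```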

How an Autonomous AI Database’s CPU allocation is distributed across VM cluster nodes affects the following operations:

Auto-scaling:
- Auto-scaling may occur within a single VM cluster node for non-parallelizable DML and across VM Cluster nodes if the DML is parallelizable.
- Multiple concurrent sessions with non-parallelizable queries may be routed to all nodes in the cluster, effectively allowing auto-scaling across all nodes in a multi-node database.
Parallel Processing:
- Parallel processing of SQL statements occurs within Autonomous Exadata VM cluster nodes that are open, first within a single node, and then in adjacent open nodes, which as discussed above will depend on the size of the Autonomous Exadata VM Cluster.

Depending on the resource utilization on each node, not every value up to the available CPU count can be used to provision or scale Autonomous AI Databases. For example, if you have 20 CPUs available at the AVMC level, some values from 1 to 20 may not be usable because of resource availability at the node level. The list of CPU values that can be used to provision or scale an Autonomous AI Database is called provisionable CPUs.

When you try to provision or scale an Autonomous AI Database from the OCI console, the CPU field will give you a dropdown with the list of provisionable CPUs. Alternatively, you can use the following APIs to get the list of provisionable CPU values:

- GetAutonomousContainerDatabase returns a list of provisionable CPU values that can be used to create a new Autonomous AI Database in the given Autonomous Container Database. See GetAutonomousContainerDatabase for more details.
- GetAutonomousDatabase returns a list of provisionable CPU values that can be used for scaling a given Autonomous AI Database. See GetAutonomousDatabase for more details.
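
A minimal sketch of reading these lists from a client follows. It assumes the OCI Python SDK, where `oci.database.DatabaseClient` offers `get_autonomous_container_database` and `get_autonomous_database`, and where the returned model is assumed to expose the list as `provisionable_cpus`. Verify the attribute name against the SDK reference. The helper names, the stub client, and the OCID strings are hypothetical; the stub exists only so the sketch runs without cloud credentials.

```python
from types import SimpleNamespace

def provisionable_cpus_for_new_database(db_client, acd_id: str) -> list:
    """CPU values usable to create a database in the given ACD."""
    response = db_client.get_autonomous_container_database(acd_id)
    # Assumed attribute name; an absent list is treated as empty.
    return list(response.data.provisionable_cpus or [])

def provisionable_cpus_for_scaling(db_client, adb_id: str) -> list:
    """CPU values usable to scale the given Autonomous AI Database."""
    response = db_client.get_autonomous_database(adb_id)
    return list(response.data.provisionable_cpus or [])

class StubDatabaseClient:
    """Stand-in for a real client, returning made-up values."""

    def get_autonomous_container_database(self, acd_id):
        return SimpleNamespace(data=SimpleNamespace(provisionable_cpus=[2.0, 3.0, 4.0]))

    def get_autonomous_database(self, adb_id):
        return SimpleNamespace(data=SimpleNamespace(provisionable_cpus=None))

client = StubDatabaseClient()
print(provisionable_cpus_for_new_database(client, "ocid1.example.acd"))  # [2.0, 3.0, 4.0]
print(provisionable_cpus_for_scaling(client, "ocid1.example.adb"))       # []
```

With the real SDK, the client would be created with something like `oci.database.DatabaseClient(oci.config.from_file())` and passed to the same helper functions.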
