7.1 Resolve Noisy Neighbor Issues

AHF Balance is a command-line utility that analyzes historical CPU consumption data and Database Resource Manager (DBRM) settings for the set of databases running in a cluster.

It assists in understanding the history of CPU-based noisy neighbor problems and recommends appropriate DBRM settings to minimize the risk of noisy neighbor problems.

AHF Balance queries CPU consumption from Oracle Enterprise Manager's repository database. Before you can generate AHF Balance reports, you need to configure a connection to the Oracle Enterprise Manager repository. For more information, see ahf configuration.


7.1.1 CPU-Based Noisy Neighbor Prevention Strategies

7.1.1.1 Partitioned – an MAA Best Practice

When a cluster is partitioned, each database instance has dedicated CPU capacity. CPU consumption by neighbors cannot interfere with a database instance. CPU resources, up to the configured limit (CPU_COUNT), are guaranteed to be available at all times. However, because CPU resources are dedicated to specific database instances, an instance cannot borrow CPU cycles that other instances are not using. Typically, when a cluster is partitioned, the degree of database consolidation is limited by the number of physical CPUs on each machine in the cluster and by the peak CPU consumption of each database hosted on the cluster.

A cluster is partitioned when the sum of the CPU_COUNT DBRM parameter values for all the database instances running on each machine in the cluster is less than or equal to the number of physical CPUs on the machine. For example, if the machines in a cluster each have 64 CPUs, and each machine is hosting 4 database instances, each with CPU_COUNT set to 16, the cluster is partitioned.
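The partitioning condition above can be checked with a few lines of Python. This is a minimal sketch, not part of AHF Balance: the function name `is_partitioned` and its arguments are hypothetical, and the CPU_COUNT values are assumed to have been gathered for one machine by other means.

```python
def is_partitioned(cpu_counts, physical_cpus):
    """Return True when the CPU_COUNT values of all instances on one
    machine add up to no more than the machine's physical CPUs."""
    return sum(cpu_counts) <= physical_cpus


# Example from the text: 64 CPUs, 4 instances, each with CPU_COUNT = 16.
print(is_partitioned([16, 16, 16, 16], 64))  # True

# Adding a fifth instance with CPU_COUNT = 16 breaks the condition.
print(is_partitioned([16] * 5, 64))  # False
```

The check has to hold on every machine in the cluster for the cluster as a whole to be considered partitioned.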

If the goal is to partition a cluster, then appropriate CPU_COUNT settings can be determined by analyzing historical CPU consumption data. AHF Balance supports this analysis.

7.1.1.2 Risk Management – supported by AHF Balance

When a cluster is hosting more databases than partitioning allows, it is said to be over-provisioned. When a cluster is over-provisioned, it is possible for high CPU consumption by one or more database instances to interfere with the CPU needs of another database instance: that database instance is suffering from noisy neighbors. It is also possible that databases sharing the cluster each need large amounts of CPU at different times, so that at no point in time is any database starved for CPU resources. Since the cluster is not partitioned, this is not guaranteed: the DBRM is not configured to prevent the situation where all the databases need large amounts of CPU simultaneously.

If the historical record shows that partitioning is not possible, AHF Balance analyzes historical CPU consumption to recommend CPU_COUNT settings that minimize the amount of time each database is exposed to high CPU consumption by its neighbors.

7.1.1.3 Terms Associated with AHF Balance

  • Limit: The maximum number of vCPUs a database instance may use simultaneously. The DBRM parameter CPU_COUNT implements a limit for the instance.
  • Guarantee: The number of vCPUs a database instance is guaranteed to be able to use at any time. When a cluster is dedicated to running databases, the DBRM and the operating system cooperate to provide a guarantee.

    If the over-provisioning ratio R=sum(CPU_COUNT)/physical vCPUs, then the guarantee for a database instance is its CPU_COUNT/R.

    For example, on a 64 vCPU machine running 8 database instances, each with CPU_COUNT set to 16, the over-provisioning ratio R is 2 (8 * 16 / 64), and each database instance has a guarantee of 8 (16 / 2).

  • Not Exposed Hour: An hour when no database instance's CPU use exceeds its CPU guarantee. When an instance is not exposed, it cannot experience CPU-based noisy neighbor problems regardless of the CPU consumption of the other instances running on the machine.
  • Exposed Hour: An hour when the CPU use of one or more database instances exceeds the instance's CPU guarantee. When an instance is exposed, it may experience noisy neighbor problems depending on the CPU consumption of the other instances running on the machine.
  • Impacted Hour: An exposed hour during which the host's CPU utilization exceeded 70%. When an instance is impacted, it is likely to be experiencing noisy neighbor problems because the total CPU consumption of the machine is high.
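The terms above can be illustrated with a short Python sketch. It is not AHF Balance's implementation: the function names, the dictionary layout for hourly usage, and the instance names `db1` and `db2` are hypothetical. Only the ratio and guarantee formulas and the 70% threshold come from the definitions above.

```python
def overprovisioning_ratio(cpu_counts, physical_vcpus):
    """R = sum(CPU_COUNT) / physical vCPUs for one machine."""
    return sum(cpu_counts) / physical_vcpus


def guarantee(cpu_count, ratio):
    """Guarantee for one instance: its CPU_COUNT divided by R."""
    return cpu_count / ratio


def classify_hour(usage, guarantees, host_utilization_pct):
    """Classify one hour on one machine.

    usage and guarantees map a (hypothetical) instance name to vCPUs.
    """
    exposed = any(used > guarantees[name] for name, used in usage.items())
    if not exposed:
        return "not exposed"
    # An exposed hour counts as impacted when host CPU utilization exceeded 70%.
    if host_utilization_pct > 70:
        return "impacted"
    return "exposed"


# Example from the text: 64 vCPUs, 8 instances, CPU_COUNT = 16 each.
r = overprovisioning_ratio([16] * 8, 64)   # 2.0
g = guarantee(16, r)                       # 8.0

guarantees = {"db1": g, "db2": g}
print(classify_hour({"db1": 6, "db2": 4}, guarantees, 40))   # not exposed
print(classify_hour({"db1": 10, "db2": 4}, guarantees, 50))  # exposed
print(classify_hour({"db1": 10, "db2": 4}, guarantees, 75))  # impacted
```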

7.1.2 AHF Balance Reports

The time needed to generate a report depends on the number of entities (clusters, databases, and fleet) that the report covers.

Cluster

The Cluster Report provides recommended CPU_COUNT settings for all the databases running in a cluster, based on the last month of CPU utilization history for those databases. Tables and graphs in the report show historical exposure and impact for the last month, and what the exposure and impact would have been if the recommended CPU_COUNT settings had been in place. This information is provided at both the host level and the database level.

Fleet

The Fleet Report summarizes the Cluster Reports for a fleet of clusters, showing which clusters would benefit most from the recommendations.

Database

The Database Report shows the details of the effects of cluster-wide adoption of recommended CPU_COUNT settings on all the instances of an individual database. This report is intended to facilitate a conversation between the owner of a cluster and the database administrator for an individual database. Note that it is not possible to recommend CPU_COUNT settings for an individual database. This report shows the effects on an individual database if all the databases running in the cluster adopt the recommendations.

7.1.3 Guided Resolution of Database Performance Problems Caused by Noisy Neighbors

AHF Balance no longer requires a GI Home and now works with any Oracle Home.

Database CPU use is limited by the database CPU_COUNT parameter. When these limits add up to more than the number of CPUs on a machine, noisy-neighbor problems are possible.

AHF Balance analyzes database CPU configuration and historical CPU usage data from Enterprise Manager. The high-level results of this analysis are shown in the Oracle Orachk / Oracle Exachk MAA Score Card.

Further reports can be run to:

  • Get an overview of possible noisy neighbors across the fleet.
  • See detailed information about a specific database.
  • Generate a corrective action plan.

To use AHF Balance:
  • Configure AHF Balance to analyze historical CPU usage from Enterprise Manager’s repository database:
    ahf configuration set --type impact --connect-string <EM-DATABASE-CONNECT-STRING> --user-name <USER-NAME>

    Note:

    Ensure that the connect string does not contain any spaces.
  • Run a fleet-wide analysis to create a detailed AHF Balance report to understand noisy neighbors and the improvements possible by changing CPU_COUNT settings:
    ahf analysis create --type impact --scope fleet --name <FLEET_NAME>
  • Run a cluster-level analysis to get a detailed corrective action plan:
    ahf analysis create --type impact --scope cluster --name <CLUSTER_NAME>

For more information, see Data Source.
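When these steps are scripted, it can help to validate the inputs before calling the CLI. The sketch below only assembles the command lines shown above and does not run them. The helper names are hypothetical, the connect string and user name in the usage lines are placeholder values, and only the `fleet` and `cluster` scopes shown in this section are accepted.

```python
import shlex


def build_config_command(connect_string, user_name):
    """Assemble the 'ahf configuration set' command shown above."""
    # The connect string must not contain any spaces.
    if any(ch.isspace() for ch in connect_string):
        raise ValueError("connect string must not contain spaces")
    return ["ahf", "configuration", "set", "--type", "impact",
            "--connect-string", connect_string, "--user-name", user_name]


def build_analysis_command(scope, name):
    """Assemble the 'ahf analysis create' command shown above."""
    # Only the scopes that appear in this section are accepted here.
    if scope not in ("fleet", "cluster"):
        raise ValueError("scope must be 'fleet' or 'cluster'")
    return ["ahf", "analysis", "create", "--type", "impact",
            "--scope", scope, "--name", name]


# Placeholder values, for illustration only.
print(shlex.join(build_config_command("emhost.example.com:1521/emrep", "em_user")))
print(shlex.join(build_analysis_command("fleet", "my_fleet")))
print(shlex.join(build_analysis_command("cluster", "my_cluster")))
```

Each returned list can be passed to `subprocess.run` as is, which avoids shell quoting problems with the connect string.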

7.1.4 Data Source

AHF Balance relies on CPU consumption data collected and stored by Enterprise Manager (EM). EM collects hourly CPU consumption for each database instance and each host it is managing. The default retention policy for hourly data collected by EM is 32 days.

Figure 7-1 Status Timeline


Figure 7-2 Action Plan
