Oracle® Database 2 Day + Real Application Clusters Guide
10g Release 2 (10.2)
Performance tuning for an Oracle Real Application Clusters (Oracle RAC) database is very similar to performance tuning for a single-instance database. Many of the tuning tasks that you perform on single-instance Oracle databases can also improve performance of your Oracle RAC database. This chapter focuses on the performance tuning and monitoring tasks that are unique to Oracle RAC. For information about general performance tuning, refer to Oracle Database 2 Day DBA.
This chapter includes the following sections:
See Also:Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide for more information about diagnosing problems for Oracle Clusterware and Oracle Real Application Clusters components
Oracle RAC contains a set of underlying views that are maintained by the database and accessible to the database administrator user SYS. These views are called dynamic performance views because they are continuously updated while a database is running and in use, and their contents relate primarily to performance.
Although these views appear to be standard database tables, they are not. These views provide data on internal disk structures and memory structures. You can select from these views, but you can never update or alter them.
Each instance has a set of instance-specific views, which are prefixed with V$. You can also query global dynamic performance views to retrieve performance information from all the qualified instances. Global dynamic performance view names are prefixed with GV$. Each GV$ view retrieves the V$ view information from all qualified instances. In addition to the V$ information, each GV$ view contains an extra column named INST_ID of data type NUMBER. The INST_ID column displays the instance number from which the associated V$ view information was obtained. The INST_ID column can also be used as a filter to retrieve V$ information from a subset of available instances. For example, the following query retrieves the information from the V$LOCK view for instances 2 and 5:
SQL> SELECT * FROM GV$LOCK WHERE INST_ID = 2 OR INST_ID = 5;
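Because every GV$ view carries the INST_ID column, you can also aggregate by instance. The following sketch, which assumes a user with privileges to query GV$SESSION, counts the sessions on each instance:

```sql
-- Count sessions on each instance; GV$SESSION adds the INST_ID column
-- to the familiar V$SESSION columns (privileged account assumed).
SELECT inst_id, COUNT(*) AS session_count
  FROM GV$SESSION
 GROUP BY inst_id
 ORDER BY inst_id;
```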
Both Oracle Enterprise Manager Database Control and Oracle Enterprise Manager Grid Control are cluster-aware and provide a central console to manage your cluster database. From the Cluster Database page, you can do all of the following:
View the overall system status, such as the number of nodes in the cluster database and their current status. This high-level view capability means that you do not have to access each individual database instance for details if you just want to see inclusive, aggregated information.
View alert messages aggregated across all the instances with lists for the source of each alert message. An alert message is an indicator that signifies that a particular metric condition has been encountered. A metric is a unit of measurement used to report the system's conditions.
Monitor performance metrics aggregated across all the instances or displayed side by side so you can readily compare instances.
Monitor cluster cache coherency statistics to help you identify processing trends and optimize performance for your Oracle RAC environment. Cache coherency statistics measure how well the data in caches on multiple instances is synchronized. If the data caches are completely synchronized with each other, then reading a memory location through the cache on any instance will return the most recent data written to that location through any cache on any instance.
Enterprise Manager accumulates data over specified periods of time, called collection-based data. Enterprise Manager also provides current data, known as real-time data. The following sections explain how to monitor both types of data:
The Cluster Database Performance page provides a quick glimpse of the performance statistics for a database. Statistics are rolled up across all the instances in the cluster database. Using links that are next to the charts at the bottom of the page, you can get more specific information, allowing you to perform any of the following tasks:
Identify the causes of performance issues.
Decide whether resources need to be added or redistributed.
Tune your SQL plan and schema for better optimization.
Resolve performance issues.
The following screenshot shows a partial view of the Cluster Database Performance page. You access this page by clicking the Performance tab from the Cluster Database Home page.
The Cluster Host Load Average chart in the Cluster Database Performance page shows potential problems that are outside the database. The chart shows maximum, average, and minimum load values for available hosts for the previous hour.
If the load average is higher than the average of the total number of CPUs across all the hosts in the cluster, then too many processes are waiting for CPU resources. SQL statements that are not tuned often cause high CPU usage. Compare the load average values with the values displayed for CPU Used in the Average Active Sessions chart. If the sessions value is low and the load average value is high, then this indicates that something else on the host, other than your database, is consuming the CPU.
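Outside Enterprise Manager, the same comparison can be made directly on a host. The following shell sketch assumes Linux (`/proc/loadavg` and the `nproc` utility); it is an illustration of the comparison described above, not part of the Oracle tools:

```shell
#!/bin/sh
# Compare the 1-minute load average with the number of CPUs on this host.
# /proc/loadavg and nproc are Linux-specific (assumption).
cpus=$(nproc)
load=$(cut -d ' ' -f 1 /proc/loadavg)
echo "load average: ${load}, CPUs: ${cpus}"
# awk performs the floating-point comparison
if awk -v l="$load" -v c="$cpus" 'BEGIN { exit !(l > c) }'; then
  echo "load exceeds CPU count: processes may be queuing for CPU"
else
  echo "load is within CPU capacity"
fi
```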
Each cluster database instance has its own buffer cache in its System Global Area (SGA). Using Cache Fusion, Oracle RAC environments logically combine each instance's buffer cache to enable the database instances to process data as if the data resided in a single, shared cache.
When a process attempts to access a data block, it first tries to locate a copy of the data block in the local buffer cache. If a copy of the data block is not found in the local buffer cache, a global cache operation is initiated. Before reading a data block from disk, the process attempts to find the data block in the buffer cache of another instance. If the data block is in the buffer cache of another instance, Cache Fusion transfers a version of the data block to the local buffer cache, rather than having one database instance write the data block to disk and requiring the other instance to reread the data block from disk. For example, after the
sales1 instance loads a data block into its buffer cache, the
sales2 instance can more quickly acquire the data block from the
sales1 instance by using Cache Fusion rather than by reading the data block from disk.
The Global Cache Block Access Latency chart shows data for two different types of data block requests: current and consistent-read (CR) blocks. When you update data in the database, Oracle Database must locate the most recent version of the data block that contains the data, which is called the current block. If you perform a query, only data committed before the query began is visible to the query. Data blocks that were changed after the start of the query are reconstructed from data in the undo segments, and the reconstructed data is made available to the query in the form of a consistent-read block.
The Global Cache Block Access Latency chart on the Cluster Database Performance page shows the latency for each type of data block request, that is, the elapsed time required to locate and transfer consistent-read and current blocks between the buffer caches.
If the Global Cache Block Access Latency chart shows high latencies, this can be caused by any of the following:
A high number of requests caused by SQL statements that are not tuned.
A large number of processes in the queue waiting for CPU, or scheduling delays.
Slow, busy, or faulty interconnects. In these cases, check your network connection for dropped packets, retransmittals, or cyclic redundancy check (CRC) errors.
Concurrent read and write activity on shared data is common in a cluster. Depending on the service requirements, this activity does not usually cause performance problems. However, when global cache requests do cause a performance problem, a successful tuning strategy is to optimize the SQL plans and the schema to improve the rate at which data blocks are found in the local buffer cache, and to minimize I/O. If the latency for consistent-read and current block requests reaches 10 milliseconds, then your first step in resolving the problem should be to go to the Cluster Cache Coherency page for more detailed information.
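The same latency can be approximated from the dynamic performance views. The following query is a sketch that assumes the 10g statistic names 'gc cr block receive time' and 'gc cr blocks received' in GV$SYSSTAT, and that the receive time is recorded in centiseconds (so multiplying by 10 converts it to milliseconds); verify both assumptions against your release:

```sql
-- Approximate average consistent-read block receive time (ms) per instance.
-- Statistic names and the centisecond unit are assumptions for 10g.
SELECT t.inst_id,
       ROUND(10 * t.value / NULLIF(c.value, 0), 2) AS avg_cr_receive_ms
  FROM GV$SYSSTAT t, GV$SYSSTAT c
 WHERE t.name = 'gc cr block receive time'
   AND c.name = 'gc cr blocks received'
   AND t.inst_id = c.inst_id;
```

Values approaching 10 milliseconds warrant the Cluster Cache Coherency investigation described above.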
To access the Cluster Cache Coherency page, click Cluster Cache Coherency in the Additional Monitoring Links section of the Cluster Database Performance page. You can alternatively click either of the legends to the right of the Global Cache Block Access Latency chart.
The Cluster Cache Coherency page appears, as shown in the following screenshot. This page contains summary charts for cache coherency metrics for the cluster.
Table 8-1 provides a description of the Cluster Cache Coherency charts and the actions to perform to access more comprehensive information for problem resolution.
Global Cache Block Access Latency
Shows the total elapsed time, or latency, for a block request. Click one of the legends to the right of the chart to view the average time it takes to receive data blocks for each block type (current or CR) by instance. On the Average Block Receive Time by Instance page, you can click an instance legend under the chart to go to the Block Transfer for Local Instance page, where you can identify which block classes, such as undo blocks, data blocks, and so on, are subject to intense global cache activity. This page displays the block classes that are being transferred, and which instances are transferring most of the blocks. Cache transfer indicates how many current and CR blocks per block class were received from remote instances, including how many transfers incurred a delay (busy) or an unexpected longer delay (congested).
Global Cache Block Transfer Rate
Shows the total aggregated number of blocks received by all instances in the cluster by way of an interconnect. Click one of the legends to the right of the chart to go to the Global Cache Blocks Received by Instance page for that type of block. From there, you can click an instance legend under the chart to go to the Segment Statistics by Instance page, where you can see which segments are causing cache contention.
Global Cache Block Transfers and Physical Reads
Shows the percentage of logical read operations that retrieved data from the buffer cache of other instances by way of Direct Memory Access and from disk. It is essentially a profile of how much work is performed in the local buffer cache, rather than the portion of remote references that incur some latency overhead. Click the legends to the right of the chart to go to the Global Cache Block Transfers vs. Logical Reads by Instance and Physical Reads vs. Logical Reads by Instance pages. From there, you can click an instance legend under the chart to go to the Segment Statistics by Instance page, where you can see which segments are causing cache contention.
The Average Active Sessions chart in the Cluster Database Performance page shows potential problems inside the database. Categories, called wait classes, show how much of the database is waiting for a resource, such as CPU or disk I/O. Comparing CPU time to wait time helps to determine how much of the response time is consumed with useful work rather than waiting for resources that are potentially held by other processes.
The chart displays the load on the instance and identifies performance issues. At the cluster database level, this chart shows the aggregate wait class statistics across all the instances.
Compare the peaks on the Average Active Sessions chart with those on the Database Throughput charts. If the Average Active Sessions chart displays a large number of sessions waiting, indicating internal contention, but throughput is high, then the situation may be acceptable. The database is probably also performing efficiently if internal contention is low but throughput is high. However, if internal contention is high and throughput is low, then consider tuning the database.
If you click the wait class legends beside the Average Active Sessions chart, you can view instance-level information stored in Active Sessions by Instance pages. These pages show the service times for up to four instances. Using these pages, if you need to diagnose and fix problems that are causing the higher number of wait events in a specific category, you can select an instance of interest and view the wait events, as well as the SQL, sessions, services, modules, and actions that are consuming the most database resources.
See Also:Oracle Database 2 Day DBA for more information about tuning a database and instance
The Database Throughput charts summarize any contention that appears in the Average Active Sessions chart, and also show how much work the database is performing on behalf of the user or applications. The Per Second view is for databases that handle SQL queries, shown as Physical Reads in the bottom chart. The Per Transaction view is for databases that handle transactions, shown as Transactions in the top chart and Redo Size in the bottom chart. Logons show how many users are logged on to the database per second.
You can also obtain information at the instance level by clicking one of the legends to the right of the charts to access the Database Throughput by Instance page. This page shows the breakdown for all active instances of the aggregated Database Throughput chart on the Cluster Database Performance page. You can use this page to view the throughput for a particular instance, which may help you diagnose throughput problems.
You can drill down further in the list to see the sessions of an instance that are consuming the greatest resources. Click an instance name legend just under the chart to go to the Top Sessions page. For more information about this page, refer to the Enterprise Manager Help system.
In the Additional Monitoring Links and Additional Instance Monitoring Links sections of the Cluster Database Performance page, there are links to other charts that are useful in evaluating the performance of your cluster database. This section describes the following charts:
The Top Consumers page provides access to several tabs that enable you to view real-time or collection-based data for the services, modules, clients, and actions that are consuming the most system resources. You access the Top Consumers page by clicking Top Consumers in the Additional Monitoring Links section of the Cluster Database Performance page.
By default, the Top Consumers page initially displays the Overview tab, which shows aggregated summary data for the highest resource consumers. For instance-level information about a consumer, click the portion of a chart representing the consumer or click the link under the chart for that consumer. The page that appears shows the running instances that are serving the consumer. You can expand the names in the Module column to show data for individual instances.
The Top Sessions page shows a real-time summary list of sessions based on aggregated data. You can see which sessions have consumed the greatest amount of system resources, referred to as the top sessions, and then decide whether or not you want to stop the sessions. You access the Top Sessions page by clicking Top Sessions in the Additional Monitoring Links section of the Cluster Database Performance page.
The Instance Activity page enables you to view instance activity for several metrics within general metric categories, such as cursors, transactions, sessions, logical I/O, physical I/O, and net I/O. You can access top sessions statistics for a particular metric by clicking a metric legend under the chart if in Graphic mode, or by clicking a name in the summary table if in Tabular mode. You can also use the Switch Database Instance menu to toggle from one database instance to another. You can view data on a per-second or per-transaction basis. You access this page by clicking Instance Activity in the Additional Instance Monitoring Links section at the bottom of the Cluster Database Performance page.
Collecting and viewing segment-level statistics is an effective method for identifying frequently accessed tables or indexes in a database. The Top Segments page enables you to gather segment-level statistics to identify performance problems associated with individual segments. This page is particularly useful for Oracle RAC, because it also tracks the number of consistent read and current blocks received by an object. A high number of current blocks received plus a high number of buffer waits may indicate potential contention.
You access the Top Segments page by clicking Top Segments in the Additional Monitoring Links section. You can view segments for all instances, or use a filter to see segments for a specific instance.
Use the Database Locks page to determine if multiple instances are holding locks for the same object. The page shows user locks, all database locks, or locks that are blocking other users or applications. You can use this information to stop a session that is unnecessarily locking an object. You access the Database Locks page by clicking Database Locks in the Additional Monitoring Links section.
In Oracle RAC environments, each Automatic Workload Repository (AWR) snapshot captures data from all active instances within the cluster. The data for each snapshot set that is captured for all active instances is from the same point in time. In addition, the data for each instance is stored separately and is identified with an instance identifier. For example, the
buffer_busy_wait statistic shows the number of buffer wait events on each instance. AWR does not store data that is aggregated from across the entire cluster. In other words, the data is stored for each individual instance.
AWR automatically generates snapshots of the performance data once every hour and collects the statistics in the workload repository. You can also manually create snapshots, but this is usually not necessary. The data in the snapshot interval is then analyzed by the Automatic Database Diagnostic Monitor (ADDM).
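If you do need a snapshot outside the hourly schedule, for example immediately before and after a workload you want ADDM to analyze, you can take one manually with the DBMS_WORKLOAD_REPOSITORY package:

```sql
-- Take a manual AWR snapshot; in Oracle RAC this captures data
-- from all active instances at the same point in time.
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;
```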
See Also:Oracle Database Performance Tuning Guide for more information about Automatic Workload Repository
Problems can occur when attempting to complete the installation or database creation process manually instead of using the Oracle Database management tools. Other problems occur when the database administrator or system administrator misses important operating system or cluster configuration steps prior to installation. Both Oracle Clusterware and Oracle Database components have subcomponents that you can troubleshoot. The crsctl check command enables you to determine the status of several Oracle Clusterware components at one time.
This section contains the following topics:
You can use
crsctl commands as the
root operating system user to diagnose problems with your Oracle Clusterware installation, or to enable dynamic debugging for Oracle Clusterware. This section contains the following topics:
Use the following command to obtain component names, where module_name is the name of a module, such as css:
# crsctl lsmodules module_name
For example, viewing the components of the
css module might return the following results:
# crsctl lsmodules css
The following are the CSS modules ::
CSSD
COMMCRS
COMMNS
You can enable debugging for the Oracle Clusterware daemons, the Event Manager (EVM), and their modules by running crsctl commands as follows, where component_name is the name of an Oracle Clusterware component for which you want to enable debugging, module is the name of the module as it appears in the output of the crsctl lsmodules command, and debugging_level is a number from 1 to 5:
# crsctl debug log component module:debugging_level
For example, to enable tracing for the
CSSD module of the
css component, you could use the following command:
# crsctl debug log css CSSD:1
When the Oracle Clusterware daemons are enabled, they start automatically when the node is started. To prevent the daemons from starting automatically, you can disable them using the crsctl disable crs command.
Run the following command to enable startup for all the Oracle Clusterware daemons:
# crsctl enable crs
Run the following command to disable the startup of all the Oracle Clusterware daemons:
# crsctl disable crs
Note:The crsctl enable crs and crsctl disable crs commands are not supported on Microsoft Windows platforms.
Oracle Clusterware posts alert messages when important events occur. For example, you might see alert messages from the Cluster Ready Services (CRS) daemon process when it starts, if it aborts, if the failover process fails, or if automatic restart of a CRS resource failed.
The Oracle Clusterware alert log is located in a directory under CRS_home, where CRS_home is the directory in which Oracle Clusterware was installed and hostname is the host name of the local node.
You can use crsctl commands to enable resource debugging using the following syntax, where resource_name is the name of an Oracle Clusterware resource and debugging_level is a number from 1 to 5:
# crsctl debug log res resource_name:debugging_level
To obtain a list of the resources available for debugging, use the following command:
Note:When you enable debugging for an Oracle Clusterware resource using crsctl commands, this has the same effect as if you set the operating system environment variable USER_ORA_DEBUG to 1 before running the start, stop, or check action scripts for the specified resource.
Use the crsctl check command to determine the condition of your clusterware installation, as shown in the following example:
# crsctl check crs
This command displays the status of the Cluster Synchronization Services (CSS), Event Manager (EVM), and the Cluster Ready Services (CRS) daemons. You can also check the status of an individual daemon using the following syntax, where
daemon is the name of one of the Oracle Clusterware daemons:
# crsctl check daemon
Oracle RAC uses a unified log directory structure to store all the Oracle Clusterware component log files. This consolidated structure simplifies diagnostic information collection and assists during data retrieval and problem analysis.
The log files for the CRS daemon,
crsd, can be found in the following directory:
The log files for the CSS daemon,
cssd, can be found in the following directory:
The log files for the EVM daemon,
evmd, can be found in the following directory:
The log files for the Oracle Cluster Registry (OCR) can be found in the following directory:
The log files for the Oracle RAC high availability component can be found in the following directories:
Note:Each program that is part of the Oracle RAC high availability component has a subdirectory assigned exclusively for that program. The name of the program subdirectory is the same as the name of the program.
If any of the Oracle Clusterware components generates a core dump file, it is located in a subdirectory of the log directory for that component.
Run the diagcollection.pl script as the root user to collect diagnostic information from an Oracle Clusterware installation. The diagnostics provide additional information so that Oracle Support Services can resolve problems. Run this script from the operating system prompt as follows, where CRS_home is the home directory of your Oracle Clusterware installation:
# CRS_home/bin/diagcollection.pl --collect
The Cluster Verification Utility (CVU) can assist you in diagnosing a wide variety of configuration problems. Refer to the example of using the CVU in "Installing Oracle Clusterware 10g".
This section contains the following topics:
You can enable tracing by setting the CVU tracing environment variable to true. After setting this variable, run the command that you want to trace. The CVU trace files are created in the cv/log directory. Oracle RAC automatically rotates the log files, and the most recently created log file has the name cvutrace.log.0. You should remove unwanted log files or archive them to reclaim disk space, if needed. The CVU does not generate trace files unless you enable tracing.
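As an illustration, a tracing session might look like the following. The variable name SRVM_TRACE is an assumption (it is the name used in later Oracle releases); verify it for your installation, and note that the node names are examples:

```shell
# Enable CVU tracing for this shell session; the variable name
# SRVM_TRACE is an assumption -- verify it for your release.
SRVM_TRACE=true
export SRVM_TRACE
# Run the command to be traced; the trace is written as cvutrace.log.0
# in the cv/log directory of the CVU installation.
cluvfy comp nodereach -n docrac1,docrac2 -verbose
```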
Cache Fusion enhances the performance of Oracle RAC by utilizing a high-speed interconnect to send data blocks to another instance's buffer cache. The high-speed interconnect should be a private network with the highest bandwidth to maximize performance.
For network connectivity verification, the CVU discovers all the available network interfaces if you do not specify an interface on the CVU command line.
To verify the accessibility of the cluster nodes from the local node or from any other cluster node, use the component verification command
nodereach as follows:
cluvfy comp nodereach -n node_list [ -srcnode node ] [-verbose]
To verify that the other cluster nodes can be reached from the local node through all the available network interfaces or through specific network interfaces, use the component verification command
nodecon as follows:
cluvfy comp nodecon -n node_list [ -i interface_list ] [-verbose]
You can also use the
nodecon command without the
-i option, as shown in the following example:
cluvfy comp nodecon -n all [-verbose]
When you issue the nodecon command as shown in the previous example, it instructs the CVU to perform the following tasks:
Discover all the network interfaces that are available on the cluster nodes.
Review the corresponding IP addresses and subnets for the interfaces.
Obtain the list of interfaces that are suitable for use as VIPs and the list of interfaces that are suitable for use as private interconnects.
Verify the connectivity among all the nodes through those interfaces.
You can run the
nodecon command in verbose mode to identify the mappings between the interfaces, IP addresses, and subnets. To verify the connectivity among the nodes through specific network interfaces, use the
comp nodecon command with the
-i option. For example, you can verify the connectivity among the nodes docrac1, docrac2, and docrac3 through interface eth0 by running the following command:
cluvfy comp nodecon -n docrac1,docrac2,docrac3 -i eth0 -verbose
If you run the
cluvfy command using the
-verbose argument and the CVU responds with
UNKNOWN for a particular node, then this is because the CVU cannot determine whether a check passed or failed. The cause could be that the node is not reachable, or a system problem was occurring on that node at the time the CVU performed the check.
The following is a list of possible causes for an UNKNOWN response:
The node is down.
Executable files that the CVU requires are missing from the bin directory of the Oracle Clusterware home or the Oracle home on that node.
The user account that ran the CVU does not have privileges to run common operating system executable files on the node.
The node is missing an operating system patch or required package.
The kernel parameters on that node were not configured correctly and the CVU cannot obtain the operating system resources required to perform its checks.
See Also:Oracle Database Oracle Clusterware and Oracle Real Application Clusters Installation Guide for Linux, or for a different operating system, for more information about using the CVU to troubleshoot your Oracle Clusterware setup
To verify the existence of node applications, namely the virtual IP (VIP), Oracle Notification Services (ONS), and Global Service Daemon (GSD), on all the nodes, use the CVU
comp nodeapp command, using the following syntax:
cluvfy comp nodeapp [ -n node_list] [-verbose]
To verify the existence of all the Oracle Clusterware components, use the component verification
comp crs command, using the following syntax:
cluvfy comp crs [ -n node_list] [-verbose]
To verify the integrity of the Oracle Cluster Registry, use the component verification
comp ocr command, using the following syntax:
cluvfy comp ocr [ -n node_list] [-verbose]
To verify that all nodes in the cluster have the same view of the cluster configuration, use the component verification
comp clu command, as follows:
cluvfy comp clu
Alert messages are displayed in Enterprise Manager on the Cluster Database Home page under the Alerts heading. The section Related Alerts displays nondatabase alert messages, for example, alert messages for Oracle Net.
The Alerts table is similar to that shown for single-instance databases, but in a cluster database, it includes columns for the target name and target type. For example, if a user connected to the
sales1 instance exceeded his allotted login time, you would see an alert message with the following values:
Target type: Database instance
Name: User logon time
Message: User logon time is 10250 microseconds
Alert triggered: Date and time when the alert condition occurred
The following screenshot shows an example of the Alerts display for a clustered database named
The alert log is associated with an instance. To view the alert log for your cluster database, you must use the Cluster Database Instance page in Enterprise Manager. On the Home page, in the Diagnostic Summary section, click the date string link next to the heading Alert Log to display the alert log entries containing ORA- errors. To view all the entries in the alert log, click Alert Log Content in the Related Links section on the Alert Log Errors page. Enterprise Manager displays the most recent alert log entries by default, but you can specify search criteria to display the alert log entries for a range of dates.
The following screenshot shows an example of the alert log entries for the
docrac1 instance of a cluster database named