Oracle(R) Enterprise Manager Getting Started with the Oracle Diagnostics Pack, Release 2.1, A76915-01
The Event Management System (EMS) within Oracle Enterprise Manager assists the DBA with automatic problem detection and correction. Using the EMS, the DBA can establish boundary thresholds for warning and alert conditions within the network environment for problem monitoring.
The Oracle Enterprise Manager console provides event management functionality with several basic events. To use the Oracle Advanced Event Tests, you must have licensed the Oracle Diagnostics Pack.
The Oracle Advanced Event Tests for the database, listener, and node service types are grouped into the following categories:
This category of event tests monitors for severe problems that require immediate action.
This event test signifies that the database being monitored has generated errors to the ALERT log file since the last sample time. The ALERT log file is a special trace file containing a chronological log of messages and errors. An alert event is triggered when Oracle Exception (ORA-006xx), deadlock detected (ORA-00060), or data block corrupted (ORA-01578) messages are written to the ALERT log file. A warning is displayed when other ORA messages are written to the ALERT log file.
None
Alert log error messages since last sample time.
60 seconds
Examine ALERT log for additional information.
This event test signifies that the archiver of the database being monitored has been temporarily suspended since the last sample time.
If the database is running in ARCHIVELOG mode, an alert is displayed when archiving is hung (ORA-00257) messages are written to the ALERT file. The ALERT file is a special trace file containing a chronological log of messages and errors.
If the database is not running in ARCHIVELOG mode, this event will not register.
None
ALERT log error messages since last sample time.
60 seconds
Examine the ALERT log and archiver trace file for additional information; however, the most likely cause of this message is that the destination device is out of space to store the redo log file. Verify the device specified in the initialization parameter LOG_ARCHIVE_DEST is set up properly for archiving.
The Oracle server job queue is a database table that stores information about local jobs, such as the PL/SQL call to execute for a job and when to run a job. Database replication is also managed through the Oracle job queue mechanism, which uses jobs to push deferred transactions to remote master sites, to purge applied transactions from the deferred transaction queue, or to refresh snapshot refresh groups.
A job can be broken in two ways: Oracle has failed to execute the job successfully after sixteen attempts, or the job has been explicitly marked as broken by using the DBMS_JOB.BROKEN procedure.
This event test checks for broken DBMS jobs. An alert is generated if the number of broken jobs exceeds the value specified by the threshold argument.
Alert threshold: Threshold for alert (number of jobs). Default is 0 jobs.
Job identifiers of broken DBMS jobs.
60 seconds
Check the ALERT log and trace files for error information. Correct the problem that is preventing the job from running. Force immediate re-execution of the job by calling DBMS_JOB.RUN, if desired.
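For example, a minimal sketch from SQL*Plus; the job number 42 is a placeholder taken from the first query:

    -- Broken DBMS jobs and their failure counts (requires access to DBA_JOBS).
    SELECT job, what, failures, broken
      FROM dba_jobs
     WHERE broken = 'Y';

    -- Force immediate re-execution of a job once the underlying problem is fixed.
    EXECUTE DBMS_JOB.RUN(42);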
This event test signifies that the database being monitored has generated a corrupted block error to the ALERT file since the last sample time. The ALERT file is a special trace file containing a chronological log of messages and errors. An alert event is triggered when data block corrupted (ORA-01578) messages are written to the ALERT file.
None
Segment name containing the data block specified by the file ID and block number in the ORA-01578 message.
60 seconds
Examine the ALERT log for additional information.
This event test checks whether the database being monitored is running. If this test is triggered, other database events are ignored.
None
The Startup Database job task can be set up as a fixit job for automatically correcting the problem.
Oracle uses deferred transactions to propagate data-level changes asynchronously among master sites in an advanced replication system as well as from an updatable snapshot to its master table.
This event test checks for the number of deferred transactions. An alert is generated if the number of deferred transactions exceeds the value specified by the threshold argument.
Threshold for alert (number of transactions). Default is 100 transactions.
Number of deferred transactions.
60 seconds
When the advanced replication facility pushes a deferred transaction to a remote site, it uses a distributed transaction to ensure that the transaction has been properly committed at the remote site before the transaction is removed from the queue at the local site. If transactions are not being pushed to a given remote site, verify that the destination for the transaction was correctly specified. If you specify a destination database when calling DBMS_DEFER_SYS.SCHEDULE_EXECUTION using the DBLINK parameter or DBMS_DEFER_SYS.EXECUTE using the DESTINATION parameter, make sure the full database link is provided.
Wrong view definitions can lead to erroneous deferred transaction behavior. Verify that the DEFCALLDEST and DEFTRANDEST views use the definitions from CATREPC.SQL and not from CATDEFER.SQL.
Oracle uses deferred transactions to propagate data-level changes asynchronously among master sites in an advanced replication system as well as from an updatable snapshot to its master table. If a transaction is not successfully propagated to the remote site, Oracle rolls back the transaction and logs it in the SYS.DEFERROR view in the remote destination database.
This event test checks for the number of transactions in SYS.DEFERROR view and raises an alert if it exceeds the value specified by the threshold argument.
Threshold for alert (number of error transactions). Default is 0 transactions.
Number of transactions that could not be applied
60 seconds
An error in applying a deferred transaction may be the result of a database problem, such as a lack of available space in the table that is to be updated or may be the result of an unresolved insert, update, or delete conflict. The SYS.DEFERROR view provides the ID of the transaction that could not be applied. Use this ID to locate the queued calls associated with the transaction. These calls are stored in the SYS.DEFCALL view. You can use the procedures in the DBMS_DEFER_QUERY package to determine the arguments to the procedures listed in the SYS.DEFCALL view.
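As a sketch, the error queue and its queued calls can be inspected from SQL*Plus; the exact column lists vary slightly by release, and &tran_id is a substitution placeholder for a transaction ID taken from the first query:

    -- Deferred transactions that failed at a remote destination.
    SELECT deferred_tran_id, destination, error_number, error_msg
      FROM sys.deferror;

    -- Queued calls belonging to one failed transaction.
    SELECT *
      FROM sys.defcall
     WHERE deferred_tran_id = '&tran_id';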
The Oracle server job queue is a database table that stores information about local jobs, such as the PL/SQL call to execute for a job and when to run a job. Database replication is also managed through the Oracle job queue mechanism, which uses jobs to push deferred transactions to remote master sites, to purge applied transactions from the deferred transaction queue, or to refresh snapshot refresh groups.
If a job returns an error while Oracle is attempting to execute it, the job fails. Oracle repeatedly tries to execute the job, doubling the interval between attempts. If the job fails sixteen times, Oracle automatically marks the job as broken and no longer tries to execute it.
This event test checks for failed DBMS jobs. An alert is generated if the number of failed jobs exceeds the value specified by the threshold argument.
Alert threshold: Threshold for alert (number of jobs). Default is 0 jobs.
Job identifiers of failed DBMS jobs.
Number of failures since last successful execution.
60 seconds
Check the ALERT log and trace files for error information. Correct the problem that is preventing the job from running.
This event test checks whether a new connection can be established to a database. If the maximum number of sessions is exceeded or the listener is down, this test is triggered.
None
None
60 seconds. However, a frequency of 10 minutes is recommended.
Check the status of the listener to make sure it is running on the node where the event was triggered. If the listener is running, check to see if the number of sessions is at the session limit.
This event test signifies that a session terminated unexpectedly since the last sample time. The ALERT file is a special trace file containing a chronological log of messages and errors. An alert is displayed when session unexpectedly terminated (ORA-00603) messages are written to the ALERT file.
None
ALERT log error messages since the last sample time.
60 seconds
Examine the ALERT log and the session trace file for additional information.
The Oracle server job queue is a database table that stores information about local jobs.
This event test checks for unscheduled DBMS jobs. An alert is generated when the number of jobs whose execution time has exceeded the value specified by the Job Completion Time argument exceeds the value specified by the Alert Threshold argument. A job's completion date/time is calculated using the NEXT_DATE value in the SYS.DBA_JOBS view plus the approximate time it takes to complete a job, as specified by the Job Completion Time argument.
Job identifiers of jobs that are not rescheduled for execution.
60 seconds
Check the FAILURES and BROKEN values in the SYS.DBA_JOBS view.
If the job failed to execute, check the ALERT log and trace files for error information and fix the error.
If the job was never executed, there may be a problem with the availability of SNP background processes. Check the initialization parameter JOB_QUEUE_PROCESSES to determine the maximum number of background processes available and JOB_QUEUE_INTERVAL to determine how frequently each background process wakes up.
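A minimal sketch of this check from SQL*Plus, assuming access to V$PARAMETER and SYS.DBA_JOBS:

    -- Confirm that SNP job queue processes are configured.
    SELECT name, value
      FROM v$parameter
     WHERE name IN ('job_queue_processes', 'job_queue_interval');

    -- List jobs whose scheduled execution date has already passed.
    SELECT job, next_date, failures, broken
      FROM dba_jobs
     WHERE next_date < SYSDATE;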
This event test signifies that a database user is blocking at least one other user from performing an action, such as updating a table. An alert is generated if the number of consecutive blocking occurrences reaches the specified value.
Number of occurrences: Number of consecutive occurrences a user can be blocked before an alert is generated. Default is three.
Session ID of the user who is blocking other users.
60 seconds
Either have the user who is blocking other users roll back the transaction, or wait until the blocking transaction has been committed.
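Blocking and waiting sessions can be identified with the DBA_BLOCKERS and DBA_WAITERS views, which are created by the catblock.sql script shipped with the server; a minimal sketch:

    -- Sessions holding locks that other sessions are waiting for.
    SELECT holding_session FROM dba_blockers;

    -- Sessions waiting on those locks, with the lock mode requested.
    SELECT waiting_session, holding_session, lock_type, mode_requested
      FROM dba_waiters;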
This category of event tests tracks possible space problems within the database.
The ALERT file is a special trace file containing a chronological log of messages and errors. Oracle always appends to the file. To control the size of an ALERT file, you must manually delete the file when you no longer need it.
This event test checks the file size of the ALERT file. If the file is greater than the values specified in the threshold arguments, then a warning or alert is generated.
Current size of ALERT file in kilobytes.
60 seconds. However, a frequency of 10 minutes is recommended.
Delete the ALERT file to recover disk space.
When running a database in ARCHIVELOG mode, the archiving of the online redo log is enabled. Filled groups of the online redo log are archived, by default, to the destination specified by the LOG_ARCHIVE_DEST initialization parameter. If this destination device becomes full, the database operation is temporarily suspended until disk space is available.
If the database is running in ARCHIVELOG mode, this test checks for available redo log destination devices. If the space available is less than the threshold value given in the threshold arguments, then an alert or warning is generated.
If the database is not running in ARCHIVELOG mode, or all archive destinations are standby databases for Oracle 8i, this test fails to register.
Space available on destination drive in kilobytes.
60 seconds. However, a frequency of 10 minutes is recommended.
Verify the device specified in the initialization parameter LOG_ARCHIVE_DEST is set up properly for archiving.
The Archive Full (%) event test monitors the same destination device as the Archive Full event test. The Archive Full (%) event test, however, returns the percentage of free space remaining on the log destination.
If the space available is less than the threshold value given in the threshold arguments, then an alert or warning is generated.
If the database is not running in ARCHIVELOG mode or all archive destinations are standby databases for Oracle8i, this test fails to register.
Percentage of space available on destination drive.
60 seconds. However, a frequency of 10 minutes is recommended.
Verify the device specified in the initialization parameter LOG_ARCHIVE_DEST is set up properly for archiving.
The Oracle Server allocates space for segments in units of one extent. When the existing extents of a segment are full, the Oracle Server allocates another extent for that segment. In order to do so, Oracle searches through the free space in the tablespace containing the segment for the first free, contiguous set of data blocks sufficient to meet the required extent's size. If sufficient space is not found, an error is returned by the Oracle Server.
This event test checks for the largest chunk of free space in the tablespace specified by the Tablespace name, Segment name, and Segment type parameters. If any table, index, cluster or rollback segments within the tablespace cannot allocate the additional number of extents specified in the thresholds, then a warning or alert is generated.
If the largest chunk of free space in the specified tablespace can only contain 2 extents, then 2 is compared to the threshold values. If 3 is specified for an alert, the alert test is triggered because 3 extents cannot be allocated in the tablespace.
Segment name where the additional extents cannot be allocated.
Tablespace name containing the segment.
Maximum size of contiguous free space in kilobytes for the tablespace.
60 seconds. However, a frequency of 10 minutes is recommended.
Increase the size of the tablespace by enabling automatic extension for one of its existing datafiles, manually resizing one of its existing datafiles or adding a new datafile.
Or, if the tablespace is suffering from free space fragmentation problems, consider reorganizing the entire tablespace by dropping and re-creating all segments within that tablespace. When reorganizing a tablespace, consider sizing the extents as integral divisors of the usable size of the datafiles in which they reside. Try to limit the tablespace to no more than 2 or 3 different extent sizes. Ensure extents within a segment are the same size or a multiple of each other by specifying STORAGE parameters where NEXT=INITIAL and PCTINCREASE=0. For segments that are linearly scanned, choose an extent size that is a multiple of the number of blocks read during each multiblock read.
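To carry out the first of these options (enlarging the tablespace), the following statements are a sketch only; the file names, tablespace name, and sizes are placeholders:

    -- Enable automatic extension for an existing datafile.
    ALTER DATABASE DATAFILE '/u01/oradata/prod/users01.dbf'
      AUTOEXTEND ON NEXT 10M MAXSIZE 500M;

    -- Manually resize an existing datafile.
    ALTER DATABASE DATAFILE '/u01/oradata/prod/users01.dbf' RESIZE 300M;

    -- Add a new datafile to the tablespace.
    ALTER TABLESPACE users ADD DATAFILE '/u01/oradata/prod/users02.dbf' SIZE 100M;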
Each server and background process can write to an associated trace file in order to log messages and errors. Trace files for the background processes and the ALERT file are written to the destination specified by BACKGROUND_DUMP_DEST.
Trace files for server processes are written to the destination specified by USER_DUMP_DEST.
This event test checks for available free space on these dump destination devices. If the space available is less than the threshold value given in the threshold arguments, then an alert or warning is generated.
Dump destination device and space available in kilobytes.
60 seconds. However, a frequency of 10 minutes is recommended.
Verify the devices specified in the initialization parameters BACKGROUND_DUMP_DEST and USER_DUMP_DEST are set up properly for tracing.
This event test monitors the same dump destinations as the Dump Full event test. The Dump Full (%) event test, however, returns the percentage of free space remaining on the dump destinations.
If the space available is less than the threshold value given in the threshold arguments, then an alert or warning is generated.
Dump destination device and percentage of space available.
60 seconds. However, a frequency of 10 minutes is recommended.
Verify the devices specified in the initialization parameters BACKGROUND_DUMP_DEST and USER_DUMP_DEST are set up properly for tracing.
A segment is a collection of extents that make up a single table, index, temporary, or rollback segment. The Oracle Server offers a practical method of space allocation to segments as they are required to grow. Oracle allows a segment to have multiple extents, which the server allocates automatically when they are needed. For any segment that grows continuously, it is important to carefully monitor that segment's growth pattern. Storage values for the database should be chosen to ensure new extents are not frequently allocated.
This event test checks whether any of the segments specified by the Tablespace name, Segment name, and Segment type parameters are allocating extents too quickly. If, for any segment, the number of extents allocated since the last event check is greater than the threshold values specified in the threshold arguments, then a warning or alert is generated.
Name of segment growing too quickly.
Tablespace name containing segment.
Number of extents segment grew since last event condition check.
60 seconds. However, a frequency of 1 day is recommended.
Consider increasing the value of the segment's NEXT storage parameter value so that extents are allocated less frequently.
A segment is a collection of extents that make up a single table, cluster, index, temporary or rollback segment. The MAXEXTENTS segment storage parameter specifies the maximum number of extents that can be allocated to the segment. Once a segment has filled the maximum number of extents, any row insertion will fail with an ORA-01631 error message.
This event test checks whether any of the segments specified by the Tablespace name, Segment name, and the Segment type parameters are approaching their maximum extents. If for any segment the maximum number of extents minus the number of existing extents is less than the threshold values specified in the threshold arguments, then a warning or alert is generated.
If the maximum number of extents for a segment is 20 and the number of existing extents is 16, then 4 is compared against the specified threshold values. If 3 is specified for an alert and 5 is specified for a warning, the warning event is triggered because only 4 extents are available.
Name of segment approaching its maximum extents.
Name of tablespace containing segment.
Number of extents that can be allocated before the maximum number of extents is hit.
60 seconds. However, a frequency of 10 minutes is recommended.
If possible, increase the value of the segment's MAXEXTENTS storage parameter. Otherwise, rebuild the segment with a larger extent size ensuring the extents within a segment are the same size by specifying STORAGE parameters where NEXT=INITIAL and PCTINCREASE=0. For segments that are linearly scanned, choose an extent size that is a multiple of the number of blocks read during each multiblock read. This will ensure that the Oracle multiblock read capability is used efficiently.
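A sketch of how the affected segments can be located and the limit raised; the 5-extent margin and the table name are placeholders:

    -- Segments within 5 extents of their MAXEXTENTS limit.
    SELECT owner, segment_name, segment_type, extents, max_extents
      FROM dba_segments
     WHERE max_extents - extents < 5;

    -- Raise the limit for one affected table.
    ALTER TABLE scott.emp STORAGE (MAXEXTENTS 200);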
A segment is a collection of extents that make up a single table, cluster, index, temporary or rollback segment. The Oracle Server allows a segment to have multiple extents, which the server allocates automatically when additional space is required.
There is no performance degradation for a segment having multiple extents that are never fully-scanned (table and temporary segments only) where the extents are the same size and are also an integral multiple of the multiblock read batch size. No performance degradation is found where extents are 100 or more times larger than the read batch size. Oracle administrators may, however, choose to monitor the number of extents in a segment.
This event test checks whether any of the segments specified by the Tablespace name, Segment name, and Segment type parameters have multiple extents. If the number of extents is greater than the threshold values specified in the threshold arguments, then a warning or alert is generated.
Name of segment comprised of multiple extents.
Tablespace name containing segment.
Number of extents currently allocated for the segment.
60 seconds. However, a frequency of 10 minutes is recommended.
If the segment may be linearly scanned, make sure the multiple extents are the same size, and that the chosen extent size is either an integral multiple of the multiblock read batch size or 100 or more times larger than the read batch size, in order to achieve the highest efficiency of the server's multiblock read capability.
For all other segments, no action is required unless the number of extent allocations is approaching the segment's maximum number of extents. In this case, increase the value of the segment's MAXEXTENTS storage parameter if possible.
Otherwise, rebuild the segment with a larger extent size ensuring that the extents within a segment are the same size by specifying STORAGE parameters where NEXT=INITIAL and PCTINCREASE=0.
A master table's snapshot log keeps track of fast refresh data for all corresponding snapshots. When a snapshot log is created for a master table, Oracle creates an underlying table to support the snapshot log. Oracle automatically tracks which rows in a snapshot log have been used during the refreshes of snapshots, and purges those rows from the log. Oracle does not delete rows from the log until all snapshots have used them. As a result, in certain situations, a snapshot log can grow indefinitely when multiple snapshots are based on the same master table. It is best to always try to keep a snapshot log as small as possible to minimize the database space that it uses.
This event test checks whether a snapshot log is too large. In order to do this, the test determines the number of snapshot log tables containing more rows than specified by the snapshot log's table size parameter. If this number is greater than the threshold value specified in the threshold argument, then an alert is generated.
Snapshot log table and its size.
60 seconds
To remove rows from a snapshot log and free up space for newer log records, you can refresh the snapshots associated with the log so that Oracle can purge rows from the snapshot log or manually purge records in the log by deleting the rows required only by the nth least recently refreshed snapshots.
To manually purge rows from a snapshot log, execute the PURGE_LOG stored procedure of the DBMS_SNAPSHOT package at the database that contains the log.
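For example, a sketch of a manual purge issued at the master site; the master table name is a placeholder, and the second argument (the number of least recently refreshed snapshots whose rows are deleted) follows the PURGE_LOG documentation for this release:

    -- Delete snapshot log rows needed only by the least recently refreshed snapshot
    -- of the hypothetical master table SCOTT.EMP.
    EXECUTE DBMS_SNAPSHOT.PURGE_LOG('scott.emp', 1);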
As segments within a tablespace grow, the free space within that tablespace decreases. Should free space become insufficient, the creation of new segments or the extension of existing segments will fail.
This event test checks the total free space in the tablespace specified by the Tablespace name parameter. If the percentage of used space is greater than the values specified in the threshold arguments, then a warning or alert is generated.
60 seconds
Increase the size of the tablespace by enabling automatic extension for one of its datafiles, manually resizing one of its datafiles, or adding a new datafile.
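A minimal sketch of checking free space by tablespace and then applying one of these corrections; the file name, tablespace name, and size are placeholders:

    -- Total free space per tablespace, in megabytes.
    SELECT tablespace_name, ROUND(SUM(bytes)/1024/1024) AS free_mb
      FROM dba_free_space
     GROUP BY tablespace_name;

    -- Add a datafile to the tablespace that is filling up.
    ALTER TABLESPACE users ADD DATAFILE '/u01/oradata/prod/users03.dbf' SIZE 100M;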
This category of event tests tracks possible resource problems within the database environment.
The DB_FILES initialization parameter specifies the maximum number of database files that can be opened for this database.
This event test checks for the utilization of the datafile resource against the values (percentages) specified by the threshold arguments. If the percentage of datafiles currently used to the limit set in the DB_FILES initialization parameter exceeds the values specified in the threshold arguments, then a warning or alert is generated.
If 30 datafiles are used and the value of DB_FILES is 40, the percentage is 75% (30/40 x 100). This value is compared against the specified thresholds.
Current value and the limit specified by DB_FILES.
Verify the current number of data files in use by the database. Increase the DB_FILES instance parameter, if the current value for DB_FILES is less than MAXDATAFILES.
The DML_LOCKS initialization parameter specifies the maximum number of DML locks. The purpose of DML locks is to guarantee the integrity of data being accessed concurrently by multiple users. DML locks prevent destructive interference of simultaneous conflicting DML and/or DDL operations.
This event test checks for the utilization of the lock resource against the values (percentage) specified by the threshold arguments. If the percentage of all active DML locks to the limit set in the DML_LOCKS initialization parameter exceeds the values specified in the threshold arguments, then a warning or alert is generated.
If DML_LOCKS is 0, this event fails to register. A value of 0 indicates that enqueues are disabled.
If 40 DML locks are active and the value of DML_LOCKS is 60, the percentage is 67% (40/60 x 100). This value is compared against the specified thresholds.
Current value and the limit specified by DML_LOCKS.
60 seconds
Increase the DML_LOCKS instance parameter by 10%.
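In Oracle8 and later, current utilization of the lock resource can be checked against the limit before changing the parameter; a sketch, assuming access to V$RESOURCE_LIMIT:

    -- Compare current and peak DML lock usage against the configured limit.
    SELECT resource_name, current_utilization, max_utilization, limit_value
      FROM v$resource_limit
     WHERE resource_name = 'dml_locks';

Because DML_LOCKS is not dynamically modifiable, the new value takes effect at the next instance startup.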
The PROCESSES initialization parameter specifies the maximum number of operating system user processes that can simultaneously connect to a database. This number also includes the background processes utilized by the instance.
This event test checks for the utilization of the process resource against the values (percentage) specified by the threshold arguments. If the percentage of all current processes to the limit set in the PROCESSES initialization parameter exceeds the values specified in the threshold arguments, then a warning or alert is generated.
If 40 processes are currently connected and the value of PROCESSES is 50, the percentage is 80% (40/50 x 100). This value is compared against the specified thresholds.
Current value and the limit specified by PROCESSES.
60 seconds
Verify that the current PROCESSES instance parameter setting has not exceeded the operating system-dependent maximum. Increase the number of processes to be at least 6 + the maximum number of concurrent users expected to log in to the instance.
The SESSIONS initialization parameter specifies the maximum number of concurrent connections that the database will allow.
This event test checks for the utilization of the session resource against the values (percentage) specified by the threshold arguments. If the percentage of the number of sessions, including background processes, to the limit set in the SESSIONS initialization parameter exceeds the values specified in the threshold arguments, then a warning or alert is generated.
If there are 20 sessions and the value of SESSIONS is 25, the percentage is 80% (20/25 x 100). This value is compared against the specified thresholds.
Current value and the limit specified by SESSIONS.
60 seconds
Increase the SESSIONS instance parameter. For XA (transaction application) environments, confirm that SESSIONS is at least 2.73 * PROCESSES. For MTS (multi-threaded server) environments, confirm that SESSIONS is at least 1.1 * maximum number of connections.
The LICENSE_MAX_SESSIONS initialization parameter specifies the maximum number of concurrent user sessions allowed simultaneously.
This event test checks whether the number of users logged on is reaching the license limit. If the percentage of the number of concurrent user sessions to the limit set in the LICENSE_MAX_SESSIONS initialization parameter exceeds the values specified in the threshold arguments, then a warning or alert is generated. If LICENSE_MAX_SESSIONS is not explicitly set to a value, the event does not trigger.
If there are 15 concurrent user sessions and the value of LICENSE_MAX_SESSIONS is 20, the percentage is 75% (15/20 x 100). This value is compared against the specified thresholds.
Current value and the limit specified by LICENSE_MAX_SESSIONS.
60 seconds
This typically indicates that the license limit for the database has been reached. The user will need to acquire additional licenses, then increase LICENSE_MAX_SESSIONS to reflect the new value.
This category of event tests monitors the system for performance problems.
The data block buffer cache efficiency, as measured by the hit ratio, records the percentage of times the data block requested by the query is in memory.
Effective use of the buffer cache can greatly reduce the I/O load on the database. If the buffer cache is too small, frequently accessed data will be flushed from the buffer cache too quickly which forces the information to be re-fetched from disk. Since disk access is much slower than memory access, application performance will suffer. In addition, the extra burden imposed on the I/O subsystem could introduce a bottleneck at one or more devices which would further degrade performance.
This event test monitors the buffer cache hit ratio (percentage of success) against the values specified by the threshold arguments. If the hit ratio is smaller than the threshold values for the specified number of occurrences, then a warning or alert is generated.
Current ratio.
60 seconds
The DB_BLOCK_BUFFERS initialization parameter determines the number of database buffers available in the buffer cache. It is one of the primary parameters which contribute to the total memory requirements of the SGA on the instance. The DB_BLOCK_BUFFERS parameter, together with the DB_BLOCK_SIZE parameter, controls the total size of the buffer cache. Since DB_BLOCK_SIZE can only be specified when the database is first created, the size of the buffer cache is normally controlled using the DB_BLOCK_BUFFERS parameter.
Consider increasing the DB_BLOCK_BUFFERS initialization parameter in order to increase the size of the buffer cache. This increase allows the Oracle Server to keep more information in memory, thus reducing the number of I/O operations required to do an equivalent amount of work using the current cache size.
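The hit ratio being monitored can be approximated directly from V$SYSSTAT with a commonly used calculation (not necessarily the exact formula the event test uses):

    -- Buffer cache hit ratio: 1 - physical reads / (db block gets + consistent gets).
    SELECT ROUND((1 - phys.value / (db.value + cons.value)) * 100, 2) AS buffer_hit_pct
      FROM v$sysstat phys, v$sysstat db, v$sysstat cons
     WHERE phys.name = 'physical reads'
       AND db.name   = 'db block gets'
       AND cons.name = 'consistent gets';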
In two circumstances the data for a row in a table may be too large to fit into a single data block. This results in row fragmentation.
In the first case, the row is too large to fit into one data block when it is first inserted. In this case, the Oracle Server stores the data for the row in a chain of data blocks reserved for that segment. Row chaining (or continuation) most often occurs with large rows, such as rows that contain a column of datatype LONG or LONG RAW. Row continuation in these cases is unavoidable without using a larger DB_BLOCK_SIZE.
In the second case, however, a row that originally fit into one data block is updated so that the overall row length increases and the block's free space is already completely filled. In this case, Oracle migrates the data for the entire row to a new data block, assuming the entire row can fit into a new block. Oracle preserves the original row piece of a migrated row to point to the new block containing the migrated row.
When a row is continued or migrated, I/O performance associated with this row decreases because Oracle must scan more than one data block to retrieve the information for the row.
This event test monitors whether continued rows are found in the segments specified by the Segment name, Segment owner, and Segment type parameters. If continued rows are found, an alert is generated.
Names of segments containing continued or migrated rows.
60 seconds. However, a frequency of 1 day is recommended.
If a segment containing fragmented rows has been detected, there are two ways to solve the problem. If rows are not likely to continue growing, rebuild the table. Row fragmentation is eliminated as rows are tightly packed into each database block during re-creation.
If rows are likely to continue growing through updates, consider increasing the segment's PCTFREE value to reduce the likelihood of future row fragmentation.
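Chained and migrated rows in a suspect table can be listed with the ANALYZE statement, assuming the CHAINED_ROWS table has already been created by the utlchain.sql script shipped with the server; the table name SCOTT.EMP is a placeholder:

    -- Record the ROWIDs of chained or migrated rows.
    ANALYZE TABLE scott.emp LIST CHAINED ROWS INTO chained_rows;

    -- Review the results.
    SELECT owner_name, table_name, head_rowid
      FROM chained_rows
     WHERE table_name = 'EMP';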
The shared pool is an area in the SGA that contains the library cache of shared SQL requests, the dictionary cache and the other cache structures that are specific to a particular instance configuration.
The dictionary cache efficiency, as measured by the miss ratio, records the percentage of times the dictionary data was not already in memory.
The shared pool mechanism can greatly reduce system resource consumption in at least three ways: parse time is avoided if the SQL statement is already in the shared pool; application memory overhead is reduced, since all applications use the same pool of shared SQL statements and dictionary resources; and I/O is reduced, since dictionary elements that are already in the shared pool do not require disk access.
If the shared pool is too small, users will consume additional resources to complete a database operation. For dictionary cache access, the overhead is primarily the additional I/O since the dictionary cache references that have been displaced from the cache will need to be re-fetched from disk.
This event test monitors the data dictionary cache miss ratio (percentage of failures) against the values specified by the threshold arguments. If the miss ratio exceeds the threshold values for the specified number of occurrences, then a warning or alert is generated.
Current ratio.
60 seconds
The SHARED_POOL_SIZE initialization parameter controls the total size of the shared pool. Consider increasing SHARED_POOL_SIZE in order to decrease the frequency with which dictionary data is flushed from the shared pool to make room for new data.
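The miss ratio being monitored can be approximated from V$ROWCACHE with a commonly used calculation (not necessarily the exact formula the event test uses):

    -- Data dictionary cache miss ratio, as a percentage.
    SELECT ROUND(SUM(getmisses) / SUM(gets) * 100, 2) AS dict_cache_miss_pct
      FROM v$rowcache;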
This event test monitors the real-time database physical I/O rate (requests/second) against the values specified by the threshold arguments. If the disk I/O rate exceeds the threshold values entered for the specified number of occurrences, then a warning or alert is generated.
Current rate in requests/second.
60 seconds
Determine whether the I/O rate is having a negative impact to performance by investigating the disk queue lengths for high I/O devices. It may be necessary to move data files around to balance any identified bottlenecks. Other tuning efforts such as adjusting indexes to reduce the number of full table scans can also reduce I/O load.
If no bottlenecks are evident, increase the I/O rate threshold values.
Database writer process (DBWR) bottlenecks can be detected by monitoring occurrences of the free buffer waits test over time. If the database environment is in a steady state, there should not be any free buffer waits. However, an occasional absolute increase in free buffer waits is not a problem. Only consistent occurrences of an increase should be of concern.
As a result, this test maintains a history of free buffer waits samples, specified by the Number of Samples parameter, and monitors for a percentage of these samples where an increase was detected. This percentage is then compared against the values specified by the threshold arguments. If the percentage of samples (where an increase in free buffer waits is detected) is larger than the threshold arguments, then an alert or warning is generated.
Example: If 10 has been specified for the number of samples, then during the first 9 times the test condition is checked, the test is merely building up the history of free buffer waits samples. On the 10th interval and from that point on, the test monitors how many of those samples showed an increase in free buffer waits. Assume 2 samples showed an increase; then the percentage of samples showing an increase is 20%.
60 seconds
When users are having to wait for free buffers, then either DB_FILE_SIMULTANEOUS_WRITES needs to be increased or the number of DBWR processes needs to be increased.
The DB_FILE_SIMULTANEOUS_WRITES initialization parameter determines the number of simultaneous writes to each database file when written by DBWR. This parameter is also used to determine the number of reads per file in the redo read-ahead when reading redo during recovery. This parameter impacts the number of simultaneous I/Os, not just the number of simultaneous writes.
Consider increasing the DB_FILE_SIMULTANEOUS_WRITES initialization parameter in order to increase the speed at which DBWR writes dirty buffers, which in turn decreases the number of times sessions need to wait for free buffers.
The DB_WRITERS initialization parameter controls the number of DBWR processes that are activated at instance startup. It is a platform-specific parameter which is used to avoid DBWR bottlenecks on operating systems which do not support asynchronous I/O. The DBWR process is responsible for writing dirty buffers in batches from the buffer cache back to the datafiles.
DBWR bottlenecks are most likely on systems which have a high insert, update or delete rate and a large number of disk devices. Since database writes are not serial, there can be benefit to having multiple DBWR processes, even in a single CPU database environment.
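Before changing either parameter, the extent of the problem can be gauged from the free buffer waits wait event; a minimal sketch, assuming access to V$SYSTEM_EVENT:

    -- Cumulative waits for free buffers since instance startup.
    SELECT event, total_waits, time_waited
      FROM v$system_event
     WHERE event = 'free buffer waits';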
When an indexed value is updated in the table, the old value is deleted from the index and the new value is inserted into a separate part of the index. The space released by the old value can never be used again. As indexed values are updated or deleted, the amount of unusable space within the index increases, a condition called index stagnation. Because a stagnated index contains a mixture of data and empty areas, scans of the index will be less efficient.
This event test monitors whether indexes specified by the Index name, Index owner, Indexed object name, and Indexed object owner name parameters suffer from index stagnation. If an index has stagnation, an alert is generated.
Name of index where index stagnation is detected.
60 seconds. However, a frequency of 1 day is recommended.
Consider rebuilding the index to enhance performance. An index rebuild can be accomplished by using either the ALTER INDEX REBUILD statement or the CREATE INDEX statement.
The sort efficiency is measured by the percentage of times sorts were performed in memory as opposed to going to disk.
For best performance, most sorts should occur in memory as sorts to disks are expensive to perform. If the sort area is too small, extra sort runs will be required during the sort operation. This increases CPU and I/O resource consumption.
This event test monitors the in-memory sort ratio. The ratio equals the number of sorts performed in memory divided by the total number of sorts performed. If the ratio is smaller than the threshold values for the specified number of occurrences, then a warning or alert is generated.
60 seconds
Consider increasing the SORT_AREA_SIZE initialization parameter in order to increase the size of the sort area which will allow the Oracle Server to keep sorts in memory, reducing the number of I/O operations required to do an equivalent amount of work using the current sort area size.
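The ratio being monitored can be approximated from V$SYSSTAT with a commonly used calculation (not necessarily the exact formula the event test uses):

    -- In-memory sort ratio: sorts (memory) / (sorts (memory) + sorts (disk)).
    SELECT ROUND(mem.value / (mem.value + dsk.value) * 100, 2) AS memory_sort_pct
      FROM v$sysstat mem, v$sysstat dsk
     WHERE mem.name = 'sorts (memory)'
       AND dsk.name = 'sorts (disk)';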
The shared pool is an area in the SGA that contains the library cache of shared SQL requests, the dictionary cache and the other cache structures that are specific to a particular instance configuration.
The library cache efficiency, as measured by the miss ratio, records the percentage of times the fully parsed or compiled representation of PL/SQL blocks and SQL statements are not already in memory.
The shared pool mechanism can greatly reduce system resource consumption in at least three ways: parse time is avoided if the SQL statement is already in the shared pool; application memory overhead is reduced, since all applications use the same pool of shared SQL statements and dictionary resources; and I/O is reduced, since dictionary elements that are already in the shared pool do not require disk access.
If the shared pool is too small, users will consume additional resources to complete a database operation. For library cache access, the overhead is primarily the additional CPU resources required to re-parse the SQL statement.
This event test monitors the library cache miss ratio (percentage of failures) against the values specified by the threshold arguments. If the miss ratio exceeds the threshold values for the specified number of occurrences, then a warning or alert is generated.
Current ratio.
60 seconds
The SHARED_POOL_SIZE initialization parameter controls the total size of the shared pool. Consider increasing SHARED_POOL_SIZE in order to decrease the frequency with which SQL requests are flushed from the shared pool to make room for new requests.
To take advantage of the additional memory available for shared SQL areas, you may also need to increase the number of cursors permitted per session. You can increase this limit by increasing the value of the initialization parameter OPEN_CURSORS.
Also examine SQL statements which can be modified to optimize shared SQL pool memory use and avoid unnecessary statement reparsing. This type of problem is commonly caused when similar SQL statements are written which differ in space, case, or some combination of the two. You may also consider using bind variables rather than explicitly specified constants in your statements whenever possible.
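The miss ratio being monitored can be approximated from V$LIBRARYCACHE with a commonly used calculation (not necessarily the exact formula the event test uses):

    -- Library cache miss ratio: reloads / pins, as a percentage.
    SELECT ROUND(SUM(reloads) / SUM(pins) * 100, 2) AS lib_cache_miss_pct
      FROM v$librarycache;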
This event test monitors the network I/O rate (bytes/second) against the values specified by the threshold arguments. If the network I/O rate exceeds the threshold values entered for the specified number of occurrences, then a warning or alert is generated.
Current rate in bytes/second.
60 seconds
Investigate whether the current I/O rate condition is having a negative performance impact on the applications. If it is determined that the excessive network traffic is a performance problem, several approaches can be used to eliminate the bottleneck. Either the network bandwidth can be increased by introducing faster hardware, or applications can be redesigned to take advantage of features such as array fetch, which minimize network traffic.
Redo log entries contain a record of changes that have been made to the database block buffers. The log writer (LGWR) process writes redo log entries from the log buffer to a redo log file. The log buffer should be sized so that space is available in the log buffer for new entries, even when access to the redo log is heavy. When the log buffer is undersized, user processes will be delayed as they wait for LGWR to free space in the redo log buffer.
The redo log buffer efficiency, as measured by the hit ratio, records the percentage of times users did not have to wait for the log writer to free space in the redo log buffer.
This event test monitors the redo log buffer hit ratio (percentage of success) against the values specified by the threshold arguments. If the hit ratio is smaller than the threshold values for the specified number of occurrences, then a warning or alert is generated.
60 seconds
The LOG_BUFFER initialization parameter determines the amount of memory that is used when buffering redo entries to the redo log file.
Consider increasing the LOG_BUFFER initialization parameter in order to increase the size of the redo log buffer. Redo log entries contain a record of the changes that have been made to the database block buffers. The log writer process (LGWR) writes redo log entries from the log buffer to a redo log. The redo log buffer should be sized so space is available in the log buffer for new entries, even when access to the redo log is heavy.
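Pressure on the redo log buffer can be gauged from V$SYSSTAT before and after increasing LOG_BUFFER; a minimal sketch:

    -- Redo log space requests count the times a process had to wait for space in the log buffer.
    SELECT name, value
      FROM v$sysstat
     WHERE name IN ('redo entries', 'redo log space requests');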
Rollback segments are portions of the database that record the actions of transactions in case a transaction is rolled back. Rollback segments are used to provide read consistency, support rollback transactions and recover a database.
Proper allocation of rollback segments makes for optimal database performance. Using a sufficient number of rollback segments distributes rollback segment contention across many segments and improves performance.
Contention for rollback segments is reflected by contention for buffers that contain rollback segment blocks.
This event test monitors rollback segment hit ratio (percentage) against the values specified by the threshold arguments. If the rollback segment hit ratio is greater than the values specified, then a warning or alert is generated.
60 seconds
To reduce contention for buffers containing rollback segment blocks, create additional rollback segments. The general guideline for choosing how many rollback segments to allocate is based on the number and type of concurrently active transactions on your database. Allocate one rollback segment for each 10 concurrent OLTP (online transaction processing) transactions and one rollback segment for each concurrent batch job.
In addition, when creating a rollback segment, keep extents within a rollback segment the same size by specifying STORAGE parameters where NEXT=INITIAL.
It is also recommended that you set the MINEXTENTS parameter value to 20. Because rollback segments are logically regarded as a circular queue of extents, they are required to have a MINEXTENTS value of at least 2. The probability that a rollback segment will require a new extent allocation depends on how likely the next extents are to contain active undo. The more extents the rollback segment has, the less likely it is that the rollback segment will require an extent allocation that could be avoided. Administrators should therefore create rollback segments with many extents. Naturally, there is a point of diminishing returns: the probability of a rollback segment extending declines rapidly as the number of extents increases, and beyond 20 extents the incremental decrease in that probability fails to justify the cost of the additional extent.
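A sketch of creating an additional rollback segment along these guidelines; the segment name, tablespace, and extent sizes are placeholders:

    -- Equal-sized extents (NEXT = INITIAL) and a MINEXTENTS value of 20.
    CREATE ROLLBACK SEGMENT rbs05
      TABLESPACE rbs
      STORAGE (INITIAL 1M NEXT 1M MINEXTENTS 20 MAXEXTENTS 121);

    -- Make the new rollback segment available to the instance.
    ALTER ROLLBACK SEGMENT rbs05 ONLINE;

To have the segment acquired automatically at startup, also add its name to the ROLLBACK_SEGMENTS initialization parameter.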
You can monitor any system statistic available in the database with this test. An alert or warning will be generated if the value of the selected V$SYSSTAT parameter exceeds the values specified by the threshold arguments.
To view the V$SYSSTAT parameter names and values, connect to the database with SQL Worksheet and execute SELECT NAME, VALUE FROM V$SYSSTAT.
Parameter name and current value.
60 seconds
The user action for the event is dependent on the statistic that is being monitored.
You can monitor any system statistic available in the database with this test. The threshold values are compared to the difference between the last sample point and the current sample point of the V$SYSSTAT parameter. An alert or warning is generated if the calculated difference exceeds the values specified by the threshold arguments.
To view the V$SYSSTAT parameter names and values, connect to the database with SQL Worksheet and execute SELECT NAME, VALUE FROM V$SYSSTAT.
Parameter name and change in parameter's value.
60 seconds
The user action for the event is dependent upon the statistic that is being monitored.
This category of event tests allows you to monitor specific database user connections.
This event test monitors specified database user connections. For example, an alert is displayed when a particular database user connection, specified by the User name filter argument, has been detected.
User Name filter: Filter the user names to be monitored, or * for all users. Default is ='SYS'.
60 seconds. However, a frequency of 5 minutes is recommended.
User actions may vary depending on the user connection detected.
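The same information can be checked interactively from V$SESSION; a minimal sketch for the default 'SYS' filter:

    -- Current connections for the monitored user.
    SELECT sid, serial#, username, status, logon_time
      FROM v$session
     WHERE username = 'SYS';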
This category of event tests monitors for severe conditions on the system. Immediate action needs to be taken by the administrator.
This event test signifies that the Intelligent Agent data gathering service has generated errors to the Data Gatherer alert file since the last sample time. The Data Gatherer alert file is a special trace file containing a chronological log of messages and errors. Note that the Data Gatherer alert log file is different from the database ALERT log file. An alert is displayed when Data Gatherer (ODG-xxxxx) messages are written to the Data Gatherer alert file.
None
Alert log error messages since last sample time.
60 seconds
Examine Data Gatherer alert log for additional information.
This event test checks whether the Intelligent Agent data gathering service being monitored is running. If the Intelligent Agent data gathering service is down, this event is triggered.
None
None
60 seconds
Restart the Intelligent Agent data gathering service.
This event test checks whether the agent on a node can be accessed from the Console. If the Intelligent Agent is down, this test is triggered.
None
None
60 seconds
Restart the Intelligent Agent.
This category of event tests tracks possible space problems.
This event test checks for available space on the disk specified by the Disk name parameter, such as c: (Windows) or /tmp (UNIX). If the space available is less than the values specified by the thresholds, then a warning or alert is generated.
Disk name and space available in kilobytes on the disk.
This event test monitors the same file systems as the Disk Full event test. The Disk Full (%) event test, however, returns the percentage of space remaining on the disk destinations.
Disk name and percentage of space available on the disk.
This event test checks for available swap space. If the space available falls below the values specified in the threshold arguments, then a warning or alert is generated.
Percentage of available space.
This category of event tests monitors the system for performance problems.
This event test checks the CPU utilization (percentage used) against the threshold values specified by the threshold arguments. If the utilization exceeds the threshold values for the specified number of occurrences, then a warning or alert is generated.
Current utilization.
This event test checks the paging rate (kilobytes/second paged in/out) against the threshold values specified by the threshold arguments. If the paging rate exceeds the threshold values for the specified number of occurrences, then a warning or alert is generated.
Current rate.
This category of event tests monitors for catastrophic conditions on the system. Immediate action needs to be taken by the administrator.
This event test checks whether the listener on the node being monitored is available.
None
The Startup Listener job task can be set up as a fixit job for automatically correcting the problem. To avoid the fixit job executing when the listener is brought down intentionally, turn off the fixit job option.
The Event Management System provides paging services which notify an administrator with a pager when an event has occurred. Alphanumeric pagers provide a brief text message identifying the event. Numeric pagers provide the numeric pager event IDs to identify the event.
The User-Defined SQL test allows you to define your own SQL script that evaluates an event test. You can add SQL-based events to the Event Library for future use and/or editing as you would with any other event. Within the SQL script, you provide a datapoint that is used as a return value for comparison with the Alert and Warning Threshold values.
You specify the operator, alert threshold, warning threshold, number of occurrences, and SQL script. The specified operator is used to compare the return value from the script with the supplied thresholds to determine the event condition. The event is triggered after the specified number of occurrences is met.
Value returned by the SQL script
60 seconds
The action depends on the SQL script and hence Oracle cannot make any recommendations.
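As an illustration only, a user-defined script might return the number of invalid schema objects as its datapoint; the owner filter shown is a placeholder:

    -- Datapoint: count of invalid objects outside the data dictionary schemas.
    SELECT COUNT(*)
      FROM dba_objects
     WHERE status = 'INVALID'
       AND owner NOT IN ('SYS', 'SYSTEM');

With the greater-than operator and an alert threshold of 0, the event triggers as soon as any monitored object becomes invalid.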