6 Monitoring Exadata

6.1 Introducing Exadata Monitoring Tools and Information Sources

You can monitor Exadata system performance by using various tools and information sources. The following topics introduce and compare the main Exadata performance monitoring tools and information sources:

6.1.1 Automatic Workload Repository (AWR)

AWR is the integrated performance diagnostic tool built into Oracle Database. AWR includes Exadata storage server configuration, health, and statistics information.

AWR collects information from various database dynamic performance views at regular intervals. Each data collection is an AWR snapshot. An AWR report shows statistics captured between two AWR snapshots.

AWR collects and reports a wealth of statistical data related to Oracle Database, including database statistics and wait events from views such as V$SYSSTAT, V$SYSTEM_EVENT, V$SEGMENT_STATISTICS, and V$SQL.

Commencing with Oracle Database release 12.1.0.2, AWR also collects performance statistics from Exadata storage servers and includes them in the HTML version of the AWR report. Consequently, AWR provides comprehensive and unified performance information that includes Oracle Database statistics and Exadata statistics, storage metrics, and alerts from all storage servers.
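
As an illustration, the following is a minimal sketch of one way to generate the HTML report for a pair of existing AWR snapshots by using the DBMS_WORKLOAD_REPOSITORY package, and of shortening the snapshot collection interval (in minutes). The DBID, instance number, and snapshot IDs shown are placeholders to substitute with values from your environment; you can also run the interactive awrrpt.sql script (under $ORACLE_HOME/rdbms/admin) and choose the html report type.

SQL> SELECT output
     FROM TABLE(DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_HTML(
                  l_dbid     => 1234567890,   -- DBID from V$DATABASE (placeholder)
                  l_inst_num => 1,            -- instance number (placeholder)
                  l_bid      => 150,          -- begin snapshot ID (placeholder)
                  l_eid      => 151));        -- end snapshot ID (placeholder)

SQL> EXEC DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(interval => 30)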

Note:

The Exadata sections are only available in the HTML version of the AWR report.

The AWR report contains several Exadata-specific sections, which are described later in this topic.

Key benefits of the Exadata information in AWR include:

  • AWR provides comprehensive and unified performance information that includes Oracle Database metrics and Exadata storage metrics and alerts from all storage servers.
  • AWR enables simple outlier detection and analysis.
  • The granularity of AWR information is customizable and based on the AWR collection interval, which is hourly by default.

It is important to note that the Exadata storage server statistics are collected and maintained by the storage servers, so the information is not restricted to a specific database or database instance. Consequently, the storage-related statistics in AWR include the I/O for all databases running on the storage servers, while the database statistics relate to the specific instance or database hosting AWR.

When using a container database (CDB), the Exadata-specific sections of the AWR report are only available in the root container.

As part of Oracle Database, AWR is constantly being enhanced with new statistics and analysis. The availability of a specific section or statistic is subject to the version of Oracle Database being used.

Exadata Server Configuration

The Exadata Server Configuration section of the AWR report shows configuration information from all of the associated storage servers, including model information, software versions, and storage information relating to cell disks, grid disks, ASM disk groups, and the IORM objective. By examining this information, you can easily see any configuration differences across the storage servers that could potentially affect the behavior or performance of the system.

The following example shows part of the Exadata Server Configuration section. The example is based on a system with 14 X8-2 High Capacity storage servers. In the example, all of the servers have the same software version. However, the storage information shows that one cell contains 2 flash cell disks, while the others all have 4 flash cell disks.

Figure 6-1 AWR Report: Exadata Server Configuration

Exadata Server Health Report

The Exadata Server Health Report section of the AWR report shows information about offline disks and open storage server alerts.

When a disk is offline, then the total I/O bandwidth is reduced, which may affect I/O performance.

Alerts represent events of importance on a storage server that could affect its functionality and performance. Open alerts are alerts that have not been marked as examined by an administrator. For more details, see Exadata Alerts.

Ideally, there should be no open alerts in the AWR report, as shown in the following example:

Figure 6-2 AWR Report: No Open Alerts

The following example shows Exadata alert details indicating that the offload server was unable to start, which would affect smart scan performance on the system.

Figure 6-3 AWR Report: Exadata Alert Details

Exadata Statistics

AWR includes statistics from various Exadata storage server components, such as Smart I/O, Smart Flash Cache, Smart Flash Log, XRMEM cache, XRMEM log, and IO Resource Manager. It includes I/O statistics from both the operating system and the cell server software, and performs simple outlier analysis on these I/O statistics. Because Exadata typically distributes I/O operations evenly across all cells and all disks, outliers may indicate a potential problem and warrant further investigation. If any cell or disk is behaving differently or performing more work, this could potentially cause slower performance for the whole system.

The Exadata Statistics section of the AWR report includes the following sub-sections:

  • Performance Summary
  • Exadata Resource Statistics
    • Exadata Outlier Summary
    • Exadata OS Statistics Outliers
    • Exadata Cell Server Statistics Outliers
    • Exadata Outlier Details
    • Exadata OS Statistics Top
    • Exadata Cell Server Statistics Top
  • Exadata Smart Statistics
    • Smart IO
    • Flash Log
    • XRMEM Log
    • Flash Cache
    • XRMEM Cache
  • Exadata IO Reasons
    • Top IO Reasons by Requests
    • Top IO Reasons by MB
    • Internal IO Reasons
  • Exadata Top Database Consumers
    • Top Databases by Requests
    • Top Databases by Requests - Details
    • Top Databases by Throughput
    • Top Databases by Requests per Cell
    • Top Databases by Requests per Cell - Details
    • Top Databases by Throughput per Cell
  • Exadata IO Latency Capping
    • Cancelled IOs - Client
    • Cancelled IOs - Cells
  • Exadata Flash Wear

Key Exadata statistics are discussed in detail in the corresponding topics under Monitoring Oracle Exadata System Software Components.

6.1.2 Database Statistics and Wait Events

Oracle Database provides performance information in a series of dynamic performance views (also known as V$ views).

The dynamic performance views contain a wealth of database statistics and wait events. By using dynamic performance views, a database administrator can perform detailed and customized monitoring on individual SQL operations and database sessions. The dynamic performance views are also a major information source for Automatic Workload Repository (AWR) and other Oracle Database monitoring tools, such as SQL monitor.

The following sections introduce the main dynamic performance views that include Exadata-specific information.

V$SYSSTAT

V$SYSSTAT contains various system-wide statistics relating to the current database instance, including I/O statistics related to Exadata storage server.

For example, you can use the following query to display I/O statistics related to Exadata storage server:

SQL> SELECT name, value
     FROM v$sysstat
     WHERE name like '%cell%' or name like 'physical%'
     ORDER BY name;

These statistics are also in the Global Activity Statistics or Instance Activity Statistics section of the AWR report.

V$SESSION

V$SESSION contains various statistics and wait events for currently active database sessions.

For example, you can use the following query to display a summary that shows the number of sessions that are currently waiting on cell events:

SQL> SELECT event, count(*)
     FROM v$session
     WHERE event like 'cell%'
       AND wait_time = 0  -- session currently waiting
     GROUP BY event
     ORDER BY count(*) desc;

The following query displays event details for sessions that are currently waiting on cell events:

SQL> SELECT event, p1, p2, p3
     FROM v$session
     WHERE event like 'cell%'
       AND wait_time = 0;

For cell wait events in V$SESSION:

  • The P1 column identifies the cell hash number. You can use the V$CELL view to identify the cell from the cell hash number.
  • The P2 column identifies the disk hash number. You can use the V$ASM_DISK view to identify the disk name from the disk hash number. The query sketch after this list shows one way to resolve both the cell and disk identifiers.
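
The following query is a minimal sketch that joins these views to translate the raw P1 and P2 values into a storage server path and an ASM disk name for sessions currently waiting on cell events. The join columns (V$CELL.CELL_HASHVAL and V$ASM_DISK.HASH_VALUE) are the ones described above; CELL_PATH and NAME are used for readability.

SQL> SELECT s.event, c.cell_path, d.name disk_name, s.p3
     FROM v$session s, v$cell c, v$asm_disk d
     WHERE s.event like 'cell%'
       AND s.wait_time = 0  -- session currently waiting
       AND s.p1 = c.cell_hashval
       AND s.p2 = d.hash_value;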

V$SESSION_EVENT

V$SESSION_EVENT displays information on waits for an event by a session.

For example, you can use the following query to display a summary of the wait events for the current session. By examining the output from this query, you can compare the time spent waiting for Exadata storage server events with the time spent waiting for other events in the database.

SQL> SELECT event, total_waits, time_waited_micro, time_waited_micro/decode(total_waits, 0, null, total_waits) avg_wait_time
     FROM v$session_event
     WHERE wait_class != 'Idle'
       AND sid = userenv('SID')
     ORDER BY time_waited_micro desc;

V$SYSTEM_EVENT

V$SYSTEM_EVENT displays system-wide information on total waits for an event.

You can use the following query to display a summary of the wait events since start-up of the current database instance:

SQL> SELECT event, total_waits, time_waited_micro, time_waited_micro/decode(total_waits, 0, null, total_waits) avg_wait_time
     FROM v$system_event
     WHERE wait_class != 'Idle'
     ORDER BY time_waited_micro desc;

Information from V$SYSTEM_EVENT is contained in the Wait Events Statistics section of the AWR report.

V$ACTIVE_SESSION_HISTORY

V$ACTIVE_SESSION_HISTORY keeps a history of the active sessions on the system, where active is defined as using CPU or waiting on a non-idle wait event. For each session, V$ACTIVE_SESSION_HISTORY includes a wealth of information on what the session was doing, such as SQL execution plans and wait event details, including information about Exadata storage interactions.

For example, you can use the following query to display cell block reads over the past 5 minutes with latency in excess of 10ms:

SQL> SELECT sample_time, p1, p2, time_waited
     FROM v$active_session_history
     WHERE event = 'cell single block physical read'
       AND wait_time = 0  -- currently waiting
       AND sample_time > sysdate - 5/1440  -- past 5 minutes
       AND time_waited > 10000
     ORDER BY time_waited desc;

The Active Session History (ASH) Report section of the AWR report is based on V$ACTIVE_SESSION_HISTORY. Oracle Database also provides a more detailed Active Session History (ASH) report.
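
To generate the stand-alone ASH report, you can run the ashrpt.sql script that ships with Oracle Database, which prompts for the report type and time range. A typical invocation, assuming a default installation layout, is:

SQL> @?/rdbms/admin/ashrpt.sql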

V$SQL

V$SQL, and related views like V$SQLAREA, V$SQLAREA_PLAN_HASH, V$SQLSTATS, and V$SQLSTATS_PLAN_HASH, contain information and statistics for SQL statements processed on the database instance.

V$SQL, and related views, include the following columns that contain information about Exadata storage interactions:

  • PHYSICAL_READ_BYTES — number of bytes read by the SQL
  • PHYSICAL_READ_REQUESTS — number of read requests issued by the SQL
  • PHYSICAL_WRITE_BYTES — number of bytes written by the SQL
  • PHYSICAL_WRITE_REQUESTS — number of write requests issued by the SQL
  • IO_CELL_OFFLOAD_ELIGIBLE_BYTES — number of bytes eligible for predicate offload to Exadata storage server
  • IO_CELL_OFFLOAD_RETURNED_BYTES — number of bytes returned by smart scans
  • IO_INTERCONNECT_BYTES — number of bytes exchanged between the database and the storage servers
  • IO_CELL_UNCOMPRESSED_BYTES — number of uncompressed bytes read, where uncompressed bytes is the size after decompression
  • OPTIMIZED_PHY_READ_REQUESTS — number of read requests satisfied from Exadata Smart Flash Cache, or read requests avoided due to storage index or columnar cache

The following example shows a query that displays Exadata offload processing performance data. The output focuses on a query that scans the SALES table. The output shows that all of the data in the SALES table (approximately 5 GB) was eligible for offload processing. In this case, because of offload processing, even though the query performed 5385.75 MB of I/O, only 417.65 MB of data was returned by the smart scan to the database host.

SQL> SELECT sql_text,
       io_cell_offload_eligible_bytes/1024/1024 offload_eligible_mb,
       io_cell_uncompressed_bytes/1024/1024 io_uncompressed_mb,
       io_interconnect_bytes/1024/1024 io_interconnect_mb,
       io_cell_offload_returned_bytes/1024/1024 cell_return_bytes_mb,
       (physical_read_bytes + physical_write_bytes)/1024/1024 io_disk_mb 
     FROM v$sql 
     WHERE UPPER(sql_text) LIKE '%FROM SALES%';

SQL_TEXT                      OFFLOAD_ELIGIBLE_MB  IO_UNCOMPRESSED_MB  IO_INTERCONNECT_MB  CELL_RETURN_BYTES_MB  IO_DISK_MB
----------------------------  -------------------  ------------------  ------------------  --------------------  ----------
select count(*) from sales                5283.06             5283.06              520.34                417.65     5385.75

Information from V$SQL is displayed in the SQL Statistics section of the AWR report.

V$SEGMENT_STATISTICS

V$SEGMENT_STATISTICS contains statistics on a per-segment basis. The segment-level statistics can be used to detect specific objects, such as tables or indexes, that are performing optimized reads from Exadata storage.

The optimized physical read segment statistic records the number of read requests for objects that are read from Exadata Smart Flash Cache or reads that are avoided through the use of storage index or columnar cache. The optimized physical writes statistic records the number of write requests for an object that first went to Exadata Smart Flash Cache. Such write requests can be synchronized to the disk later in a lazy manner to free up cache space.

The following example query displays objects that are associated with more than 1000 optimized reads from Exadata storage. A similar query can be used to determine the objects that are associated with very few optimized reads.

SQL> SELECT object_name, value
     FROM v$segment_statistics
     WHERE statistic_name='optimized physical reads' 
       AND value>1000 
     ORDER BY value;

Information from V$SEGMENT_STATISTICS is displayed in the Segment Statistics section of the AWR report.

6.1.3 Exadata Metrics

Exadata metrics are recorded observations of important properties or values relating to the Exadata system software.

Exadata metrics contain detailed statistics for most Exadata components. Most metrics relate to the storage server and its components, such as flash cache, cell disks, and grid disks. Storage server metrics enable detailed monitoring of Exadata storage server performance.

Metrics are of the following types:

  • Cumulative metrics are statistics that accumulate over time since the metric was created or the server was restarted.

    For example, CL_IO_RQ_NODATA is the total number of I/O requests that did not return data.

  • Instantaneous metrics contain the current value at the time of the metric observation.

    For example, CL_MEMUT is the percentage of total physical memory currently used on the server.

  • Rate metrics are computed statistics where the value is observed over time.

    For example, N_NIC_KB_TRANS_SEC is the number of kilobytes per second transmitted over the server's Ethernet interfaces.

Some metrics differentiate between small I/O and large I/O. For such metrics, small I/O means I/O that is less than or equal to 128 KB in size. Large I/O is greater than 128 KB in size.

By default, metric collections occur at 1-minute intervals. Metric observations in the default collection are initially recorded in memory and later written to a disk-based repository. For most metrics, seven days of metric observations are maintained by default. However, a subset of key metrics are retained for up to one year. In all cases, historical metric observations are purged automatically if the server detects a storage space shortage.

Commencing with Oracle Exadata System Software 22.1.0, you can optionally configure fine-grained metrics. To enable fine-grained metrics, you must specify a collection interval between 1 and 60 seconds. You can also choose the metrics to include in the fine-grained collection. Fine-grained metric collection is the foundation for real-time metric streaming. Consequently, fine-grained metrics are only recorded in memory to support efficient streaming to an external metric collection and visualization platform.

You can use the CellCLI LIST command as the primary means to display Exadata metric observations. See Displaying Metrics. However, because Exadata metrics are managed within each Exadata server, metric observations must be collected and correlated by some additional means to gain a system-wide view.
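
As an illustration, the following commands sketch typical ways to display current and historical metric observations on one storage server, and one way to gather the same metric across a group of servers by using dcli. The metric name (CL_MEMUT), the timestamp, the cell_group file, and the celladmin user are placeholders or assumptions for this sketch.

CellCLI> LIST METRICCURRENT CL_MEMUT DETAIL
CellCLI> LIST METRICHISTORY WHERE name = 'CL_MEMUT' AND collectionTime > '2024-05-01T09:00:00-07:00'

$ dcli -g cell_group -l celladmin cellcli -e "list metriccurrent CL_MEMUT"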

6.1.4 Exadata Alerts

Alerts draw attention to potential and actual problems and other events of interest to an administrator.

There are three types of alerts: informational alerts, warning alerts, and critical alerts.

Metric observations are automatically compared with stored thresholds. Metrics can be associated with warning and critical thresholds: when a metric value crosses a warning threshold, a warning alert is generated, and when it crosses a critical threshold, a critical alert is generated.

Exadata administrators can specify the warning and critical threshold for most metrics. There are also some preset thresholds. For example, there are built-in thresholds for ambient temperature. If the temperature is too low or too high, an alert is automatically generated.

Some system conditions and state changes can generate informational alerts; for example, a server chassis is opened. More serious system errors and state changes can also generate warning or critical alerts; for example, a hard disk device is reporting disk errors.

This system of thresholds and alerts enables administrators to focus on the most important events, while maintaining awareness of other interesting events.

You can use CellCLI commands to manage and display Exadata alerts and thresholds. You can also configure alert notification using email and SNMP. Alert information is also included in the AWR report.
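
For illustration, the following CellCLI commands sketch common alert and threshold tasks: listing alerts that have not yet been examined, marking alerts as examined, and defining warning and critical thresholds for a metric. The administrator name (jsmith), the threshold object (db123), and the threshold values are placeholders for this sketch.

CellCLI> LIST ALERTHISTORY WHERE examinedBy = '' DETAIL
CellCLI> ALTER ALERTHISTORY ALL examinedBy="jsmith"
CellCLI> CREATE THRESHOLD db_io_rq_sm_sec.db123 comparison='>', warning=100, critical=200, occurrences=2, observation=5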

Like Exadata metrics, alerts are managed separately on each server.

6.1.5 Real-Time Insight

You can use the Real-Time Insight feature to enable real-time monitoring of your Exadata systems.

Commencing with Oracle Exadata System Software 22.1.0, Real-Time Insight provides infrastructure to:

  • Categorize specific metrics as fine-grained, and enable the collection of fine-grained metrics as often as every second.

  • Stream metric observations to user-defined locations in real-time, using either push (upload) or pull (download) transmission models.

6.1.6 ExaWatcher

ExaWatcher collects very detailed system statistics, primarily focused on the operating system and network interfaces.

ExaWatcher runs separately on each server and collects server and network-related data using operating system tools and utilities, such as iostat, vmstat, mpstat, and top. It also collects storage server statistics using cellsrvstat.

The information is very granular, with most operating system statistics collected at 5-second intervals. To extract statistics from ExaWatcher for analysis, you can use the GetExaWatcherResults.sh script. The output from GetExaWatcherResults.sh also includes charts for a small subset of key statistics.
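
For example, a typical extraction for a specific time window resembles the following sketch, which is run on the server that holds the ExaWatcher archives (commonly under /opt/oracle.ExaWatcher). The option names and timestamp format shown here are assumptions; check the usage output of the script on your system for the exact syntax.

$ ./GetExaWatcherResults.sh --from 05/01/2024_09:00:00 --to 05/01/2024_10:00:00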

Because ExaWatcher is typically used for detailed debugging, it is recommended that you keep baselines of ExaWatcher data for comparison purposes.

6.2 Guidelines for Exadata Monitoring

Regardless of the information source, it is considered good practice to maintain baseline collections of the various performance data sources covering the important periods of your workload, such as peak periods, month-end or year-end processing, or before patching or upgrades.

Baseline information enables a comparison with a known 'normal' state when a performance regression occurs. Such comparison helps to identify changes and guide further analysis.

Baseline data should include AWR data (using AWR exports or AWR baselines), Exadata metrics, and ExaWatcher collections. For data warehousing workloads that rely on smart scans, keeping a collection of SQL monitor reports for key SQL statements is also highly recommended.
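
As a minimal sketch, the following call preserves a pair of existing AWR snapshots as a named baseline so that the data is retained beyond the normal AWR retention period. The snapshot IDs and the baseline name are placeholders.

SQL> BEGIN
       DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(
         start_snap_id => 150,                 -- placeholder begin snapshot ID
         end_snap_id   => 160,                 -- placeholder end snapshot ID
         baseline_name => 'peak_month_end');   -- placeholder baseline name
     END;
     /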

6.3 Monitoring Oracle Exadata System Software Components

6.3.1 Monitoring Exadata Smart Flash Cache

Exadata Smart Flash Cache holds frequently accessed data in flash storage, while most data is kept in very cost-effective disk storage. Caching occurs automatically and requires no user or administrator effort. Exadata Smart Flash Cache can intelligently determine the data that is most useful to cache based on data usage, access patterns, and hints from the database that indicate the type of data being accessed.

Exadata Smart Flash Cache can operate in Write-Through mode or Write-Back mode. In Write-Through mode, database writes go to disk first and subsequently populate Flash Cache. If a flash device fails with Exadata Smart Flash Cache operating in Write-Through mode, there is no data loss because the data is already on disk.

In Write-Back mode, database writes go to Flash Cache first and later to disk. The contents of the Write-Back Flash Cache are persisted across reboots, eliminating any warm-up time needed to populate the cache. Write-intensive applications can benefit from Write-Back caching by taking advantage of the fast latencies provided by flash. The amount of disk I/O is also reduced when the cache absorbs multiple writes to the same block before writing it to disk. However, if a flash device fails while using Write-Back mode, data that has not yet been written to disk is lost and must be recovered from a mirror copy. For this reason, Write-Back mode is recommended in conjunction with high redundancy (triple mirroring) to protect the database files.
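
To confirm which mode is in use, you can display the flashCacheMode attribute on each storage server, as in the following sketch. The cell_group file and celladmin user in the dcli example are assumptions for this sketch.

CellCLI> LIST CELL ATTRIBUTES name, flashCacheMode

$ dcli -g cell_group -l celladmin cellcli -e "list cell attributes name, flashCacheMode"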

Exadata Smart Flash Cache accelerates OLTP performance by providing fast I/O for frequently accessed data blocks. It also accelerates Decision Support System (DSS) performance by automatically caching frequently scanned data and temporary segments, providing columnar cache storage, and enabling other features that optimize the performance of large analytic queries and large loads.

On Extreme Flash (EF) storage servers, all of the data resides in flash. Consequently, Exadata Smart Flash Cache is not required for normal caching. However, in this case Exadata Smart Flash Cache is still used to host the Columnar Cache, which caches data in columnar format and optimizes various analytical queries.

Performance issues related to Exadata Smart Flash Cache typically exhibit increased cell single block physical read latencies in the database. However, occasional long latencies associated with cell single block physical read do not necessarily indicate a performance issue with Exadata Smart Flash Cache, and may simply indicate that the read request was not satisfied using Exadata Smart Flash Cache.

6.3.1.1 Monitoring Exadata Smart Flash Cache Using AWR

Automatic Workload Repository (AWR) contains a wealth of information relating to Exadata Smart Flash Cache. Following are descriptions and examples of key sections in the AWR report that contain information about Exadata Smart Flash Cache. By reviewing these sections of the AWR report, administrators can understand how Exadata Smart Flash Cache operates.

Flash Cache Configuration and Space Usage

The Flash Cache Configuration section contains summary information including the caching mode (Write-Through or Write-Back), status and overall size. The Flash Cache Space Usage section provides summary statistics on space usage in Exadata Smart Flash Cache.

The following example shows Exadata Smart Flash Cache configured in Write-Back mode on all cells, with a size of almost 24 TB on each cell. In the example, the Flash Cache Space Usage section shows that:

  • Practically all of the Exadata Smart Flash Cache space is in use on each cell.

  • Approximately 70% of the allocated space is being used for OLTP, which is typically data that is read into the database buffer cache using block mode reads. The OLTP data is further categorized as follows:

    • Clean data is data that is unchanged since being read into the cache. In the example output, approximately 26% of the space in each cache contains clean OLTP data.

    • If cached OLTP data is updated and the update is written back to the primary storage (usually a hard disk drive), the data is classified as synced. In the example output, approximately 23% of the space in each cache contains synced OLTP data.

    • When cached OLTP data is updated, but the update has not been written to the primary storage, the data is classified as unflushed. In the example output, approximately 22% of the space in each cache contains unflushed OLTP data.

  • Approximately 20% of the cache space is being used to absorb large writes.

  • Approximately 6% of the cache supports scan operations.

  • Approximately 2% of the cache is being used to support the columnar cache.

  • In this case, none of the cache space is occupied by keep objects. Keep objects are database segments (tables, partitions, and so on) that override the default caching policy for Exadata Smart Flash Cache by using the CELL_FLASH_CACHE storage clause option with the KEEP setting.

Figure 6-4 AWR Report: Flash Cache Configuration and Space Usage

The image shows an example of the Flash Cache Configuration and Flash Cache Space Usage sections in the AWR report.

Flash Cache User Reads

The Flash Cache User Reads sections show information about read requests, read throughput, and read efficiency from database clients. The statistics show the I/Os against different areas of Exadata Smart Flash Cache:

  • OLTP - relates to block requests
  • Scan - relates to scan requests
  • Columnar - relates to columnar cache requests
  • Keep - relates to read requests against the KEEP pool

Figure 6-5 AWR Report: Flash Cache User Reads

The image shows an example of the Flash Cache User Reads, Flash Cache User Reads Per Second, and Flash Cache User Reads Efficiency sections in the AWR report.

Flash Cache User Writes

The Flash Cache User Writes section shows information about write requests and write throughput for Exadata Smart Flash Cache in Write-Back mode.

In this section, First Writes indicate new data being written to Exadata Smart Flash Cache, while Overwrites indicate data being overwritten in Exadata Smart Flash Cache. First Writes also require writing additional flash cache metadata. Overwrites represent the disk writes that were avoided by using Exadata Smart Flash Cache in Write-Back mode.

Figure 6-6 AWR Report: Flash Cache User Writes

The image shows an example of the Flash Cache User Writes section in the AWR report.

Flash Cache User Writes - Large Writes

The Flash Cache User Writes - Large Writes sections show information about large write requests that are absorbed and rejected by Exadata Smart Flash Cache in Write-Back mode.

Figure 6-7 AWR Report: Flash Cache User Writes - Large Writes

The image shows an example of the Flash Cache User Writes - Large Writes and Flash Cache User Writes - Large Write Rejections sections in the AWR report.

Flash Cache Internal Reads

The Flash Cache Internal Reads section shows reads from Exadata Smart Flash Cache that are performed by Oracle Exadata System Software. These statistics are populated when Exadata Smart Flash Cache is in Write-Back mode.

The Disk Writer IO Detail columns relate to internal I/Os performed to persist data from Exadata Smart Flash Cache to hard disk devices. These columns show the reads from flash and the various categories of writes to hard disk.

Figure 6-8 AWR Report: Flash Cache Internal Reads

The image shows an example of the Flash Cache Internal Reads section in the AWR report.

Flash Cache Internal Writes

The Flash Cache Internal Writes section shows writes to Exadata Smart Flash Cache that are performed by Oracle Exadata System Software. The internal writes are I/Os that populate Exadata Smart Flash Cache in response to a cache read miss.

The statistics also include metadata writes that occur when Exadata Smart Flash Cache is in Write-Back mode. Metadata writes occur when a cacheline is used to cache new data.

Figure 6-9 AWR Report: Flash Cache Internal Writes

The image shows an example of the Flash Cache Internal Writes section in the AWR report.

6.3.1.2 Monitoring Exadata Smart Flash Cache Using Database Statistics and Wait Events

The following table describes various database statistics that are useful for monitoring Exadata Smart Flash Cache. The statistics are available in various dynamic performance views, including V$SYSSTAT, and may be displayed in the Global Activity Statistics or Instance Activity Statistics section of an AWR report.

  • cell flash cache read hits: The number of read requests that were satisfied by the cache
  • cell flash cache read hits for smart IO: The number of read requests for smart IO that were satisfied by the cache
  • cell flash cache read hits for temp IO: The number of read requests for temp IO that were satisfied by the cache
  • cell flash read hits for controlfile reads: The number of read requests for a database control file that were satisfied by the cache
  • cell overwrites in flash: Total number of write requests that overwrote an existing cacheline in Exadata Smart Flash Cache that had not been written to disk. In effect, this is the amount of disk I/O saved by using Write-Back mode. This statistic is incremented once per mirror write.
  • cell partial writes in flash: Total number of write requests written to both Exadata Smart Flash Cache and disk. Part of the data was written to flash, and the rest was written to disk. This statistic is incremented once per mirror write.
  • cell writes to flash cache: Total number of write requests written entirely to Exadata Smart Flash Cache. This statistic is incremented once per mirror write.
  • cell writes to flash cache for temp IO: The number of write requests for temporary segments that were absorbed by Exadata Smart Flash Cache
  • physical read IO requests: Number of physical read requests issued by the database.
  • physical read requests optimized: Total number of read requests satisfied by using Exadata Smart Flash Cache, and read requests avoided by using storage index or columnar cache.
  • physical read total bytes: Total amount of I/O bytes for reads issued by the database, whether or not the read was offloaded to the storage servers.
  • physical read total bytes optimized: Total number of bytes read from Exadata Smart Flash Cache, and bytes avoided by using storage index or columnar cache.
  • physical read total IO requests: Total number of read requests issued by the database for all activity including application, backup, recovery, and other utilities.
  • physical write IO requests: Number of physical write requests issued by the database.
  • physical write total bytes: Total number of IO bytes for writes issued by the database for all activity.
  • physical write total bytes optimized: Total number of bytes written to Exadata Smart Flash Cache. These writes are synchronized to disk in a lazy manner.
  • physical write total IO requests: Total number of write requests issued by the database for all activity, including application, backup, recovery and other utilities.
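
For a quick instance-level view, the following sketch computes an approximate flash cache read hit percentage by comparing cell flash cache read hits with physical read total IO requests in V$SYSSTAT. Because these statistics are cumulative since instance startup, comparing the values at two points in time (or over an AWR interval) gives a more meaningful rate.

SQL> SELECT ROUND(100 * hits.value / NULLIF(reads.value, 0), 2) flash_read_hit_pct
     FROM (SELECT value FROM v$sysstat WHERE name = 'cell flash cache read hits') hits,
          (SELECT value FROM v$sysstat WHERE name = 'physical read total IO requests') reads;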

The following table describes database wait events that are useful for monitoring Exadata Smart Flash Cache. The wait events are visible in various dynamic performance views, including V$SESSION, V$SYSTEM_EVENT and V$SESSION_EVENT, and may be displayed in the Wait Event sections of the AWR report.

The latency of the cell single block physical read event typically indicates whether the read was satisfied from Exadata Smart Flash Cache. The higher the latency, the higher the likelihood that the request was not satisfied from Exadata Smart Flash Cache.
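
One way to examine the latency distribution is to query the wait event histogram views, as in the following sketch, which uses V$EVENT_HISTOGRAM_MICRO (available in recent Oracle Database releases). A pronounced tail at high latencies typically corresponds to reads serviced by hard disk rather than by the cache.

SQL> SELECT wait_time_micro, wait_count
     FROM v$event_histogram_micro
     WHERE event = 'cell single block physical read'
     ORDER BY wait_time_micro;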

Wait Event Description
cell list of blocks physical read

This wait event occurs during recovery or during buffer pre-fetching (rather than performing multiple single-block reads). It is used to monitor database blocks that must be changed as part of recovery and are read in parallel for the database.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event is equivalent to db file parallel read for a cell.

cell list of blocks read request

This is a placeholder wait event associated with cell list of blocks physical read, which is visible only during the wait period. After the wait event ends, the placeholder is typically converted to cell list of blocks physical read.

cell multiblock physical read

This wait event represents the time taken to perform all the I/Os for a multi-block database read.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event is equivalent to db file scattered read for a cell.

cell multiblock read request

This is a placeholder wait event associated with cell multiblock physical read, which is visible only during the wait period. After the wait event ends, the placeholder is typically converted to cell multiblock physical read.

cell single block physical read

This wait event represents the time taken to perform a single block database I/O.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event is equivalent to db file sequential read for a cell.

Commencing with the May 2022 Oracle Database release updates (versions 19.15.0.0.220419, 21.6.0.0.220419, and later), this wait event no longer includes I/O from Exadata Smart Flash Cache or database I/O using a Remote Direct Memory Access (RDMA) read.

cell single block physical read: flash cache

This wait event represents the time taken to perform a single block database I/O from Exadata Smart Flash Cache.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data (not the cache location), which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event was introduced in the May 2022 Oracle Database release updates and is present in Oracle Database versions 19.15.0.0.220419, 21.6.0.0.220419, and later. Previously, cell single block physical read included these waits.

cell single block read request

This is a placeholder wait event associated with a single block database I/O that is visible only during the wait period. After the wait event ends, the placeholder is converted to the appropriate wait event, which is typically one of the cell single block physical read events.

cell interconnect retransmit during physical read

This wait event appears during retransmission for an I/O of a single-block or multiblock read. The cell hash number in the P1 column in the V$SESSION_WAIT view is the same cell identified for cell single block physical read and cell multiblock physical read. The P2 column contains the subnet number to the cell, and the P3 column contains the number of bytes processed during the I/O read operation.

The availability of a specific statistic or wait event is subject to the version of Oracle Database being used.

6.3.1.3 Monitoring Exadata Smart Flash Cache Using Exadata Metrics

Exadata metrics that are related to Exadata Smart Flash Cache are identified in the Exadata storage server METRICCURRENT, METRICDEFINITION, and METRICHISTORY objects as having objectType=FLASHCACHE.

Example 6-1 Displaying Flash Cache Metric Definitions

This example shows how to display the flash cache metric definitions that are available in the Oracle Exadata System Software.

CellCLI> LIST METRICDEFINITION ATTRIBUTES NAME,DESCRIPTION WHERE OBJECTTYPE = FLASHCACHE
         FC_BYKEEP_DIRTY                         "Number of megabytes unflushed for keep objects on FlashCache"
         FC_BYKEEP_OLTP                          "Number of megabytes for OLTP keep objects in flash cache"
         FC_BYKEEP_OVERWR                        "Number of megabytes pushed out of the FlashCache because of space limit for keep objects"
         FC_BYKEEP_OVERWR_SEC                    "Number of megabytes per second pushed out of the FlashCache because of space limit for keep objects"
         FC_BYKEEP_USED                          "Number of megabytes used for keep objects on FlashCache"
         FC_BY_ALLOCATED                         "Number of megabytes allocated in flash cache"
         FC_BY_ALLOCATED_DIRTY                   "Number of megabytes allocated for unflushed data in flash cache"
         FC_BY_ALLOCATED_OLTP                    "Number of megabytes allocated for OLTP data in flash cache"
         FC_BY_DIRTY                             "Number of unflushed megabytes in FlashCache"
         FC_BY_STALE_DIRTY                       "Number of unflushed megabytes in FlashCache which cannot be flushed because cached disks are not accessible"
         FC_BY_USED                              "Number of megabytes used on FlashCache"
         FC_COL_BYKEEP_USED                      "Number of megabytes used for keep objects in Columnar FlashCache"
         FC_COL_BY_USED                          "Number of megabytes used in Columnar FlashCache"
         FC_COL_IO_BYKEEP_R                      "Number of megabytes read from Columnar FlashCache for keep objects"
         FC_COL_IO_BYKEEP_R_SEC                  "Number of megabytes read per second from Columnar FlashCache for keep objects"
         FC_COL_IO_BY_R                          "Number of megabytes that were read from Columnar FlashCache"
         FC_COL_IO_BY_R_ELIGIBLE                 "Number of megabytes eligible to read from Columnar FlashCache"
         FC_COL_IO_BY_R_ELIGIBLE_SEC             "Number of megabytes per second eligible to read from Columnar FlashCache"
         FC_COL_IO_BY_R_SEC                      "Number of megabytes per second that were read from Columnar FlashCache"
         FC_COL_IO_BY_SAVED                      "Number of megabytes saved by reads from Columnar FlashCache"
         FC_COL_IO_BY_SAVED_SEC                  "Number of megabytes saved per second by reads from Columnar FlashCache"
         FC_COL_IO_BY_W_POPULATE                 "Number of megabytes that are population writes into Columnar FlashCache due to read miss"
         FC_COL_IO_BY_W_POPULATE_SEC             "Number of megabytes per second that are population writes into Columnar FlashCache due to read miss"
         FC_COL_IO_RQKEEP_R                      "Number of requests read for keep objects from Columnar FlashCache"
         FC_COL_IO_RQKEEP_R_SEC                  "Number of requests read per second for keep objects from Columnar FlashCache"
         FC_COL_IO_RQ_R                          "Number of requests that were read from Columnar FlashCache"
         FC_COL_IO_RQ_R_ELIGIBLE                 "Number of reads eligible for Columnar FlashCache"
         FC_COL_IO_RQ_R_ELIGIBLE_SEC             "Number of reads per second eligible for Columnar FlashCache"
         FC_COL_IO_RQ_R_SEC                      "Number of requests per second that were read from Columnar FlashCache"
         FC_COL_IO_RQ_W_POPULATE                 "Number of requests that are population writes into Columnar FlashCache due to read miss"
         FC_COL_IO_RQ_W_POPULATE_SEC             "Number of requests per second that are population writes into Columnar FlashCache due to read miss"
         FC_IO_BYKEEP_R                          "Number of megabytes read from FlashCache for keep objects"
         FC_IO_BYKEEP_R_SEC                      "Number of megabytes read per second from FlashCache for keep objects"
         FC_IO_BYKEEP_W                          "Number of megabytes written to FlashCache for keep objects"
         FC_IO_BYKEEP_W_SEC                      "Number of megabytes per second written to FlashCache for keep objects"
         FC_IO_BY_DISK_WRITE                     "Number of megabytes written from flash cache to hard disks"
         FC_IO_BY_DISK_WRITE_SEC                 "Number of megabytes per second written from flash cache to hard disks"
         FC_IO_BY_R                              "Number of megabytes of small reads (OLTP) from flash cache"
         FC_IO_BY_R_ACTIVE_SECONDARY             "Number of megabytes for active secondary reads satisfied from flash cache"
         FC_IO_BY_R_ACTIVE_SECONDARY_MISS        "Number of megabytes for active secondary reads not satisfied from flash cache"
         FC_IO_BY_R_ACTIVE_SECONDARY_MISS_SEC    "Number of megabytes per second for active secondary reads not satisfied from flash cache"
         FC_IO_BY_R_ACTIVE_SECONDARY_SEC         "Number of megabytes per second for active secondary reads satisfied from flash cache"
         FC_IO_BY_R_DISK_WRITER                  "Number of megabytes read from flash cache by disk writer"
         FC_IO_BY_R_DISK_WRITER_SEC              "Number of megabytes per second read from flash cache by disk writer"
         FC_IO_BY_R_DW                           "Number of megabytes of large reads (DW) from flash cache"
         FC_IO_BY_R_DW_SEC                       "Number of megabytes of large reads (DW) per second from flash cache"
         FC_IO_BY_R_MISS                         "Number of megabytes of small reads (OLTP) from disks because some of the requested data was not in flash cache"
         FC_IO_BY_R_MISS_DW                      "Number of megabytes of large reads (DW) from disks because some of the requested data was not in flash cache"
         FC_IO_BY_R_MISS_DW_SEC                  "Number of megabytes of large reads (DW) per second from disks because some of the requested data was not in flash cache"
         FC_IO_BY_R_MISS_SEC                     "Number of megabytes of small reads (OLTP) per second from disks because some of the requested data was not in flash cache"
         FC_IO_BY_R_SEC                          "Number of megabytes of small reads (OLTP) per second from flash cache"
         FC_IO_BY_R_SKIP                         "Number of megabytes read from disks for IO requests that bypass FlashCache"
         FC_IO_BY_R_SKIP_FC_THROTTLE             "Number of megabytes read from disk for IO requests that bypass FlashCache due to heavy load on FlashCache"
         FC_IO_BY_R_SKIP_FC_THROTTLE_SEC         "Number of megabytes read per second from disk for IO requests that bypass FlashCache due to heavy load on FlashCache"
         FC_IO_BY_R_SKIP_LG                      "Number of megabytes read from disk for IO requests that bypass FlashCache due to the large IO size"
         FC_IO_BY_R_SKIP_LG_SEC                  "Number of megabytes read per second from disk for IO requests that bypass FlashCache due to the large IO size"
         FC_IO_BY_R_SKIP_NCMIRROR                "Number of megabytes read from disk for IO requests that bypass FlashCache as the IO is on non-primary, non-active secondary mirror"
         FC_IO_BY_R_SKIP_SEC                     "Number of megabytes read from disks per second for IO requests that bypass FlashCache"
         FC_IO_BY_W                              "Number of megabytes written to FlashCache"
         FC_IO_BY_W_DISK_WRITER                  "Number of megabytes written to hard disks by disk writer"
         FC_IO_BY_W_DISK_WRITER_SEC              "Number of megabytes per second written to hard disks by disk writer"
         FC_IO_BY_W_FIRST                        "Number of megabytes that are first writes into flash cache"
         FC_IO_BY_W_FIRST_SEC                    "Number of megabytes per second that are first writes into flash cache"
         FC_IO_BY_W_LG_CHINT                     "Number of megabytes used for large writes to flash cache due to cache hint"
         FC_IO_BY_W_LG_DTAGE                     "Number of megabytes used for data aging large writes to flash cache"
         FC_IO_BY_W_LG_MRCV                      "Number of megabytes used for media recovery large writes to flash cache"
         FC_IO_BY_W_METADATA                     "Number of megabytes that are flash cache metadata writes"
         FC_IO_BY_W_METADATA_SEC                 "Number of megabytes per second that are flash cache metadata writes"
         FC_IO_BY_W_OVERWRITE                    "Number of megabytes that are overwrites into flash cache"
         FC_IO_BY_W_OVERWRITE_SEC                "Number of megabytes per second that are overwrites into flash cache"
         FC_IO_BY_W_POPULATE                     "Number of megabytes that are population writes into flash cache due to read miss"
         FC_IO_BY_W_POPULATE_SEC                 "Number of megabytes per second that are population writes into flash cache due to read miss"
         FC_IO_BY_W_SEC                          "Number of megabytes per second written to FlashCache"
         FC_IO_BY_W_SKIP                         "Number of megabytes written to disk for IO requests that bypass FlashCache"
         FC_IO_BY_W_SKIP_FC_THROTTLE             "Number of megabytes written to disk for IO requests that bypass FlashCache due to heavy load on FlashCache"
         FC_IO_BY_W_SKIP_FC_THROTTLE_SEC         "Number of megabytes written per second to disk for IO requests that bypass FlashCache due to heavy load on FlashCache"
         FC_IO_BY_W_SKIP_LG                      "Number of megabytes written to disk for IO requests that bypass FlashCache due to the large IO size"
         FC_IO_BY_W_SKIP_LG_SEC                  "Number of megabytes written per second to disk for IO requests that bypass FlashCache due to the large IO size"
         FC_IO_BY_W_SKIP_NCMIRROR                "Number of megabytes written to disk for IO requests that bypass FlashCache as the IO is on non-primary, non-active secondary mirror"
         FC_IO_BY_W_SKIP_SEC                     "Number of megabytes written to disk per second for IO requests that bypass FlashCache"
         FC_IO_ERRS                              "Number of IO errors on FlashCache"
         FC_IO_RQKEEP_R                          "Number of read requests for keep objects from FlashCache"
         FC_IO_RQKEEP_R_MISS                     "Number of read requests for keep objects which did not find all data in FlashCache"
         FC_IO_RQKEEP_R_MISS_SEC                 "Number of read requests per second for keep objects which did not find all data in FlashCache"
         FC_IO_RQKEEP_R_SEC                      "Number of read requests per second for keep objects from FlashCache"
         FC_IO_RQKEEP_R_SKIP                     "Number of read requests for keep objects that bypass FlashCache"
         FC_IO_RQKEEP_R_SKIP_SEC                 "Number of read requests per second for keep objects that bypass FlashCache"
         FC_IO_RQKEEP_W                          "Number of requests for keep objects which resulted in FlashCache being populated with data"
         FC_IO_RQKEEP_W_SEC                      "Number of requests per second for keep objects which resulted in FlashCache being populated with data"
         FC_IO_RQ_DISK_WRITE                     "Number of requests written from flash cache to hard disks"
         FC_IO_RQ_DISK_WRITE_SEC                 "Number of requests per second written from flash cache to hard disks"
         FC_IO_RQ_R                              "Number of small reads (OLTP) satisfied from the flash cache"
         FC_IO_RQ_REPLACEMENT_ATTEMPTED          "Number of requests attempted to find space in the flash cache"
         FC_IO_RQ_REPLACEMENT_DW_FAILED          "Number of times that client DW IOs failed to find a replacement buffer"
         FC_IO_RQ_REPLACEMENT_DW_SUCCEEDED       "Number of times that client DW IOs succeeded in finding a replacement buffer"
         FC_IO_RQ_REPLACEMENT_FAILED             "Number of requests failed to find space in the flash cache"
         FC_IO_RQ_REPLACEMENT_OLTP_FAILED        "Number of times that client OLTP IOs failed to find a replacement buffer"
         FC_IO_RQ_REPLACEMENT_OLTP_SUCCEEDED     "Number of times that client OLTP IOs succeeded in finding a replacement buffer"
         FC_IO_RQ_R_ACTIVE_SECONDARY             "Number of requests for active secondary reads satisfied from flash cache"
         FC_IO_RQ_R_ACTIVE_SECONDARY_MISS        "Number of requests for active secondary reads not satisfied from flash cache"
         FC_IO_RQ_R_ACTIVE_SECONDARY_MISS_SEC    "Number of requests per second for active secondary reads not satisfied from flash cache"
         FC_IO_RQ_R_ACTIVE_SECONDARY_SEC         "Number of requests per second for active secondary reads satisfied from flash cache"
         FC_IO_RQ_R_DISK_WRITER                  "Number of requests read from flash cache by disk writer"
         FC_IO_RQ_R_DISK_WRITER_SEC              "Number of requests per second read from flash cache by disk writer"
         FC_IO_RQ_R_DW                           "Number of large reads (DW) satisfied from the flash cache"
         FC_IO_RQ_R_DW_SEC                       "Number of large reads (DW) per second satisfied from the flash cache"
         FC_IO_RQ_R_MISS                         "Number of small reads (OLTP) that did not find all data in flash cache"
         FC_IO_RQ_R_MISS_DW                      "Number of large reads (DW) that did not find all data in flash cache"
         FC_IO_RQ_R_MISS_DW_SEC                  "Number of large reads (DW) per second that did not find all data in flash cache"
         FC_IO_RQ_R_MISS_SEC                     "Number of small reads (OLTP) per second that did not find all data in flash cache"
         FC_IO_RQ_R_SEC                          "Number of small reads (OLTP) per second satisfied from the flash cache"
         FC_IO_RQ_R_SKIP                         "Number of requests read from disk that bypass FlashCache"
         FC_IO_RQ_R_SKIP_FC_THROTTLE             "Number of requests read from disk that bypass FlashCache due to heavy load on FlashCache"
         FC_IO_RQ_R_SKIP_FC_THROTTLE_SEC         "Number of requests read from disk per second that bypassed FlashCache due to heavy load on FlashCache"
         FC_IO_RQ_R_SKIP_LG                      "Number of requests read from disk that bypass FlashCache due to the large IO size"
         FC_IO_RQ_R_SKIP_LG_SEC                  "Number of requests read from disk per second that bypass FlashCache due to the large IO size"
         FC_IO_RQ_R_SKIP_NCMIRROR                "Number of requests read from disk that bypass FlashCache as the IO is on non-primary, non-active secondary mirror"
         FC_IO_RQ_R_SKIP_SEC                     "Number of requests read from disk per second that bypass FlashCache"
         FC_IO_RQ_W                              "Number of requests which resulted in FlashCache being populated with data"
         FC_IO_RQ_W_DISK_WRITER                  "Number of requests written to hard disks by disk writer"
         FC_IO_RQ_W_DISK_WRITER_SEC              "Number of requests per second written to hard disks by disk writer"
         FC_IO_RQ_W_FIRST                        "Number of requests that are first writes into flash cache"
         FC_IO_RQ_W_FIRST_SEC                    "Number of requests per second that are first writes into flash cache"
         FC_IO_RQ_W_LG_CHINT                     "Number of large writes to flash cache due to cache hint"
         FC_IO_RQ_W_LG_DTAGE                     "Number of data aging large writes to flash cache"
         FC_IO_RQ_W_LG_MRCV                      "Number of media recovery large writes to flash cache"
         FC_IO_RQ_W_METADATA                     "Number of requests that are flash cache metadata writes"
         FC_IO_RQ_W_METADATA_SEC                 "Number of requests per second that are flash cache metadata writes"
         FC_IO_RQ_W_OVERWRITE                    "Number of requests that are overwrites into flash cache"
         FC_IO_RQ_W_OVERWRITE_SEC                "Number of requests per second that are overwrites into flash cache"
         FC_IO_RQ_W_POPULATE                     "Number of requests that are population writes into flash cache due to read miss"
         FC_IO_RQ_W_POPULATE_SEC                 "Number of requests per second that are population writes into flash cache due to read miss"
         FC_IO_RQ_W_SEC                          "Number of requests per second which resulted in FlashCache being populated with data"
         FC_IO_RQ_W_SKIP                         "Number of requests written to disk that bypass FlashCache"
         FC_IO_RQ_W_SKIP_FC_THROTTLE             "Number of requests written to disk that bypass FlashCache due to heavy load on FlashCache"
         FC_IO_RQ_W_SKIP_FC_THROTTLE_SEC         "Number of requests written to disk per second that bypass FlashCache due to heavy load on FlashCache"
         FC_IO_RQ_W_SKIP_LG                      "Number of requests written to disk that bypass FlashCache due to the large IO size"
         FC_IO_RQ_W_SKIP_LG_SEC                  "Number of requests written to disk per second that bypass FlashCache due to the large IO size"
         FC_IO_RQ_W_SKIP_NCMIRROR                "Number of requests written to disk that bypass FlashCache as the IO is on non-primary, non-active secondary mirror"
         FC_IO_RQ_W_SKIP_SEC                     "Number of requests written to disk per second that bypass FlashCache"

Note the following additional details:

  • Space in Exadata Smart Flash Cache is internally managed in chunks known as cachelines. FC_BY_ALLOCATED represents the amount of space (in megabytes) that is allocated to cachelines in the flash cache. If the value is close to the flash cache size, then the flash cache is fully populated.

  • FC_BY_USED represents the amount of space (in megabytes) that is occupied by data in the flash cache. For some workloads, like OLTP, the FC_BY_USED value can be much less than the value of FC_BY_ALLOCATED because an OLTP write might only use a small fraction of a cacheline. The total amount used by OLTP cachelines is tracked by FC_BY_ALLOCATED_OLTP.

  • Various flash cache metrics having names ending with _DW track large read operations, which are typically associated with Data Warehouse workloads. Each of these metrics has a corresponding metric that tracks small read operations, which are typically associated with OLTP operations. To get a total count, add the large read metric value to the corresponding small read metric value.

    For example, to get the total number of reads satisfied from flash cache, add FC_IO_RQ_R_DW to FC_IO_RQ_R. The CellCLI sketch after this list shows one way to display these related metrics.
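
For example, the following CellCLI sketch displays the current flash cache read request metrics on a storage server. The second command uses a CellCLI LIKE pattern to match the small-read and large-read variants together; adjust the pattern to suit the metrics of interest.

CellCLI> LIST METRICCURRENT FC_IO_RQ_R DETAIL
CellCLI> LIST METRICCURRENT WHERE name LIKE 'FC_IO_RQ_R.*'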

6.3.1.4 What to Look For When Monitoring Exadata Smart Flash Cache

Increased Read Latencies

Possible issues related to Exadata Smart Flash Cache tend to be visible in the database as increased read latencies, specifically in the cell single block physical read wait event. The increased latency is usually caused when reads are issued against the hard disks instead of being satisfied by Exadata Smart Flash Cache.

Read requests may not be satisfied from Exadata Smart Flash Cache when:

  • The required data is not in Exadata Smart Flash Cache.

    In this case, requests are recorded as flash cache misses, which are visible as increased misses in the Flash Cache User Reads section of the AWR report. This is also typically accompanied by increased population writes, which are visible in the Flash Cache Internal Writes section of the AWR report.

    On the database, you may see a reduction in the number of cell flash cache read hits compared with the physical read IO requests or physical read total IO requests.

  • The data is not eligible for Exadata Smart Flash Cache.

    To maximize cache efficiency, when an I/O request is sent from the database to the storage servers, it includes a hint that indicates whether or not the data should be cached in Exadata Smart Flash Cache.

    If data is not eligible for caching, the corresponding I/O is visible in the Flash Cache User Reads - Skips section of the AWR report, along with the reason why the read was not eligible. Possible reasons include:

    • The grid disk caching policy is set to none.
    • The database segment is configured with the CELL_FLASH_CACHE NONE storage option.
    • The I/O type and situation precludes caching. For example, the I/O is related to an RMAN backup operation and the hard disks are not busy. If this is a common occurrence, it will be evident in the Top IO Reasons section of the AWR report.

In some cases, the cell single block physical read latency may increase, with no apparent difference in Exadata Smart Flash Cache behavior. This can be caused by increased I/O load, especially on hard disks.

Occasional long latencies in cell single block physical read, usually visible as a long tail in the cell single block physical read histogram, may simply indicate that the data is read from hard disk, and do not necessarily indicate a problem with Exadata Smart Flash Cache.
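
To check the database statistics mentioned above without generating an AWR report, a query along the following lines can be run on the affected database. This is a sketch only; the values in V$SYSSTAT are cumulative, so comparing readings captured at two points in time is more meaningful than a single reading.

SELECT name, value
  FROM V$SYSSTAT
 WHERE name IN ('cell flash cache read hits',
                'physical read IO requests',
                'physical read total IO requests');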

Skipped Writes

When using Exadata Smart Flash Cache in Write-Back mode, most database write requests are absorbed by the cache. However, some write requests may skip the cache. The most common reason for skipping the cache is when the request writes a large amount of data that will be read once, such as temp sorts, or it is not expected to be read at all in the foreseeable future, such as backups and archiving.

Another common reason for skipping large writes is Disk Not Busy, which typically means that there is no benefit to using Exadata Smart Flash Cache because the hard disks have sufficient capacity to handle the write requests.

If skipping Exadata Smart Flash Cache for large writes is causing a performance issue, then it is typically visible in the database with corresponding long latencies for the direct path write or direct path write temp wait events.

The reasons for rejecting large writes are visible in the Flash Cache User Writes - Large Write Rejections section of the AWR report.

Database Working Set Size

The database working set refers to the subset of most commonly accessed information in the database. In most cases, the database working set is fairly stable. However, if, for any reason, the working set does not fit in Exadata Smart Flash Cache, you may see the following symptoms:

  • Increased cache misses, indicating that data is not in Exadata Smart Flash Cache. This is visible in the Flash Cache User Reads section of the AWR report, or the FC_IO_RQ_R_MISS_SEC cell metric.
  • Increased population activity to populate data not in Exadata Smart Flash Cache. This is visible in the Flash Cache Internal Writes section of the AWR report, or the FC_IO_[BY|RQ]_W_POPULATE_SEC cell metrics.
  • Increased disk writer activity, indicating that dirty cachelines have to be written to disk, so that the cacheline can be reused to cache other data. Disk writer activity is visible in the Flash Cache Internal Reads section of the AWR report, or the FC_IO_[RQ|BY]_[R|W]_DISK_WRITER_SEC cell metrics.
  • Increased first writes, indicating that new data is being written to Exadata Smart Flash Cache. A large number of first writes with few overwrites means that the writes are mostly for data that is not already in the cache. This is visible in the Flash Cache User Writes section of the AWR report, or in the FC_IO_[RQ|BY]_W_FIRST_SEC cell metrics (a CellCLI sketch for checking the per-second request metrics follows the list of recommendations below).

In this case:

  • Review the database access patterns for tuning opportunities that reduce the amount of data being accessed.
  • Consider increasing the number of available storage servers to deliver more space for Exadata Smart Flash Cache.
  • Review your I/O Resource Management (IORM) quotas for Exadata Smart Flash Cache and allocate space where it is most required.
  • Consider using Extreme Flash storage servers to eliminate the disk I/Os.
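
The per-second flash cache request metrics mentioned in the symptoms above can be checked directly on a storage server with a CellCLI command similar to the following sketch. The LIKE pattern is a regular expression that matches metrics such as FC_IO_RQ_R_MISS_SEC, FC_IO_RQ_W_POPULATE_SEC, and FC_IO_RQ_W_FIRST_SEC.

CellCLI> LIST METRICCURRENT WHERE objectType = FLASHCACHE AND name LIKE 'FC_IO_RQ_.*_SEC'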

Miscellaneous Issues

There are some cases when the increased cell single block physical read latency may not be due to cell performance, but may be caused by something else along the IO path, such as network contention or contention for database server CPU resources.

The histograms for cell single block physical read and small reads are available in the Single Block Reads and Small Read Histogram - Detail sections of the AWR report under Exadata Statistics > Performance Summary. The cell single block physical read histogram shows latencies measured by Oracle Database, while the small read histograms show latencies measured in the storage server.

A histogram with a significant number of occasional long latencies is said to have a long tail. When the histograms for cell single block physical read and small reads have long tails, then this is an indication of slow read times in the storage server, which would warrant further investigation of the other I/O performance statistics. See Monitoring Cell Disk I/O.

If the cell single block physical read histogram has a long tail that is not present in the small read histograms, then the cause is generally not in the storage server, but rather something else in the I/O path, such as bottlenecks in the network or contention for compute node CPU.
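
If an AWR report is not available for the period of interest, a comparable view of the database-side latency distribution can be obtained by querying the wait event histogram directly. The following is a sketch; V$EVENT_HISTOGRAM_MICRO reports microsecond-level buckets in recent Oracle Database releases.

SELECT wait_time_micro, wait_count
  FROM V$EVENT_HISTOGRAM_MICRO
 WHERE event = 'cell single block physical read'
 ORDER BY wait_time_micro;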

6.3.2 Monitoring XRMEM Cache

Exadata RDMA Memory Cache (XRMEM cache) provides direct access to storage server memory using Remote Direct Memory Access (RDMA), enabling lower read latencies and faster response times. When database clients read from the XRMEM cache, the client software performs an RDMA read of the cached data, which bypasses the storage server software and results in much lower read latencies.

XRMEM cache works in conjunction with Exadata Smart Flash Cache. If available, data that is not in XRMEM cache may be retrieved from Exadata Smart Flash Cache.

Statistics from XRMEM cache are slightly different when compared with other Exadata components. Because clients issue RDMA I/O directly to XRMEM cache, the request does not go to cellsrv, so the storage server cannot account for the RDMA I/Os. For this reason, there are no cell metrics for XRMEM cache I/O. Instead, Oracle Database statistics account for the I/O that is performed using RDMA.

Performance issues related to XRMEM cache typically cause latency increases in the Oracle Database cell single block physical read wait events. However, bear in mind that Exadata Smart Flash Cache is still available to service the requests, and although requests from Exadata Smart Flash Cache generally experience higher read latencies compared to XRMEM cache, the requests still benefit from the fast I/O provided by flash.

6.3.2.1 Monitoring XRMEM Cache Using AWR

Automatic Workload Repository (AWR) contains information relating to XRMEM cache. Following are descriptions and examples of the most commonly used sections in the AWR report that contain information about XRMEM cache. By reviewing these sections of the AWR report, administrators can understand the operation of XRMEM cache.

Note that because storage servers cannot account for the RDMA I/Os, the Oracle Database statistics are critical in understanding the use of XRMEM cache. The XRMEM cache statistics in the Exadata-specific sections of the AWR report only include I/Os that are serviced by cellsrv, which are not RDMA I/Os.

Database IOs

A summary of Database IOs may be found in the Single Block Reads section of the AWR report. This summary contains information about the effectiveness of XRMEM cache. The Single Block Reads section is located in the AWR report under Exadata Statistics > Performance Summary.

In the following example, the overwhelming majority of all read I/O requests (approximately 88%) are serviced using RDMA reads to XRMEM cache (cell RDMA reads), at a rate of more than 176,000 per second. Nearly all of the rest are satisfied using non-RDMA XRMEM cache reads (cell xrmem cache read hits), while the remaining reads are serviced by using Exadata Smart Flash Cache (cell flash cache read hits). Notice the massive difference in throughput for each different I/O type, which illustrates the power of XRMEM cache and RDMA.

Figure 6-10 AWR Report: Database IOs

The image shows an example of the Database IOs summary in the Single Block Reads section in the AWR report.

XRMEM Cache Configuration and Space Usage

The XRMEM Cache Configuration section contains summary information including the caching mode (Write-Through or Write-Back) and overall size. The XRMEM Cache Space Usage section provides summary statistics on space usage in XRMEM cache.

The following example shows XRMEM cache configured in Write-Through mode on all cells, with a total size of approximately 1509 GB.

Figure 6-11 AWR Report: XRMEM Cache Configuration

The image shows an example of the XRMEM Cache Configuration section in the AWR report.

The following example of the XRMEM Cache Space Usage section shows 172 GB of XRMEM cache spread evenly across 3 cells.

Figure 6-12 AWR Report: XRMEM Cache Space Usage

The image shows an example of the XRMEM Cache Space Usage section in the AWR report.

XRMEM Cache User Reads

The XRMEM Cache User Reads section shows information about read requests, read throughput, and read efficiency. Because storage servers cannot account for RDMA I/Os, the statistics only relate to non-RDMA reads processed by cellsrv. Hits are for I/Os satisfied using XRMEM cache, while misses indicate that the data was not in XRMEM cache.

Figure 6-13 AWR Report: XRMEM Cache User Reads

The image shows an example of the XRMEM Cache User Reads section in the AWR report.

XRMEM Cache Internal Writes

The XRMEM Cache Internal Writes section shows writes to XRMEM cache that are performed by Oracle Exadata System Software. The internal writes are I/Os that populate XRMEM cache.

Figure 6-14 AWR Report: XRMEM Cache Internal Writes

The image shows an example of the XRMEM Cache Internal Writes section in the AWR report.
6.3.2.2 Monitoring XRMEM Cache Using Database Statistics and Wait Events

Note:

The availability of a specific statistic or wait event is subject to the version of Oracle Database being used.

The following table describes database statistics that are useful for monitoring XRMEM cache. The statistics are available in various dynamic performance views, including V$SYSSTAT, and may be displayed in the Global Activity Statistics or Instance Activity Statistics section of an AWR report.

Statistic Description
cell RDMA reads

The number of successful XRMEM cache read requests using RDMA

cell RDMA reads eligible

The number of database reads that meet basic eligibility criteria for RDMA

cell RDMA read hash table probes

The total number of RDMA hash table probes issued to determine the presence of data in XRMEM cache. Each eligible read is usually associated with a hash table probe.

cell RDMA reads issued

The total number of RDMA reads issued to retrieve data from XRMEM cache. An RDMA read is usually issued after each successful hash table probe.

cell RDMA probe failures - hash table buffer allocation failed

cell RDMA probe failures - IPCDAT metadata allocation failed

cell RDMA probe failures - IPCDAT errors

The total number of RDMA hash table probes that failed. Each statistic covers a specific failure reason.

The sum of these failures accounts for most of the difference between cell RDMA read hash table probes and cell RDMA reads issued.

Some RDMA hash table probe failures are normal while XRMEM cache is being initialized. However, frequent failures or a large number of failures may indicate a problem requiring further investigation.

cell RDMA read failures - lease expired

cell RDMA read failures - client registration errors

The total number of RDMA reads that failed. Each statistic covers a specific failure reason.

The sum of these failures accounts for most of the difference between cell RDMA reads issued and cell RDMA reads.

Normally, some RDMA read failures may occur. However, frequent failures or a large number of failures may indicate a problem requiring further investigation.

cell RDMA reads rejected - ineligible

The number of database reads that failed basic eligibility criteria for RDMA

cell xrmem cache read hits

The number of non-RDMA read requests processed by cellsrv resulting in an XRMEM cache hit
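
For a quick check of these statistics outside of AWR, V$SYSSTAT can be queried directly, as in the following sketch. The statistic names are those described above; because the values are cumulative, compare readings taken at two points in time.

SELECT name, value
  FROM V$SYSSTAT
 WHERE name IN ('cell RDMA reads',
                'cell RDMA reads eligible',
                'cell RDMA reads issued',
                'cell xrmem cache read hits',
                'cell flash cache read hits');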

The following table describes database wait events that are useful for monitoring XRMEM cache. The wait events are visible in various dynamic performance views, including V$SESSION, V$SYSTEM_EVENT and V$SESSION_EVENT, and may be displayed in the Wait Event sections of the AWR report.

Wait Event Description
cell list of blocks physical read

This wait event occurs during recovery or during buffer pre-fetching (rather than performing multiple single-block reads). It is used to monitor database blocks that must be changed as part of recovery and are read in parallel for the database.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event is equivalent to db file parallel read for a cell.

cell list of blocks read request

This is a placeholder wait event associated with cell list of blocks physical read, which is visible only during the wait period. After the wait event ends, the placeholder is typically converted to cell list of blocks physical read.

cell multiblock physical read

This wait event represents the time taken to perform all the I/Os for a multi-block database read.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event is equivalent to db file scattered read for a cell.

cell multiblock read request

This is a placeholder wait event associated with cell multiblock physical read, which is visible only during the wait period. After the wait event ends, the placeholder is typically converted to cell multiblock physical read.

cell single block physical read

This wait event represents the time taken to perform a single block database I/O, equivalent to db file sequential read for a cell.

This wait event does not include I/O from Exadata Smart Flash Cache, I/O from XRMEM cache, or database I/O using a Remote Direct Memory Access (RDMA) read.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

cell single block physical read: RDMA

This wait event represents the time taken to perform a single block database I/O using a Remote Direct Memory Access (RDMA) read.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event was introduced in the May 2022 Oracle Database release updates and is present in Oracle Database versions 19.15.0.0.220419, 21.6.0.0.220419, and later. Previously, cell single block physical read included these waits.

cell single block physical read: xrmem cache

This wait event represents the time taken to perform a single block database I/O from XRMEM cache.

Effective use of XRMEM cache coincides with extremely low latency for this wait event, which is typical for operations using RDMA.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data (not the cache location), which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

cell single block read request

This is a placeholder wait event associated with a single block database I/O that is visible only during the wait period. After the wait event ends, the placeholder is converted to the appropriate wait event, which is typically one of the cell single block physical read events.

cell interconnect retransmit during physical read

This wait event appears during retransmission for an I/O of a single-block or multiblock read. The cell hash number in the P1 column in the V$SESSION_WAIT view is the same cell identified for cell single block physical read and cell multiblock physical read events. The P2 column contains the subnet number to the cell, and the P3 column contains the number of bytes processed during the I/O read operation.
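
To summarize these wait events outside of AWR, V$SYSTEM_EVENT can be queried as in the following sketch. The LIKE pattern matches the cell single block physical read family of events, including the RDMA and xrmem cache variants where they exist in the Oracle Database version being used.

SELECT event,
       total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0)) AS avg_wait_usec
  FROM V$SYSTEM_EVENT
 WHERE event LIKE 'cell single block physical read%'
 ORDER BY event;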

6.3.2.3 Monitoring XRMEM Cache Using Exadata Metrics

Exadata metrics that are related to XRMEM cache are identified in the Exadata storage server METRICCURRENT, METRICDEFINITION, and METRICHISTORY objects as having objectType=XRMEMCACHE.

Note that because reads from XRMEM cache are primarily performed using RDMA calls from Oracle Database, there are no Exadata metrics that tally XRMEM cache I/O.

Example 6-2 Displaying XRMEM Cache Metric Definitions

This example shows how to display the XRMEM cache metric definitions that are available in the Oracle Exadata System Software.

CellCLI> LIST METRICDEFINITION ATTRIBUTES NAME,DESCRIPTION WHERE OBJECTTYPE = XRMEMCACHE
         XRM_BY_ALLOCATED         "Number of megabytes allocated in XRMEM cache"

Note the following additional details:

  • XRM_BY_ALLOCATED represents the number of megabytes allocated in XRMEM cache and tracks how many cachelines are used in the XRMEM cache. If the value is close to the XRMEM cache size, then the XRMEM cache is fully populated.

    This metric is also available per database (DB_XRM_BY_ALLOCATED) and per PDB (PDB_XRM_BY_ALLOCATED), as shown in the following sketch.
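
For example, the following CellCLI sketch displays the current overall, per-database, and per-PDB allocation metrics. The LIKE pattern is a regular expression matching XRM_BY_ALLOCATED, DB_XRM_BY_ALLOCATED, and PDB_XRM_BY_ALLOCATED; the exact output depends on the databases running on the system.

CellCLI> LIST METRICCURRENT ATTRIBUTES name, metricObjectName, metricValue WHERE name LIKE '.*XRM_BY_ALLOCATED'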

6.3.2.4 What to Look For When Monitoring XRMEM Cache

Database Working Set

Performance is maximized if the entire database working set fits into XRMEM cache. However, the XRMEM cache is smaller than Exadata Smart Flash Cache and it is likely that the database working set, or even the most frequently accessed portion of it, might not reside in XRMEM cache.

If the size of the working set increases you may observe more reads being satisfied by Exadata Smart Flash Cache or hard disk instead of XRMEM cache. This is magnified when access patterns on the data are more random. Consequently, you may observe increased read latencies, specifically in the cell single block physical read wait event.

Non-RDMA Reads

XRMEM cache is most effective when it is used in conjunction with RDMA. However, in various situations requests to cellsrv might still be satisfied using XRMEM cache. In such cases, you will see the associated reads as cell xrmem cache read hits rather than cell RDMA reads.

A non-RDMA read may occur for the following reasons:

  • Based on the type of data being read, Oracle Database may not be able to perform an RDMA read. Direct reads and reads from the Oracle Database control file are examples of this. To identify the cause, you may need to correlate with other statistics or wait events. For example, significant activity against the control files is usually accompanied by the control file sequential read wait event, or visible in the IOStat by Filetype section of the AWR report.
  • In the Oracle Database client, buffers may not be registered for RDMA. This normally occurs shortly after a client process starts or while XRMEM cache is populating.
  • Inside XRMEM cache, the RDMA hash table may not contain the required metadata (or it may be marked as invalid). This normally occurs while XRMEM cache is populating.
  • An RDMA read times out after exceeding the default lease time (20 ms).
  • Memory limitations prevent the creation of memory structures that are required for RDMA.
  • An RDMA read error occurs.

6.3.3 Monitoring PMEM Cache

Note:

This topic applies only to Oracle Exadata System Software releases before 23.1.0. Otherwise, see Monitoring XRMEM Cache.

Persistent Memory (PMEM) cache provides direct access to persistent memory on the storage servers using Remote Direct Memory Access (RDMA), enabling lower read latencies and faster response times. PMEM is available only in selected Exadata X8M and X9M storage server models. When database clients read from the PMEM cache, the client software performs an RDMA read of the cached data, which bypasses the storage server software and results in much lower read latencies.

PMEM cache works in conjunction with Exadata Smart Flash Cache. If available, data that is not in PMEM cache may be retrieved from Exadata Smart Flash Cache. Similarly, as data is written out of PMEM cache, it is written to Exadata Smart Flash Cache when using Write-Back mode.

Statistics from PMEM cache are slightly different when compared with other Exadata components. Because clients issue RDMA I/O directly to PMEM cache, the request does not go to cellsrv, so the storage server cannot account for the RDMA I/Os. For this reason, there are no cell metrics for PMEM cache I/O. Instead, Oracle Database statistics account for the I/O that is performed using RDMA.

Performance issues related to PMEM cache typically cause latency increases in the Oracle Database cell single block physical read wait events. However, bear in mind that Exadata Smart Flash Cache is still available to service the requests, and although requests from Exadata Smart Flash Cache generally experience higher read latencies compared to PMEM cache, the requests still benefit from the fast I/O provided by flash.

6.3.3.1 Monitoring PMEM Cache Using AWR

Note:

In Oracle Database versions that support Oracle Exadata System Software release 23.1.0 and later, all references to PMEM cache are renamed to XRMEM cache (Exadata RDMA Memory Cache). However, these parts of the AWR report still correspond to PMEM cache when the underlying Oracle Exadata Storage Server contains PMEM cache.

Automatic Workload Repository (AWR) contains information relating to PMEM cache. Following are descriptions and examples of the most commonly used sections in the AWR report that contain information about PMEM cache. By reviewing these sections of the AWR report, administrators can understand the operation of PMEM cache.

Note that because storage servers cannot account for the RDMA I/Os, the Oracle Database statistics are critical in understanding the use of PMEM cache. The PMEM cache statistics in the Exadata-specific sections of the AWR report only include I/Os that are serviced by cellsrv, which are not RDMA I/Os.

Apart from the sections described below, the AWR report also contains sections for PMEM Cache User Writes, and PMEM Cache Internal Reads. These sections are not described in detail because they relate to the use of PMEM cache in Write-Back mode, which is not generally recommended.

Database IOs

A summary of Database IOs may be found in the Single Block Reads section of the AWR report. This summary contains information about the effectiveness of PMEM cache. The Single Block Reads section is located in the AWR report under Exadata Statistics > Performance Summary.

In the following example, the overwhelming majority of all read I/O requests (approximately 88%) are serviced using RDMA reads to PMEM cache (cell RDMA reads), at a rate of more than 176,000 per second. Nearly all of the rest are satisfied using non-RDMA PMEM cache reads (cell xrmem cache read hits or cell pmem cache read hits), while the remaining reads are serviced by using Exadata Smart Flash Cache (cell flash cache read hits). Notice the massive difference in throughput for each different I/O type, which illustrates the power of PMEM cache and RDMA.

Figure 6-15 AWR Report: Database IOs

The image shows an example of the Database IOs summary in the Single Block Reads section in the AWR report.

PMEM Cache Configuration and Space Usage

The PMEM Cache Configuration section contains summary information including the caching mode (Write-Through or Write-Back) and overall size. The PMEM Cache Space Usage section provides summary statistics on space usage in PMEM cache.

The following example shows PMEM cache configured in Write-Through mode on all cells, with a total size of approximately 1509 GB.

Figure 6-16 AWR Report: PMEM Cache Configuration

The image shows an example of the PMEM Cache Configuration section in the AWR report.

The following example of the PMEM Cache Space Usage section shows 172 GB of PMEM cache spread evenly across 3 cells.

Figure 6-17 AWR Report: PMEM Cache Space Usage

The image shows an example of the PMEM Cache Space Usage section in the AWR report.

PMEM Cache User Reads

The PMEM Cache User Reads section shows information about read requests, read throughput, and read efficiency. Because storage servers cannot account for RDMA I/Os, the statistics only relate to non-RDMA reads processed by cellsrv. Hits are for I/Os satisfied using PMEM cache, while misses indicate that the data was not in PMEM cache.

Figure 6-18 AWR Report: PMEM Cache User Reads

The image shows an example of the PMEM Cache User Reads section in the AWR report.

PMEM Cache Internal Writes

The PMEM Cache Internal Writes section shows writes to PMEM cache that are performed by Oracle Exadata System Software. The internal writes are I/Os that populate PMEM cache.

Figure 6-19 AWR Report: PMEM Cache Internal Writes

The image shows an example of the PMEM Cache Internal Writes section in the AWR report.
6.3.3.2 Monitoring PMEM Cache Using Database Statistics and Wait Events

Note:

  • The availability of a specific statistic or wait event is subject to the version of Oracle Database being used.

  • In Oracle Database versions that support Oracle Exadata System Software release 23.1.0 and later, all database statistics and wait events relating to PMEM cache are renamed to XRMEM cache (Exadata RDMA Memory Cache). However, these database statistics and wait events still correspond to PMEM cache when the underlying Oracle Exadata Storage Server contains PMEM cache.

The following table describes database statistics that are useful for monitoring PMEM cache. The statistics are available in various dynamic performance views, including V$SYSSTAT, and may be displayed in the Global Activity Statistics or Instance Activity Statistics section of an AWR report.

Statistic Description
cell RDMA reads

The number of successful PMEM cache read requests using RDMA

cell RDMA reads eligible

The number of database reads that meet basic eligibility criteria for RDMA

cell RDMA read hash table probes

The total number of RDMA hash table probes issued to determine the presence of data in PMEM cache. Each eligible read is usually associated with a hash table probe.

cell RDMA reads issued

The total number of RDMA reads issued to retrieve data from PMEM cache. An RDMA read is usually issued after each successful hash table probe.

cell RDMA probe failures - hash table buffer allocation failed

cell RDMA probe failures - IPCDAT metadata allocation failed

cell RDMA probe failures - IPCDAT errors

The total number of RDMA hash table probes that failed. Each statistic covers a specific failure reason.

The sum of these failures accounts for most of the difference between cell RDMA read hash table probes and cell RDMA reads issued.

Some RDMA hash table probe failures are normal while PMEM cache is being initialized. However, frequent failures or a large number of failures may indicate a problem requiring further investigation.

cell RDMA read failures - lease expired

cell RDMA read failures - client registration errors

The total number of RDMA reads that failed. Each statistic covers a specific failure reason.

The sum of these failures accounts for most of the difference between cell RDMA reads issued and cell RDMA reads.

Normally, some RDMA read failures may occur. However, frequent failures or a large number of failures may indicate a problem requiring further investigation.

cell RDMA reads rejected - ineligible

The number of database reads that failed basic eligibility criteria for RDMA

cell RDMA writes

The number of successful PMEM cache write requests using RDMA

cell xrmem cache read hits

cell pmem cache read hits

The number of non-RDMA read requests processed by cellsrv resulting in a PMEM cache hit

The statistic is named cell xrmem cache read hits in Oracle Database versions that support Oracle Exadata System Software release 23.1.0 and later. In earlier versions, it is named cell pmem cache read hits.

cell xrmem cache writes

cell pmem cache writes

The number of non-RDMA write requests processed by cellsrv resulting in a PMEM cache write

The statistic is named cell xrmem cache writes in Oracle Database versions that support Oracle Exadata System Software release 23.1.0 and later. In earlier versions, it is named cell pmem cache writes.

The following table describes database wait events that are useful for monitoring PMEM cache. The wait events are visible in various dynamic performance views, including V$SESSION, V$SYSTEM_EVENT and V$SESSION_EVENT, and may be displayed in the Wait Event sections of the AWR report.

Wait Event Description
cell list of blocks physical read

This wait event occurs during recovery or during buffer pre-fetching (rather than performing multiple single-block reads). It is used to monitor database blocks that must be changed as part of recovery and are read in parallel for the database.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event is equivalent to db file parallel read for a cell.

cell list of blocks read request

This is a placeholder wait event associated with cell list of blocks physical read, which is visible only during the wait period. After the wait event ends, the placeholder is typically converted to cell list of blocks physical read.

cell multiblock physical read

This wait event represents the time taken to perform all the I/Os for a multi-block database read.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event is equivalent to db file scattered read for a cell.

cell multiblock read request

This is a placeholder wait event associated with cell multiblock physical read, which is visible only during the wait period. After the wait event ends, the placeholder is typically converted to cell multiblock physical read.

cell single block physical read

This wait event represents the time taken to perform a single block database I/O, equivalent to db file sequential read for a cell.

Commencing with the May 2022 Oracle Database release updates (versions 19.15.0.0.220419, 21.6.0.0.220419, and later), this wait event no longer includes I/O from Exadata Smart Flash Cache, I/O from PMEM cache, or database I/O using a Remote Direct Memory Access (RDMA) read. With this change, another wait event (cell single block physical read: xrmem cache or cell single block physical read: pmem cache) represents the time taken to perform a single block database I/O from PMEM cache.

Before the May 2022 Oracle Database release updates, effective use of PMEM cache coincides with extremely low latency for this wait event, which is typical for operations using RDMA.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

cell single block physical read: xrmem cache

cell single block physical read: pmem cache

This wait event represents the time taken to perform a single block database I/O from PMEM cache.

Effective use of PMEM cache coincides with extremely low latency for this wait event, which is typical for operations using RDMA.

This wait event was introduced in the May 2022 Oracle Database release updates and is present in Oracle Database versions 19.15.0.0.220419, 21.6.0.0.220419, and later. Previously, cell single block physical read included these waits.

The wait event is named cell single block physical read: xrmem cache in Oracle Database versions that support Oracle Exadata System Software release 23.1.0 and later. In earlier versions, it is named cell single block physical read: pmem cache.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data (not the cache location), which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

cell single block read request

This is a placeholder wait event associated with a single block database I/O that is visible only during the wait period. After the wait event ends, the placeholder is converted to the appropriate wait event, which is typically one of the cell single block physical read events.

cell interconnect retransmit during physical read

This wait event appears during retransmission for an I/O of a single-block or multiblock read. The cell hash number in the P1 column in the V$SESSION_WAIT view is the same cell identified for cell single block physical read and cell multiblock physical read. The P2 column contains the subnet number to the cell, and the P3 column contains the number of bytes processed during the I/O read operation.

6.3.3.3 Monitoring PMEM Cache Using Exadata Metrics

Exadata metrics that are related to PMEM cache are identified in the Exadata storage server METRICCURRENT, METRICDEFINITION, and METRICHISTORY objects as having objectType=PMEMCACHE.

Note that because reads from PMEM cache are primarily performed using RDMA calls from Oracle Database, there are no Exadata metrics that tally PMEM cache I/O.

Example 6-3 Displaying PMEM Cache Metric Definitions

This example shows how to display the PMEM cache metric definitions that are available in the Oracle Exadata System Software.

CellCLI> LIST METRICDEFINITION ATTRIBUTES NAME,DESCRIPTION WHERE OBJECTTYPE = PMEMCACHE
         PC_BY_ALLOCATED         "Number of megabytes allocated in PMEM cache"

Note the following additional details:

  • PC_BY_ALLOCATED represents the number of megabytes allocated in PMEM cache and tracks how many cachelines are used in the PMEM cache. If the value is close to the PMEM cache size, then the PMEM cache is fully populated.

    This metric is also available per database (DB_PC_BY_ALLOCATED), and per PDB (PDB_PC_BY_ALLOCATED).

6.3.3.4 What to Look For When Monitoring PMEM Cache

Database Working Set

Performance is maximized if the entire database working set fits into PMEM cache. However, the PMEM cache is much smaller than Exadata Smart Flash Cache and it is likely that the database working set, or even the most frequently accessed portion of it, might not reside in PMEM cache.

If the size of the working set increases you may observe more reads being satisfied by Exadata Smart Flash Cache or hard disk instead of PMEM cache. This is magnified when access patterns on the data are more random. Consequently, you may observe increased read latencies, specifically in the cell single block physical read wait event.

Non-RDMA Reads

PMEM cache is most effective when it is used in conjunction with RDMA. However, in various situations requests to cellsrv might still be satisfied using PMEM cache. In such cases, you will see the associated reads as cell xrmem cache read hits or cell pmem cache read hits rather than cell RDMA reads.

A non-RDMA read may occur for the following reasons:

  • Based on the type of data being read, Oracle Database may not be able to perform an RDMA read. Direct reads and reads from the Oracle Database control file are examples of this. To identify the cause, you may need to correlate with other statistics or wait events. For example, significant activity against the control files is usually accompanied by the control file sequential read wait event, or visible in the IOStat by Filetype section of the AWR report.
  • In the Oracle Database client, buffers may not be registered for RDMA. This normally occurs shortly after a client process starts or while PMEM cache is populating.
  • Inside PMEM cache, the RDMA hash table may not contain the required metadata (or it may be marked as invalid). This normally occurs while PMEM cache is populating.
  • An RDMA read times out after exceeding the default lease time (20 ms).
  • Memory limitations prevent the creation of memory structures that are required for RDMA.
  • An RDMA read error occurs.

6.3.4 Monitoring Exadata Smart Flash Log

Exadata Smart Flash Log reduces the average latency for performance-sensitive redo log write I/O operations, thereby eliminating performance bottlenecks that may occur due to slow redo log writes. Exadata Smart Flash Log removes latency spikes by simultaneously performing redo log writes to two media devices. The redo write is acknowledged as soon as the first write to either media device completes.

Originally, Exadata Smart Flash Log was used to perform simultaneous writes to disk and flash storage. However, Oracle Exadata System Software release 20.1 adds a further optimization, known as Smart Flash Log Write-Back, that uses Exadata Smart Flash Cache in Write-Back mode instead of disk storage.

6.3.4.1 Monitoring Exadata Smart Flash Log Using AWR

Following are descriptions and examples of the Flash Log Statistics sections in the AWR report. By reviewing these sections of the AWR report, administrators can understand the operation of Exadata Smart Flash Log.

Flash Log

The Flash Log section contains summary statistics about the operation of Exadata Smart Flash Log, including the number of writes to Exadata Smart Flash Log, the number of outliers prevented by Exadata Smart Flash Log, and the number of times that Exadata Smart Flash Log was skipped.

The following shows an example of the Flash Log section. As shown in the example, when Exadata Smart Flash Log operates optimally, I/O is evenly distributed across all of the cells and there are no skips.

Figure 6-20 AWR Report: Flash Log

The image shows an example of the Flash Log section in the AWR report.

If the Skip Count is greater than zero, then the Flash Log Skip Details section contains the reasons for skipping Exadata Smart Flash Log.

Redo Write Histogram

The Redo Write Histogram section provides histograms that show the log file parallel write latency from the database, along with the redo write request completion latency from the storage servers. By comparing the database and cell latency histograms, you can determine if the high latency outliers are related to processing bottlenecks on the storage servers.

Figure 6-21 AWR Report: Redo Write Histogram

The image shows an example of the Redo Write Histogram section in the AWR report.
6.3.4.2 Monitoring Exadata Smart Flash Log Using Database Statistics and Wait Events

The following table describes various database statistics that are useful for monitoring redo write and Exadata Smart Flash Log performance. The statistics are available in various dynamic performance views, including V$SYSSTAT, and may be displayed in the Global Activity Statistics or Instance Activity Statistics section of an AWR report.

Statistic Description

redo write size count (4KB)

redo write size count (8KB)

redo write size count (16KB)

redo write size count (32KB)

redo write size count (64KB)

redo write size count (128KB)

redo write size count (256KB)

redo write size count (512KB)

redo write size count (1024KB)

redo write size count (inf)

Number of redo writes where the redo write size is smaller than the size indicated in parentheses. For example, redo write size count (4KB) includes redo writes smaller than 4 KB in size, and redo write size count (inf) includes redo writes over 1 MB in size.

redo writes

Total number of writes to the redo log files by the Oracle Database log writer (LGWR) process(es).
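
The redo write size distribution can be examined directly with a query such as the following sketch, which uses the statistic names listed above. As with other cumulative statistics, compare values captured at the start and end of the interval of interest.

SELECT name, value
  FROM V$SYSSTAT
 WHERE name = 'redo writes'
    OR name LIKE 'redo write size count%'
 ORDER BY name;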

The following table describes database wait events that are useful for monitoring redo write and Exadata Smart Flash Log performance. The wait events are visible in various dynamic performance views, including V$SESSION, V$SYSTEM_EVENT and V$SESSION_EVENT, and may be displayed in the Wait Event sections of the AWR report.

Wait Event Description
log file parallel write

The Oracle Database log writer (LGWR) process waits on this event when it is waiting for the completion of writes to the redo log file.

Efficient use of Exadata Smart Flash Log is indicated by comparatively lower time waiting on this event.

The availability of a specific statistic or wait event is subject to the version of Oracle Database being used.
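
As a quick check outside of AWR, the average redo write latency can be estimated from V$SYSTEM_EVENT, as in the following sketch. The log file sync event is included because user sessions wait on it while the log writer performs the redo write.

SELECT event,
       total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0)) AS avg_wait_usec
  FROM V$SYSTEM_EVENT
 WHERE event IN ('log file parallel write', 'log file sync');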

6.3.4.3 Monitoring Exadata Smart Flash Log Using Exadata Metrics

Exadata metrics that are related to Exadata Smart Flash Log provide information about flash log utilization, such as the number of megabytes written per second. Flash Log metrics are identified in the METRICCURRENT, METRICDEFINITION, and METRICHISTORY objects as having objectType=FLASHLOG.

Example 6-4 Displaying Flash Log Metric Definitions

This example shows how to display the flash log metric definitions that are available in the Oracle Exadata System Software.

CellCLI> LIST METRICDEFINITION ATTRIBUTES NAME,DESCRIPTION WHERE OBJECTTYPE = FLASHLOG
         FL_ACTUAL_OUTLIERS              "The number of times redo writes to flash and disk both exceeded the outlier threshold"
         FL_BY_KEEP                      "The amount of  redo data saved on flash due to disk I/O errors"
         FL_DISK_FIRST                   "The number of times redo writes first completed to disk"
         FL_DISK_IO_ERRS                 "The number of disk I/O errors encountered by Smart Flash Logging"
         FL_EFFICIENCY_PERCENTAGE        "The efficiency of Smart Flash Logging expressed as a percentage"
         FL_EFFICIENCY_PERCENTAGE_HOUR   "The efficiency of Smart Flash Logging over the last hour expressed as a percentage"
         FL_FLASH_FIRST                  "The number of times redo writes first completed to flash"
         FL_FLASH_IO_ERRS                "The number of flash I/O errors encountered by Smart Flash Logging"
         FL_FLASH_ONLY_OUTLIERS          "The number of times redo writes to flash exceeded the outlier threshold"
         FL_IO_DB_BY_W                   "The number of MB written to hard disk by Smart Flash Logging"
         FL_IO_DB_BY_W_SEC               "The rate which is the number of MB per second written to hard disk by Smart Flash Logging"
         FL_IO_FL_BY_W                   "The number of MB written to flash by Smart Flash Logging"
         FL_IO_FL_BY_W_SEC               "The rate which is the number of MB per second written to flash by Smart Flash Logging"
         FL_IO_TM_W                      "Cumulative latency of all redo log writes"
         FL_IO_TM_W_RQ                   "Average latency of all redo log writes"
         FL_IO_W                         "The number of writes serviced by Smart Flash Logging"
         FL_IO_W_SKIP_BUSY               "The number of redo writes that could not be serviced by Smart Flash Logging because too much data had not yet been written to disk"
         FL_IO_W_SKIP_BUSY_MIN           "The number of redo writes during the last minute that could not be serviced by Smart Flash Logging because too much data had not yet been written to disk"
         FL_IO_W_SKIP_DISABLED_GD        "The number of redo writes that could not be serviced by Smart Flash Logging because it was disabled for the redo log's grid disk"
         FL_IO_W_SKIP_IORM_LIMIT         "The number of redo writes that could not be serviced by Smart Flash Logging because the IORM limit had been reached for the redo log's grid disk"
         FL_IO_W_SKIP_IORM_PLAN          "The number of redo writes that could not be serviced by Smart Flash Logging because it was disabled by the IORM plan"
         FL_IO_W_SKIP_LARGE              "The number of large redo writes that could not be serviced by Smart Flash Logging because the size of the data was larger than the amount of available space on any flash disk"
         FL_IO_W_SKIP_LOG_ON_FLASH       "The number of redo writes that could not be serviced by Smart Flash Logging because the redo log resided on flash"
         FL_IO_W_SKIP_NO_BUFFER          "The number of redo writes that could not be serviced by Smart Flash Logging because of lack of available buffers"
         FL_IO_W_SKIP_NO_FL_DISKS        "The number of redo writes that could not be serviced by Smart Flash Logging because there were no available Flash Log disks"
         FL_PREVENTED_OUTLIERS           "The number of times redo writes to disk exceeded the outlier threshold; these would have been outliers had it not been for Smart Flash Logging"
         FL_RQ_TM_W                      "Cumulative latency of all redo log write requests (includes network and other processing overhead)"
         FL_RQ_TM_W_RQ                   "Average latency of all redo log write requests"
         FL_RQ_W                         "The number of redo log write requests serviced (includes requests which were not handled by Smart Flash Logging)"
         FL_SKIP_OUTLIERS                "The number of times redo writes to disk exceeded the outlier threshold when Smart Flash Logging was not used"

Note the following additional details:

  • FL_RQ_TM_W tracks the cumulative redo log write request latency, which includes networking and other overhead. To determine the latency overhead due to factors such as network and processing, subtract FL_IO_TM_W from FL_RQ_TM_W, as illustrated in the sketch after this list.

  • FL_SKIP_OUTLIERS tracks the number of outliers when redo log write skips use of the flash log. An outlier is a redo log write that exceeds 0.5 seconds.

    The use of the flash log can be skipped because of the reasons associated with FL_IO_W_SKIP_DISABLED_GD, FL_IO_W_SKIP_IORM_LIMIT, FL_IO_W_SKIP_IORM_PLAN, FL_IO_W_SKIP_LOG_ON_FLASH, or FL_IO_W_SKIP_NO_FL_DISKS.

  • For simultaneous writes to disk and flash storage, the disk controller write cache can absorb some writes quicker than flash. Consequently, it is normal for a significant proportion of redo log write operations to complete to disk before flash. In some cases, the value of FL_DISK_FIRST can exceed the value of FL_FLASH_FIRST. However, this does not mean that the Exadata Smart Flash Logging feature is ineffective or unnecessary.
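
The latency metrics referenced above can be retrieved with a CellCLI command along the following lines. This is a sketch; the LIKE pattern is a regular expression matching FL_RQ_TM_W, FL_IO_TM_W, and their _RQ averages. Subtracting the FL_IO_TM_W value from the FL_RQ_TM_W value then gives the overhead component described above.

CellCLI> LIST METRICCURRENT WHERE objectType = FLASHLOG AND name LIKE 'FL_.*TM_W.*'
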
6.3.4.4 What to Look For When Monitoring Exadata Smart Flash Log

General Performance

Performance issues related to redo logging typically exhibit high latency for the log file sync wait event in the Oracle Database user foreground processes, with corresponding high latency for log file parallel write in the Oracle Database log writer (LGWR) process. Because of the performance-critical nature of redo log writes, occasional long latencies for log file parallel write may cause fluctuations in database performance, even if the average log file parallel write wait time is acceptable.

If any of these are occurring, then it may be indicative of an issue with Exadata Smart Flash Log performance.

Redo Write Histograms

The log file parallel write wait event indicates the amount of time that the database waits on a redo log write. The log file parallel write histogram shows the number of times the redo write completed within a specified time range. Similarly, the redo log write completions statistic indicates the amount of time that the storage server spends processing redo write requests, and the redo log write completions histogram shows the number of times the redo write request was completed within a specified time range. Both histograms are shown in the Redo Write Histogram section of the AWR report.

A histogram with a significant number of occasional long latencies is said to have a long tail. When both of the histograms in the Redo Write Histogram section of the AWR report have long tails, then this is an indication of slow write times on the storage server, which would warrant further investigation of the other I/O performance statistics. See Monitoring Cell Disk I/O.

If the log file parallel write histogram has a long tail that is not present in the redo log write completions histogram, then the cause is generally not in the storage server, but rather something else in the I/O path, such as bottlenecks in the network or contention for compute node CPU.

Skipping

Increased redo write latencies can also occur when the Exadata Smart Flash Log is skipped, and the redo write goes only to disk. Both the AWR report and the storage server metrics show the number of redo log writes that skipped Exadata Smart Flash Log (a CellCLI sketch for listing the skip metrics appears at the end of this topic). Skipping may occur when Exadata Smart Flash Log contains too much data that has not yet been written to disk.

There are a few factors that can cause redo writes to skip Exadata Smart Flash Log:

  • Flash disks with high write latencies.

    This can be observed in various IO Latency tables located in the Exadata Resource Statistics section of the AWR report, and in the Exadata CELLDISK metrics. This can also be identified by checking the FL_FLASH_ONLY_OUTLIERS metric. If the metric value is high, this indicates a flash disk performance issue.

  • Hard disks with high latencies or high utilization.

    Prior to Oracle Exadata System Software release 20.1, redo log writes are written to both Exadata Smart Flash Log and hard disk. If the hard disks experience high latencies or high utilization, redo log write performance can be impacted.

    This can be observed in various IO Latency tables located in the Exadata Resource Statistics section of the AWR report, and in the Exadata CELLDISK metrics. This can also be identified by checking the Outliers columns in the Flash Log section of the AWR report, or the FL_PREVENTED_OUTLIERS storage server metric. A large number of prevented outliers may indicate that the hard disk writes are taking a long time.

    In this case, although Exadata Smart Flash Log prevents outliers, overall throughput may be limited due to the queue of redo log data that must be written to disk.

    Oracle Exadata System Software release 20.1 adds a further optimization, known as Smart Flash Log Write-Back, that uses Exadata Smart Flash Cache in Write-Back mode instead of disk storage, thereby eliminating the hard disks as a potential performance bottleneck. Depending on the system workload, this feature can improve overall log write throughput by up to 250%.
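
To see which skip reasons are currently occurring on a storage server, the skip metrics can be listed with a command along these lines. This is a sketch; the LIKE pattern is a regular expression matching the FL_IO_W_SKIP metrics defined earlier in this section.

CellCLI> LIST METRICCURRENT WHERE objectType = FLASHLOG AND name LIKE 'FL_IO_W_SKIP_.*'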

6.3.5 Monitoring XRMEM Log

Note:

Oracle Exadata System Software release 23.1.0 introduces Exadata RDMA Memory (XRMEM). XRMEM represents the collection of Exadata software technologies that provide direct access to storage server memory using Remote Direct Memory Access (RDMA), enabling faster response times and lower read latencies. In this release, the persistent memory commit accelerator, previously known as PMEM log, is now called XRMEM Log.

Redo log writes are critical database operations and need to complete in a timely manner to prevent load spikes or stalls. Exadata Smart Flash Log is designed to prevent redo write latency outliers. XRMEM log helps to further reduce redo log write latency by using Persistent Memory (PMEM) and Remote Direct Memory Access (RDMA). XRMEM Log is available only in selected Exadata X8M and X9M storage server models.

With XRMEM Log, database clients send I/O buffers directly to PMEM on the storage servers using RDMA, thereby reducing transport latency. The cell server (cellsrv) then writes the redo to Exadata Smart Flash Log (if enabled) and disk at a later time.

Reduced redo log write latency improves OLTP performance, resulting in higher transaction throughput. In cases where XRMEM log is bypassed, Exadata Smart Flash Log can still be used.

6.3.5.1 Monitoring XRMEM Log Using Database Statistics and Wait Events

The following table describes database statistics that are useful for monitoring XRMEM log performance. The statistics are available in various dynamic performance views, including V$SYSSTAT, and may be displayed in the Global Activity Statistics or Instance Activity Statistics section of an AWR report.

Statistic Description

cell xrmem log writes

cell pmem log writes

The number of redo log write requests that used XRMEM log

The statistic is named cell xrmem log writes in Oracle Database versions that support Oracle Exadata System Software release 23.1.0 and later. In earlier versions, it is named cell pmem log writes.
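
For example, to gauge what proportion of redo writes used XRMEM log, the statistic can be compared with the total number of redo writes. The following sketch includes both the current and the older statistic name; redo writes is the log writer (LGWR) write count described in the Exadata Smart Flash Log monitoring topic.

SELECT name, value
  FROM V$SYSSTAT
 WHERE name IN ('cell xrmem log writes',
                'cell pmem log writes',
                'redo writes');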

The following table describes database wait events that are useful for monitoring XRMEM log performance. The wait events are visible in various dynamic performance views, including V$SESSION, V$SYSTEM_EVENT and V$SESSION_EVENT, and may be displayed in the Wait Event sections of the AWR report.

Wait Event Description
log file parallel write

The Oracle Database log writer (LGWR) process waits on this event when it is waiting for the completion of writes to the redo log file.

Efficient use of XRMEM log is indicated by comparatively lower time waiting on this event.

The availability of a specific statistic or wait event is subject to the version of Oracle Database being used.

6.3.5.2 What to Look For When Monitoring XRMEM Log

General Performance

Performance issues related to redo logging typically exhibit high latency for the log file sync wait event in the Oracle Database user foreground processes, with corresponding high latency for log file parallel write in the Oracle Database log writer (LGWR) process. Because of the performance-critical nature of redo log writes, occasional long latencies for log file parallel write may cause fluctuations in database performance, even if the average log file parallel write wait time is acceptable.

If any of these are occurring, then it may be indicative of an issue with XRMEM log performance.

Bypassing XRMEM Log

Increased redo write latencies when using XRMEM log can occur when XRMEM log is bypassed. When XRMEM log is bypassed, the request is sent to cellsrv, and Exadata Smart Flash Log is still used (if available). However, when the bypass request is sent, the storage server must ensure that there is no conflict with a previous XRMEM log request. This conflict checking, which requires scanning XRMEM log, makes bypass writes more expensive to process, and can result in higher than expected redo log write latencies.

There are several possible causes that result in a small number of XRMEM log bypasses. Under normal circumstances, the number of bypasses should be substantially less than 1% of the total number of XRMEM log requests. A high number of XRMEM log bypasses is likely to be a symptom of another problem, such as congestion on the RoCE Network Fabric.

6.3.6 Monitoring PMEM Log

Note:

This topic applies only to Oracle Exadata System Software releases before 23.1.0. Otherwise, see Monitoring XRMEM Log.

Redo log writes are critical database operations and need to complete in a timely manner to prevent load spikes or stalls. Exadata Smart Flash Log is designed to prevent redo write latency outliers. PMEM Log helps to further reduce redo log write latency by using Persistent Memory (PMEM) and Remote Direct Memory Access (RDMA). PMEM Log is available only in selected Exadata X8M and X9M storage server models.

With PMEM Log, database clients send I/O buffers directly to PMEM on the storage servers using RDMA, thereby reducing transport latency. The cell server (cellsrv) then writes the redo to Exadata Smart Flash Log (if enabled) and disk at a later time.

Reduced redo log write latency improves OLTP performance, resulting in higher transaction throughput. In cases where PMEM Log is bypassed, Exadata Smart Flash Log can still be used.

6.3.6.1 Monitoring PMEM Log Using Database Statistics and Wait Events

The following table describes database statistics that are useful for monitoring PMEM Log performance. The statistics are available in various dynamic performance views, including V$SYSSTAT, and may be displayed in the Global Activity Statistics or Instance Activity Statistics section of an AWR report.

Statistic Description

cell xrmem log writes

cell pmem log writes

The number of redo log write requests that used PMEM log

The statistic is named cell xrmem log writes in Oracle Database versions that support Oracle Exadata System Software release 23.1.0 and later. In earlier versions, it is named cell pmem log writes.

The following table describes database wait events that are useful for monitoring PMEM Log performance. The wait events are visible in various dynamic performance views, including V$SESSION, V$SYSTEM_EVENT and V$SESSION_EVENT, and may be displayed in the Wait Event sections of the AWR report.

Wait Event Description
log file parallel write

The Oracle Database log writer (LGWR) process waits on this event when it is waiting for the completion of writes to the redo log file.

Efficient use of PMEM Log is indicated by comparatively lower time waiting on this event.

The availability of a specific statistic or wait event is subject to the version of Oracle Database being used.

6.3.6.2 What to Look For When Monitoring PMEM Log

General Performance

Performance issues related to redo logging typically exhibit high latency for the log file sync wait event in the Oracle Database user and foreground processes, with corresponding high latency for log file parallel write in the Oracle Database log writer (LGWR) process. Because of the performance-critical nature of redo log writes, occasional long latencies for log file parallel write may cause fluctuations in database performance, even if the average log file parallel write wait time is acceptable.

If any of these symptoms are present, then it may be indicative of an issue with PMEM Log performance.

Bypassing PMEM Log

Increased redo log write latencies can occur when PMEM Log is bypassed. When PMEM Log is bypassed, the request is sent to the cell server (cellsrv), and Exadata Smart Flash Log is still used (if available). However, before processing the bypass request, cellsrv must ensure that there is no conflict with a previous PMEM Log request. This conflict checking, which requires scanning PMEM Log, makes bypass writes more expensive to process and can result in higher than expected redo log write latencies.

Several conditions can cause PMEM Log bypasses. Under normal circumstances, the number of bypasses should be substantially less than 1% of the total number of PMEM Log requests. A high number of PMEM Log bypasses is likely to be a symptom of another problem, such as congestion on the RoCE Network Fabric.

6.3.7 Monitoring Smart I/O

Exadata Smart Scan offloads data search and retrieval processing to the Exadata storage servers. It is able to evaluate database predicates on the storage server to optimize efficiency and performance, especially for large queries and certain classes of bulk data processing.

Smart Scan also uses storage indexes, which are maintained automatically on the storage servers, to further optimize filtering by eliminating unnecessary I/O.

Columnar cache is another smart I/O optimization. The columnar cache is a section of Exadata Smart Flash Cache that stores data in columnar format. The columnar cache is automatically populated and maintained by the storage server when a Smart Scan is performed.

6.3.7.1 Monitoring Smart I/O Using AWR

The Smart Scan Summary is part of the Performance Summary section of the AWR report. This area contains a high-level summary of key statistics associated with smart I/O. The following example highlights a situation where some activity reverted to block I/O because of ongoing online encryption.

Figure 6-22 AWR Report: Performance Summary - Smart Scan Summary

The image shows an example of the Performance Summary - Smart Scan Summary section in the AWR report.

The Smart IO Statistics section of the AWR report contains more detailed summarized statistical information for smart I/O operations on the whole system and on each storage server. In particular, it tracks the amount of I/O that is eligible for optimization using Smart Scan, the savings from using storage indexes and columnar cache, the amount of data in Exadata Smart Flash Cache, and the amount of data filtered by the storage servers without transportation back to the database server. It also shows associated database statistics and more detailed information about situations where the storage servers are unable to perform predicate evaluation (also known as 'passthru').

Figure 6-23 AWR Report: Smart IO Statistics

The image shows an example of the Smart IO Statistics section in the AWR report.
6.3.7.2 Monitoring Smart I/O Using Database Statistics and Wait Events

The following table describes various database statistics that are useful for monitoring smart I/O operations. The statistics are available in various dynamic performance views, including V$SYSSTAT, and may be displayed in the Global Activity Statistics or Instance Activity Statistics section of an AWR report.

Statistic Description
cell IO uncompressed bytes

The total size of uncompressed data that is processed on the cell.

For operations on segments compressed using Exadata Hybrid Columnar Compression, this statistic is the size of the data after decompression.

cell num bytes in block IO during predicate offload The number of bytes that were not offloaded because the client was in block I/O mode.
cell num bytes in filter passthru due to low mem The number of bytes that were not offloaded and sent back to the database for processing due to low memory on the cell.
cell num bytes in filter passthru due to subheap size limit exc The number of bytes that were not offloaded and sent back to the database for processing due to a memory limit on the cell.
cell num bytes in passthru due to quarantine The number of bytes that were not offloaded and sent back to the database for processing due to a quarantine on the cell.
cell num bytes in passthru during predicate offload The number of bytes that were not offloaded and sent back to the database for processing.
cell num smart IO sessions in rdbms block IO due to big payload The number of sessions in block I/O mode (not offloaded) due to excessively large metadata.
cell num smart IO sessions in rdbms block IO due to no cell mem The number of sessions in block I/O mode (not offloaded) due to memory shortage on the storage servers.
cell num smart IO sessions in rdbms block IO due to online encr The number of sessions in block I/O mode (not offloaded) due to an ongoing online encryption operation.

cell num smart IO sessions in rdbms block IO due to open fail

The number of sessions in block I/O mode (not offloaded) due to a failure in opening a connection to a cell.

cell num smart IO sessions in rdbms block IO due to user

The number of sessions in block I/O mode (not offloaded) due to a user setting.
cell num smart IO sessions using passthru mode due to cellsrv The number of sessions in passthru mode (not offloaded) due to an issue with CELLSRV.

cell num smart IO sessions using passthru mode due to timezone

The number of sessions in passthru mode (not offloaded) due to an ongoing database timezone upgrade operation.
cell num smart IO sessions using passthru mode due to user The number of sessions in passthru mode (not offloaded) due to a user setting.
cell physical IO bytes added to storage index The number of bytes added to the storage index during a Smart Scan. This is an indication that the storage index is being built.
cell physical IO bytes eligible for predicate offload The number of bytes on-disk eligible for predicate offload.
cell physical IO bytes eligible for smart IOs

The number of actual bytes eligible for predicate offload.

For example, when using columnar cache, this is the size of columnar cache instead of the on-disk size.

cell physical IO bytes processed for IM capacity The number of bytes read from the columnar cache in memcompress for capacity format.
cell physical IO bytes processed for IM query The number of bytes read from the columnar cache in memcompress for query format.
cell physical IO bytes processed for no memcompress The number of bytes read from the columnar cache in no memcompress format.
cell physical IO bytes saved by columnar cache The number of bytes saved by columnar cache; that is, the number of bytes of reading that was avoided.
cell physical IO bytes saved by storage index The number of bytes saved by storage index.
cell physical IO bytes saved during optimized file creation The number of I/O bytes saved by the database host by offloading the file creation operation to the cells. This statistic shows the benefit of optimized file creation operations.
cell physical IO bytes saved during optimized RMAN restore The number of I/O bytes saved by the database host by offloading the RMAN file restore operation to the cells. This statistic shows the benefit of optimized RMAN file restore operations.
cell physical IO bytes sent directly to DB node to balance CPU usage The number of I/O bytes sent back to the database server for processing due to high storage server CPU usage.
cell physical IO interconnect bytes The number of I/O bytes exchanged over the interconnect between the database host and the cells.
cell physical IO interconnect bytes returned by smart scan The number of I/O bytes that are returned by the cell for Smart Scan operations. It does not include bytes for other database I/O.
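To get a quick system-level view of the statistics described above, you can query V$SYSSTAT directly. The following query is a sketch that selects a handful of the statistics from the table; adjust the list to suit your investigation, and note that the exact statistic names can vary with the Oracle Database version.

-- High-level smart I/O indicators for this instance
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('cell physical IO bytes eligible for predicate offload',
                'cell physical IO bytes eligible for smart IOs',
                'cell physical IO interconnect bytes returned by smart scan',
                'cell num bytes in passthru during predicate offload',
                'cell physical IO bytes saved by storage index')
ORDER  BY name;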

The following table describes database wait events that are useful for monitoring smart I/O operations. The wait events are visible in various dynamic performance views, including V$SESSION, V$SYSTEM_EVENT and V$SESSION_EVENT, and may be displayed in the Wait Event sections of the AWR report.

Wait Event Description
cell external table smart scan

This wait event appears when the database is waiting for an external table scan on a cell.

The cell hash number in the P1 column in V$SESSION can help to identify a slow cell compared to the rest of the cells.

cell smart file creation

This wait event appears when the database is waiting for the completion of a file creation on a cell.

The cell hash number in the P1 column in V$SESSION can help to identify a slow cell compared to the rest of the cells.

cell smart incremental backup

This wait event appears when the database is waiting for the completion of an incremental backup on a cell.

The cell hash number in the P1 column in V$SESSION can help to identify a slow cell compared to the rest of the cells.

cell smart index scan

This wait event appears when the database is waiting for an index fast full scan.

The cell hash number in the P1 column in V$SESSION can help to identify a slow cell when compared to the rest of the cells.

cell smart index scan: db timezone upgrade The wait event appears when the cells are unable to offload because a database timezone upgrade is in progress.
cell smart index scan: disabled by user The wait event appears when the cells are unable to offload due to a user setting.
cell smart index scan: pass through The wait event appears when the cells are unable to offload the Smart Scan.
cell smart index scan request

This is a placeholder wait event associated with cell smart index scan, which is only visible during the wait period.

After the wait event ends, the placeholder is typically converted to cell smart index scan. However, to better describe the wait outcome, the placeholder may be converted to cell smart index scan: db timezone upgrade, cell smart index scan: disabled by user, or cell smart index scan: pass through.

cell smart restore from backup

This wait event appears when the database is waiting for the completion of a file initialization for restore from backup on a cell.

The cell hash number in the P1 column in V$SESSION can help to identify a slow cell when compared to the rest of the cells.

cell smart table scan

This wait event appears when the database is waiting for smart scans to complete on a cell.

The cell hash number in the P1 column in V$SESSION can help to identify a slow cell when compared to the rest of the cells.

cell smart table scan: db timezone upgrade The wait event appears when the cells are unable to offload because a database timezone upgrade is in progress.
cell smart table scan: disabled by user The wait event appears when the cells are unable to offload due to a user setting.
cell smart table scan: pass through The wait event appears when the cells are unable to offload the Smart Scan.
cell smart table scan request

This is a placeholder wait event associated with cell smart table scan, which is only visible during the wait period.

After the wait event ends, the placeholder is typically converted to cell smart table scan. However, to better describe the wait outcome, the placeholder may be converted to cell smart table scan: db timezone upgrade, cell smart table scan: disabled by user, or cell smart table scan: pass through.

The availability of a specific statistic or wait event is subject to the version of Oracle Database being used.
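To see which cells are currently being waited on, you can query V$SESSION for the wait events described above. The following query is a sketch; the P1 value is the cell hash number, which can be matched against the cell views (such as V$CELL, where available) to identify the cell.

-- Sessions currently waiting on smart scan related events,
-- with the cell hash number from the P1 column
SELECT sid, event, p1 AS cell_hash, wait_time_micro
FROM   v$session
WHERE  state = 'WAITING'
AND    event LIKE 'cell smart%'
ORDER  BY wait_time_micro DESC;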

6.3.7.3 Monitoring Smart I/O Using SQL Monitor

In addition to dynamic performance views and wait events, Oracle Database provides SQL monitor, which enables you to monitor the execution of individual SQL statements.

The SQL monitor report includes detailed statistics for the row sources, which includes additional information for smart I/O operations. The row source statistics can be viewed in the Enterprise Manager active SQL monitor report by clicking on the binoculars in the row source. For example:

Figure 6-24 SQL Monitor Report: Smart I/O Row Source Statistics

The image shows an example of smart I/O row source statistics provided by SQL monitor.
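If Enterprise Manager is not available, a similar report can be generated with the DBMS_SQLTUNE.REPORT_SQL_MONITOR function. The following is a sketch only; the SQL ID shown is a placeholder, and an 'ACTIVE' report produces interactive HTML content comparable to the Enterprise Manager page, while 'TEXT' produces a plain-text summary.

SET LONG 10000000 LONGCHUNKSIZE 10000000 PAGESIZE 0 LINESIZE 300

-- Replace the sql_id value with the SQL ID of the statement of interest
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
         sql_id       => 'abcd1234efgh5',   -- placeholder SQL ID
         type         => 'ACTIVE',          -- or 'TEXT' for a plain-text report
         report_level => 'ALL') AS report
FROM   dual;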

The following table describes various row source statistics provided by SQL monitor, which are useful for monitoring smart I/O.

Statistic Description
Filtered bytes The number of bytes returned by the cell.
Cell passthru IO bytes The number of bytes that are not offloaded and sent back to the database for processing.
Cell passthru IO bytes due to quarantine The number of bytes that are not offloaded and sent back to the database for processing due to a quarantine on the cell.
Eligible bytes for smart IO

The number of actual bytes eligible for predicate offload.

For example, when using columnar cache, this is the size of columnar cache instead of the on-disk size.

SI saved bytes The number of bytes saved by storage index; that is, the number of bytes that did not have to be read.
Columnar cache saved bytes The number of bytes saved by columnar cache; that is, the number of bytes that did not have to be read.
Partial flash cache and disk bytes The number of bytes read from both Exadata Smart Flash Cache and disk.
Flash cache bytes The number of bytes read from Exadata Smart Flash Cache.
IM Capacity bytes The number of bytes read from the columnar cache in memcompress for capacity format.
IM Query bytes The number of bytes read from the columnar cache in memcompress for query format.
No memcompress bytes The number of bytes read from the columnar cache in no memcompress format.
Bytes added to storage index The number of bytes added to the storage index during a Smart Scan. This is an indication that the storage index is being built.
cell IORM IO requests on flash The number of physical I/O requests to flash storage.
cell IORM wait time on flash (us)

The amount of time (in microseconds) IORM queued the flash request.

The cell IORM wait time on flash (us) / cell IORM IO requests on flash gives an indication of how much time, on average, is spent in the IORM queue.

cell IORM IO requests on disk The number of physical I/O requests to disk storage.
cell IORM wait time on disk (us)

The amount of time (in microseconds) IORM queued the disk request.

The cell IORM wait time on disk (us) / cell IORM IO requests on disk gives an indication of how much time, on average, is spent in the IORM queue.

Block IO bytes The number of bytes in block I/O mode.

Slow metadata bytes

Metadata bytes

The size of the query metadata sent from the database compute node to a cell.
Eligible bytes The number of bytes on-disk eligible for predicate offload.

The availability of a specific statistic is subject to the version of Oracle Database being used.

6.3.7.4 Monitoring Smart I/O Using SQL Explain Plan

The SQL EXPLAIN PLAN command displays information about smart I/O optimizations in the SQL execution plan.

You can use the EXPLAIN PLAN command to identify parts of a SQL query that can be offloaded to the storage server. The database parameter CELL_OFFLOAD_PLAN_DISPLAY must be set to AUTO or ALWAYS for the EXPLAIN PLAN command to display the smart I/O optimizations in the SQL execution plan.

Example 6-5 Monitoring Smart I/O Using SQL Explain Plan

This example shows how to use the EXPLAIN PLAN command to display the smart I/O optimizations in the SQL execution plan.

In the query plan, the TABLE ACCESS STORAGE FULL operation indicates that the corresponding full table scan is offloaded to the storage server. The predicate information further describes the query predicates that are offloaded to the storage server.

SQL> ALTER SESSION SET CELL_OFFLOAD_PLAN_DISPLAY = ALWAYS; 

Session altered.

SQL> EXPLAIN PLAN FOR
  SELECT t.prod_id, v.exp1, t2_prod_id, t2_amount_sold
  FROM   sales t, v1 v
  WHERE  t.prod_id = v.prod_id AND t.cust_id = v.cust_id
    AND  t.prod_id != 45
    AND  v.amount_sold * v.quantity_sold > 10000;

Explained.

SQL> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
Plan hash value: 2267424675

--------------------------------------------------
| Id  | Operation                   | Name       |
--------------------------------------------------
|   0 | SELECT STATEMENT            |            |
|*  1 |  HASH JOIN                  |            |
|*  2 |   HASH JOIN                 |            |
|*  3 |    TABLE ACCESS STORAGE FULL| SALES      |
|*  4 |    TABLE ACCESS STORAGE FULL| SALES      |
|*  5 |   TABLE ACCESS STORAGE FULL | SALES      |
--------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T"."CUST_ID"="T2"."CUST_ID" AND
              "T1"."PROD_ID"="T2"."PROD_ID" AND "T1"."CUST_ID"="T2"."CUST_ID")
   2 - access("T"."PROD_ID"="T1"."PROD_ID")
   3 - storage("T1"."PROD_ID"<200 AND
              "T1"."AMOUNT_SOLD"*"T1"."QUANTITY_SOLD">10000 AND "T1"."PROD_ID"<>45)
       filter("T1"."PROD_ID"<200 AND
              "T1"."AMOUNT_SOLD"*"T1"."QUANTITY_SOLD">10000 AND "T1"."PROD_ID"<>45)
   4 - storage("T"."PROD_ID"<200 AND "T"."PROD_ID"<>45)
       filter("T"."PROD_ID"<200 AND "T"."PROD_ID"<>45)
   5 - storage("T2"."PROD_ID"<200 AND "T2"."PROD_ID"<>45)
       filter("T2"."PROD_ID"<200 AND "T2"."PROD_ID"<>45)
6.3.7.5 Monitoring Smart I/O Using Exadata Metrics

Smart I/O metrics are identified in the METRICCURRENT, METRICDEFINITION, and METRICHISTORY objects as having objectType=SMARTIO.

For cumulative metrics, the metric value for a specific time period can be determined by subtracting values from different collectionTime periods.

Example 6-6 Displaying Smart I/O Metric Definitions

This example shows how to display the smart I/O metric definitions that are available in the Oracle Exadata System Software.

CellCLI> LIST METRICDEFINITION ATTRIBUTES NAME,DESCRIPTION WHERE OBJECTTYPE = SMARTIO
         SIO_IO_EL_OF            "Cumulative number of megabytes eligible for smart IO offload"
         SIO_IO_EL_OF_SEC        "Number of megabytes per second eligible for smart IO offload"
         SIO_IO_OF_RE            "Cumulative number of interconnect megabytes returned by smart IO"
         SIO_IO_OF_RE_SEC        "Number of interconnect megabytes per second returned by smart IO"
         SIO_IO_PA_TH            "Cumulative number of megabytes of passthru IOs by smart IO"
         SIO_IO_PA_TH_SEC        "Number of megabytes per second of passthru IOs by smart IO"
         SIO_IO_RD_FC            "Cumulative number of megabytes read from flash cache by smart IO"
         SIO_IO_RD_FC_HD         "Cumulative number of megabytes read from both flash cache and hard disk by smart IO"
         SIO_IO_RD_FC_HD_SEC     "Number of megabytes per second read from both flash cache and hard disk by smart IO"
         SIO_IO_RD_FC_SEC        "Number of megabytes per second read from flash cache by smart IO"
         SIO_IO_RD_HD            "Cumulative number of megabytes read from hard disk by smart IO"
         SIO_IO_RD_HD_SEC        "Number of megabytes per second read from hard disk by smart IO"
         SIO_IO_RD_RQ_FC         "Cumulative number of read IO requests from flash cache by smart IO"
         SIO_IO_RD_RQ_FC_HD      "Cumulative number of read IO requests from both flash cache and hard disk by smart IO"
         SIO_IO_RD_RQ_FC_HD_SEC  "Number of read IO requests per second from both flash cache and hard disk by smart IO"
         SIO_IO_RD_RQ_FC_SEC     "Number of read IO requests per second from flash cache by smart IO"
         SIO_IO_RD_RQ_HD         "Cumulative number of read IO requests from hard disk by smart IO"
         SIO_IO_RD_RQ_HD_SEC     "Number of read IO requests per second from hard disk by smart IO"
         SIO_IO_RV_OF            "Cumulative number of megabytes sent to DB node to balance CPU by smart IO"
         SIO_IO_RV_OF_SEC        "Number of megabytes per second sent to DB node to balance CPU by smart IO"
         SIO_IO_SI_SV            "Cumulative number of megabytes saved by storage index"
         SIO_IO_SI_SV_SEC        "Number of megabytes per second saved by storage index"
         SIO_IO_WR_FC            "Cumulative number of megabytes of flash cache population writes by smart IO"
         SIO_IO_WR_FC_SEC        "Number of megabytes per second of flash cache population writes by smart IO"
         SIO_IO_WR_HD            "Cumulative number of megabytes written to hard disk by smart IO"
         SIO_IO_WR_HD_SEC        "Number of megabytes per second written to hard disk by smart IO"
         SIO_IO_WR_RQ_FC         "Cumulative number of IO requests for flash cache population writes by smart IO"
         SIO_IO_WR_RQ_FC_SEC     "Number of IO requests per second for flash cache population writes by smart IO"
         SIO_IO_WR_RQ_HD         "Cumulative number of write IO requests to hard disk by smart IO"
         SIO_IO_WR_RQ_HD_SEC     "Number of write IO requests per second to hard disk by smart IO"

Columnar cache metrics are identified as having a name that matches 'FC_COL.*'.

Example 6-7 Displaying Columnar Cache Metric Definitions

This example shows how to display the columnar cache metric definitions that are available in the Oracle Exadata System Software.

CellCLI> LIST METRICDEFINITION ATTRIBUTES NAME,DESCRIPTION WHERE NAME LIKE 'FC_COL.*'
         FC_COL_BYKEEP_USED           "Number of megabytes used for keep objects in Columnar FlashCache"
         FC_COL_BY_USED               "Number of megabytes used in Columnar FlashCache"
         FC_COL_IO_BYKEEP_R           "Number of megabytes read from Columnar FlashCache for keep objects"
         FC_COL_IO_BYKEEP_R_SEC       "Number of megabytes read per second from Columnar FlashCache for keep objects"
         FC_COL_IO_BY_R               "Number of megabytes that were read from Columnar FlashCache"
         FC_COL_IO_BY_R_ELIGIBLE      "Number of megabytes eligible to read from Columnar FlashCache"
         FC_COL_IO_BY_R_ELIGIBLE_SEC  "Number of megabytes per second eligible to read from Columnar FlashCache"
         FC_COL_IO_BY_R_SEC           "Number of megabytes per second that were read from Columnar FlashCache"
         FC_COL_IO_BY_SAVED           "Number of megabytes saved by reads from Columnar FlashCache"
         FC_COL_IO_BY_SAVED_SEC       "Number of megabytes saved per second by reads from Columnar FlashCache"
         FC_COL_IO_BY_W_POPULATE      "Number of megabytes that are population writes into Columnar FlashCache due to read miss"
         FC_COL_IO_BY_W_POPULATE_SEC  "Number of megabytes per second that are population writes into Columnar FlashCache due to read miss"
         FC_COL_IO_RQKEEP_R           "Number of requests read for keep objects from Columnar FlashCache"
         FC_COL_IO_RQKEEP_R_SEC       "Number of requests read per second for keep objects from Columnar FlashCache"
         FC_COL_IO_RQ_R               "Number of requests that were read from Columnar FlashCache"
         FC_COL_IO_RQ_R_ELIGIBLE      "Number of reads eligible for Columnar FlashCache"
         FC_COL_IO_RQ_R_ELIGIBLE_SEC  "Number of reads per second eligible for Columnar FlashCache"
         FC_COL_IO_RQ_R_SEC           "Number of requests per second that were read from Columnar FlashCache"
         FC_COL_IO_RQ_W_POPULATE      "Number of requests that are population writes into Columnar FlashCache due to read miss"
         FC_COL_IO_RQ_W_POPULATE_SEC  "Number of requests per second that are population writes into Columnar FlashCache due to read miss"
6.3.7.6 What to Look For When Monitoring Smart I/O

Smart I/O Not Performing As Expected

Smart I/O operations typically occur when a full table scan or index fast full scan is done on a row source. If smart I/O operations do not work as expected, then users tend to experience noticeable increases in query elapsed times. In some cases, the database shows increased cell smart table scan wait times. However, when wait events such as cell multiblock physical read or direct path read are present instead of cell smart table scan, this is an indicator that smart I/O operations are not being performed.

The following are symptoms and reasons that explain why smart I/O operations might not perform as expected:

  • Direct reads are a prerequisite for Smart Scan; without direct reads, Smart Scan cannot occur.

    The cell multiblock physical read wait event occurs when blocks are read into the buffer cache. A common reason for reading into the buffer cache, rather than using direct reads, is the size of the segment. For small segments, the optimizer favors the buffer cache over direct reads.

    Also, by design, Oracle Database shared server sessions do not use direct reads. Consequently, serial queries issued by a shared server session are not eligible for Smart Scan. For parallel queries issued in a shared server session, the parallel worker processes can use direct reads (and Smart Scan), but the query blocks the shared server for its entire duration.

  • The direct path read wait event occurs when direct reads are performed, but predicates are not offloaded to the storage servers. This may occur when there is a resource shortage on the storage servers. For example, there might be a memory shortage caused by a large number of concurrent parallel queries on the system.

    Such resource shortages are typically evident in the ExaWatcher data. In particular, you can review statistics such as Number of low memory threshold failures and Number of no memory threshold failures in cellsrvstat. ExaWatcher also includes the cellmem collection, which shows how memory is consumed on the storage servers, and is visible in the ExaWatcher charts produced using GetExaWatcherResults.sh.

    To address this issue, you can review the use of parallel query and potentially reduce the number of active parallel query servers.

  • Predicate offload is not possible when there is uncommitted data. This typically becomes an issue if large batch operations are modifying the data, and you attempt to run a large query against a large amount of uncommitted data.

    When Smart Scan encounters uncommitted data, predicate filtering cannot be offloaded to the storage servers, and additional data must be transported back to the database servers, which appears as increased bytes returned by the Smart Scan. Extra processing is also required on the database server to construct a read-consistent copy of the data, which is manifested in the following ways:

    • In the best-case scenario, additional buffer gets or session logical reads are required to construct a read-consistent copy of the data.
    • If the undo buffers reside in another database instance, then Oracle RAC-related wait events may also be observed. The Oracle RAC-related wait events are prefixed by gc.
    • If the undo blocks do not reside in the buffer cache of any database instance, then additional cell single block physical read waits are observed in conjunction with the single block I/O that is required for read-consistency. The additional I/O can significantly impact the performance of the operation.
  • When the storage server CPU utilization is high, the storage server sends data back to the database for processing, rather than consuming even more storage server CPU to perform predicate evaluation. This is known as 'reverse offload'.

    When this occurs, it is evident in the Reverse Offload column in the Smart IO section of the AWR report, and the database statistic cell physical IO bytes sent directly to DB node to balance CPU usage.

    The high CPU utilization on the storage servers may be due to the type of predicates being offloaded. For example, case-insensitive searches, or the use of REGEXP_LIKE utilize more CPU than simpler predicates.

    Increased storage server CPU and I/O load may also stem from SQL execution plan changes in the database, in which case, reviewing the execution plans and tuning the affected SQL statements may help to resolve the issue.

  • When the storage server is unable to perform predicate evaluation, it sends the data back to the database for processing. This is also known as passthrough (or 'passthru'). The following all indicate that passthrough is occurring (see also the example query after this list):

    • A large value in the Passthru column in the Smart IO section of the AWR report, when compared to eligible bytes.
    • A large value in the database statistic cell num bytes in passthru during predicate offload, when compared to cell physical IO bytes eligible for smart IOs.
    • A large Cell passthru IO bytes value in the SQL monitor row source statistic, compared to the Eligible bytes for Smart IO value.

    Possible causes include:

    • Quarantines — To confirm, review the database statistic cell num bytes in passthru due to quarantine, or the SQL monitor row source statistic Cell passthru IO bytes due to quarantine.

    • Database timezone upgrade — Smart Scan is disabled when a database timezone upgrade is ongoing. Review the database statistic cell num smart IO sessions using passthru mode due to timezone from the Global Activity Statistics or Instance Activity Statistics section, or the Passthru Reasons in the Smart IO section of the AWR report. Depending on the database release, you may also observe the cell smart table scan: db timezone upgrade or cell smart index scan: db timezone upgrade wait event.
    • User setting — A user or application may set cell_offload_processing=false, which disables smart scans. To confirm, review the database statistic cell num smart IO sessions using passthru mode due to user from the Global Activity Statistics or Instance Activity Statistics section, or the Passthru Reasons in the Smart IO section of the AWR report. Depending on the database release, you may also observe the cell smart table scan: disabled by user or cell smart index scan: disabled by user wait event.
    • Operation cannot be offloaded — There are other reasons why the storage servers may be unable to perform predicate offload. Instances of this occurrence are visible in the database statistic cell num smart IO sessions using passthru mode due to cellsrv, or in the wait events cell smart table scan: pass through or cell smart index scan: pass through. The following section describes the reasons in detail.
  • In some situations, Oracle Database reverts to block I/O mode for an operation typically performed using smart I/O. When this occurs, it is evident in the database statistic cell num bytes in block IO during predicate offload.

    There are additional related statistics that provide insight into the underlying cause. These statistics have names beginning with cell num smart IO sessions in rdbms block IO due to. For example, cell num smart IO sessions in rdbms block IO due to online encr counts sessions that revert to block I/O mode because of an ongoing online encryption operation that prevents using smart I/O.
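Returning to the passthrough indicators listed above, the proportion of passthrough I/O can be estimated from V$SYSSTAT. The following query is only a sketch; the statistic names are taken from the earlier table and may differ slightly between Oracle Database versions.

-- Approximate percentage of eligible smart I/O bytes that fell back to passthrough
SELECT ROUND(100 * pt.value / NULLIF(el.value, 0), 2) AS passthru_pct
FROM   v$sysstat pt,
       v$sysstat el
WHERE  pt.name = 'cell num bytes in passthru during predicate offload'
AND    el.name = 'cell physical IO bytes eligible for smart IOs';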

Operation Not Being Offloaded

A smart I/O operation cannot be offloaded to the Exadata storage servers in the following cases:

  • A scan is performed on a clustered table
  • A scan is performed on an index-organized table
  • A fast full scan is performed on reverse key indexes
  • The table has row dependencies enabled or the rowscn is being fetched
  • The optimizer wants the scan to return rows in ROWID order
  • A CREATE INDEX command is run using NOSORT
  • A LONG column is being selected or queried.
  • The query contains a compressed or out-of-line LOB. An out-of-line LOB stores LOB data apart from the other row data and is typically larger than 4 KB in size.
  • A SELECT ... VERSIONS query is done on a table
  • A query that has more than 255 columns referenced, and the heap table is uncompressed, or Basic or OLTP compressed. However, such queries on tables compressed using Exadata Hybrid Columnar Compression are offloaded.
  • The tablespace is encrypted, and the CELL_OFFLOAD_DECRYPTION parameter is set to FALSE. In order for the Oracle Exadata System Software to perform decryption, Oracle Database needs to send the decryption key to the storage server. This feature is typically disabled, if there are security concerns about keys being shipped across the network to the storage server
  • The tablespace is not completely stored on Oracle Exadata Storage Server.
  • The predicate evaluation is on a virtual column.
  • Although offloading is supported for most SQL operators and functions, Oracle Exadata System Software does not support offloading for some SQL operators and functions. The dynamic performance view V$SQLFN_METADATA includes information about whether offloading is supported for a SQL operator or function. If the OFFLOADABLE column contains YES, then offloading is supported for the corresponding operator or function. NO indicates that offloading is not supported for the corresponding operator or function.
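For example, the following query against V$SQLFN_METADATA shows whether offloading is supported for a few common functions. This is a sketch only; the function names listed are illustrative.

-- Check whether specific SQL functions can be offloaded to the storage servers
SELECT name, offloadable
FROM   v$sqlfn_metadata
WHERE  name IN ('UPPER', 'TO_NUMBER', 'REGEXP_LIKE')
ORDER  BY name;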

Storage Index Not Performing as Expected

The storage index resides in storage server memory. Before the persistent storage index feature (introduced in Oracle Exadata System Software release 21.2.0), the storage index is lost whenever cellsrv stops and must be rebuilt after cellsrv starts. Consequently, on systems not enabled with the persistent storage index feature, the storage index is not immediately available every time cellsrv starts.

On systems enabled with the persistent storage index feature, the storage index resides in shared memory, which is maintained across cellsrv restarts. Furthermore, storage index data is automatically saved to persistent storage during a graceful server shutdown. During server restart, the storage index is automatically restored but is unavailable during the restoration process. Finally, the storage index is lost and must be completely rebuilt if the storage server suffers from an ungraceful shutdown (such as a power outage or kernel panic).

For segments in an unencrypted tablespace, the storage index is maintained when DML occurs. However, for segments in an encrypted tablespace, DML invalidates the portion of the storage index associated with each changed data chunk (1 MB). The invalidated chunks are rebuilt during the next scan of the segment. However, the overall efficiency of the storage index is not optimal while portions are invalid.

To monitor storage index performance, monitor:

  • The Storage Index column in the Smart IO section of the AWR report.

  • The database statistic cell physical IO bytes saved by storage index.

  • The SQL monitor row source statistic SI saved bytes.

You may see reduced storage index savings if the storage index is not yet built (or rebuilt). While the storage index is building, you will see increases in the database statistic cell physical IO bytes added to storage index and the SQL monitor row source statistic Bytes added to storage index.
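One simple way to observe this at the session level is to compare the session statistics before and after running a test scan in the same session. The following query is a sketch that uses V$MYSTAT and V$STATNAME.

-- Storage index savings and build activity for the current session
SELECT sn.name, ms.value
FROM   v$mystat ms
       JOIN v$statname sn ON sn.statistic# = ms.statistic#
WHERE  sn.name IN ('cell physical IO bytes saved by storage index',
                   'cell physical IO bytes added to storage index');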

Columnar Cache Not Performing as Expected

Like storage index, if the columnar cache is not yet built (or rebuilt), you may see reduced savings associated with columnar cache. To monitor columnar cache performance, monitor the Columnar Cache sections of the AWR report, or the database statistic cell physical IO bytes saved by columnar cache, or the SQL monitor row source statistic Columnar cache saved bytes.

Similar to storage index, the columnar cache is rebuilt every time cellsrv starts. Consequently, the columnar cache cannot benefit operations immediately after cellsrv starts.

The columnar cache is automatically populated and maintained by the storage server when a Smart Scan is performed. For Smart Scan operations on uncompressed segments, segments compressed using OLTP compression, and segments compressed using Exadata Hybrid Columnar Compression, data is automatically converted into the columnar cache format (no memcompress) as part of the Smart Scan.

However, if you are using Oracle Database In-Memory, data is rewritten into Oracle Database In-Memory columnar format (memcompress for query or memcompress for capacity) by using a background process. Consequently, operations on that data do not benefit from the Oracle Database In-Memory optimizations until the cache is repopulated. Information about the population jobs is available in the Columnar Cache Population sections of the AWR report.

If you are reading less from the columnar cache, then it will be evident in lower values for the database statistics cell physical IO bytes processed for IM query, cell physical IO bytes processed for IM capacity, and cell physical IO bytes processed for no memcompress. The equivalent SQL monitor row source statistics are IM Query bytes, IM Capacity bytes, and No memcompress bytes.
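These statistics can also be checked directly in V$SYSSTAT. The following query is a sketch; the statistic names are taken from the earlier table, and their availability depends on the Oracle Database version.

-- Columnar cache read activity and savings for this instance
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('cell physical IO bytes saved by columnar cache',
                'cell physical IO bytes processed for IM query',
                'cell physical IO bytes processed for IM capacity',
                'cell physical IO bytes processed for no memcompress')
ORDER  BY name;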

6.3.8 Monitoring I/O Resource Management (IORM)

Exadata I/O Resource Management (IORM) enables multiple databases and pluggable databases (PDBs) to share the same storage while ensuring that I/O resources are allocated appropriately across the various databases that share the system. This prevents one database from utilizing the entire I/O bandwidth and degrading the performance of the other databases. IORM also works in conjunction with Oracle Database Resource Manager to manage I/O resources across multiple consumer groups within a single database.

6.3.8.1 Monitoring I/O Resource Management (IORM) Using AWR

The AWR report includes information about the top databases running on the Exadata system. By using this information, you can monitor system-wide I/O resource consumption regardless of the database that hosts the AWR report. Additionally, the AWR report includes per-cell statistics, which enables you to easily determine if there is any skew in the I/O resource usage across the cells.

The AWR report includes:

  • Top Databases by IO Requests — provides statistics for the databases with the greatest number of I/O requests across all cells. It also breaks down the requests by device type (flash or disk) and by size (small or large).

  • Top Databases by Requests - Details — provides additional statistics regarding the I/O requests. Specifically, it shows the average latency and the average queue time for I/O requests. The queue time is the amount of time that an I/O request spends in the relevant I/O queue. Large queue times indicate that IORM is throttling the I/O.

  • Top Databases by IO Requests per Cell — provides statistics for the databases with the greatest number of I/O requests on each cell. It also breaks down the requests by device type (flash or disk) and by size (small or large). By using this information, you can easily determine if any cell behaves differently when servicing the I/O requests from the database(s).

  • Top Databases by IO Requests per Cell - Details — provides additional statistics regarding the I/O requests on each cell. Specifically, it shows the average latency and the average queue time for I/O requests on each cell. By using this information, you can easily determine if IORM behaves differently on any cell.

  • Top Databases by IO Throughput — provides statistics for the databases with the greatest I/O throughput across all cells. It also breaks down the throughput by device type (flash or disk) and by request size (small or large).

  • Top Databases by Throughput per Cell — provides statistics for the databases with the greatest I/O throughput on each cell. It also breaks down the throughput by device type (flash or disk) and by request size (small or large). By using this information, you can easily determine if any cell behaves differently when servicing the I/O from the database(s).

The AWR report may not show information about all of the databases running on the Exadata system. Rather, it is designed to focus on the databases that are responsible for the majority of I/O on the storage servers. If more detailed information is required, then IORM cell metrics should be used.

For multitenant container databases (CDBs), the statistics in the AWR report include all of the I/Os associated with the database, including all of the pluggable databases (PDBs).

The following example shows the Top Databases by IO Requests and Top Databases by Requests - Details sections in the AWR report. The Top Databases by IO Requests section shows that the DB1 database generates 46% of the I/O requests captured in AWR. It also shows that approximately 10% of the I/O requests associated with the DB1 database are I/O requests to disk devices (51,854,884 disk I/O requests of the 564,616,086 total I/O requests). The Top Databases by Requests - Details section mostly shows minimal IORM wait time (queue time < 1 millisecond) across all databases. The exception is large I/O requests to flash for the DB1 database, which each have an average queue time of 9.12 milliseconds.

Figure 6-25 AWR Report: Top Databases by IO Requests

The image shows an example of the Top Databases by IO Requests and Top Databases by Requests - Details sections in the AWR report.
6.3.8.2 Monitoring I/O Resource Management (IORM) Using Database Statistics

The following table describes database statistics that are useful for monitoring I/O Resource Management (IORM). The statistics are available in various dynamic performance views, including V$SYSSTAT, and may be displayed in the Global Activity Statistics or Instance Activity Statistics section of an AWR report.

Statistic Description
Session total flash IO requests Total number of physical I/O requests on flash.
Session IORM flash wait time

Total amount of IORM wait time for flash I/O requests (in microseconds).

To determine the average IORM wait time for flash I/O requests, divide Session IORM flash wait time by Session total flash IO requests.

PDB total disk IO requests Total number of physical I/O requests on disk.
PDB IORM disk wait time

Total amount of IORM wait time for disk I/O requests (in microseconds).

To determine the average IORM wait time for disk I/O requests, divide PDB IORM disk wait time by PDB total disk IO requests.

The availability of a specific statistic is subject to the version of Oracle Database being used.
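As a sketch, the average IORM wait per request can be computed directly from these statistics in V$SYSSTAT. The statistic names below are taken from the table above; verify their exact form in your Oracle Database version before relying on the result.

-- Average IORM wait per flash I/O request (microseconds) for this instance
SELECT ROUND(wt.value / NULLIF(rq.value, 0), 1) AS avg_iorm_flash_wait_us
FROM   v$sysstat wt,
       v$sysstat rq
WHERE  wt.name = 'Session IORM flash wait time'
AND    rq.name = 'Session total flash IO requests';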

6.3.8.3 Monitoring I/O Resource Management (IORM) Using Exadata Metrics

You can monitor I/O Resource Management (IORM) by using Oracle Exadata System Software metrics.

IORM uses the database unique name, not the database identifier, to collect statistics and display output. Starting with Oracle Exadata System Software release 19.1.0, if you configured ASM-scoped security for the Oracle Automatic Storage Management (Oracle ASM) cluster used by the database, then the database name is prefixed with the Oracle ASM cluster name.

6.3.8.3.1 Monitoring IORM with Database Metrics

Database metrics provide information about the I/O load from each database listed in the IORM interdatabase plan.

Database metrics are identified in the METRICCURRENT, METRICDEFINITION, and METRICHISTORY objects as having objectType=IORM_DATABASE.

Example 6-8 Displaying Database Metric Definitions

This example shows how to display the database metric definitions that are available in the Oracle Exadata System Software.

CellCLI> LIST METRICDEFINITION ATTRIBUTES NAME,DESCRIPTION WHERE OBJECTTYPE = IORM_DATABASE
         DB_FC_BY_ALLOCATED      "Number of megabytes allocated in flash cache for this database"
         DB_FC_IO_BY_SEC         "Number of megabytes of I/O per second for this database to flash cache"
         DB_FC_IO_RQ             "Number of IO requests issued by a database to flash cache"
         DB_FC_IO_RQ_LG          "Number of large IO requests issued by a database to flash cache"
         DB_FC_IO_RQ_LG_SEC      "Number of large IO requests issued by a database to flash cache per second"
         DB_FC_IO_RQ_SEC         "Number of IO requests issued by a database to flash cache per second"
         DB_FC_IO_RQ_SM          "Number of small IO requests issued by a database to flash cache"
         DB_FC_IO_RQ_SM_SEC      "Number of small IO requests issued by a database to flash cache per second"
         DB_FD_IO_BY_SEC         "Number of megabytes of I/O per second for this database to flash disks"
         DB_FD_IO_LOAD           "Average I/O load from this database for flash disks"
         DB_FD_IO_RQ_LG          "Number of large IO requests issued by a database to flash disks"
         DB_FD_IO_RQ_LG_SEC      "Number of large IO requests issued by a database to flash disks per second"
         DB_FD_IO_RQ_SM          "Number of small IO requests issued by a database to flash disks"
         DB_FD_IO_RQ_SM_SEC      "Number of small IO requests issued by a database to flash disks per second"
         DB_FD_IO_TM             "The cumulative latency of reading or writing blocks by a database from flash disks"
         DB_FD_IO_TM_LG          "The cumulative latency of reading or writing large blocks by a database from flash disks"
         DB_FD_IO_TM_LG_RQ       "The rate which is the average latency of reading or writing large blocks per request by a database from flash disks"
         DB_FD_IO_TM_RQ          "The rate which is the average latency of reading or writing blocks per request by a database from flash disks"
         DB_FD_IO_TM_SM          "The cumulative latency of reading or writing small blocks by a database from flash disks"
         DB_FD_IO_TM_SM_RQ       "The rate which is the average latency of reading or writing small blocks per request by a database from flash disks"
         DB_FD_IO_UTIL           "Percentage of flash resources utilized by this database"
         DB_FD_IO_UTIL_LG        "Percentage of flash resources utilized by large requests from this database"
         DB_FD_IO_UTIL_SM        "Percentage of flash resources utilized by small requests from this database"
         DB_FD_IO_WT_LG          "IORM wait time for large IO requests issued to flash disks by a database"
         DB_FD_IO_WT_LG_RQ       "Average IORM wait time per request for large IO requests issued to flash disks by a database"
         DB_FD_IO_WT_SM          "IORM wait time for small IO requests issued to flash disks by a database"
         DB_FD_IO_WT_SM_RQ       "Average IORM wait time per request for small IO requests issued to flash disks by a database"
         DB_FL_IO_BY             "The number of MB written to the Flash Log"
         DB_FL_IO_BY_SEC         "The number of MB written per second to the Flash Log"
         DB_FL_IO_RQ             "The number of I/O requests issued to the Flash Log"
         DB_FL_IO_RQ_SEC         "The number of I/O requests per second issued to the Flash Log"
         DB_IO_BY_SEC            "Number of megabytes of I/O per second for this database to hard disks"
         DB_IO_LOAD              "Average I/O load from this database for hard disks"
         DB_IO_RQ_LG             "Number of large IO requests issued by a database to hard disks"
         DB_IO_RQ_LG_SEC         "Number of large IO requests issued by a database to hard disks per second"
         DB_IO_RQ_SM             "Number of small IO requests issued by a database to hard disks"
         DB_IO_RQ_SM_SEC         "Number of small IO requests issued by a database to hard disks per second"
         DB_IO_TM_LG             "The cumulative latency of reading or writing large blocks by a database from hard disks"
         DB_IO_TM_LG_RQ          "The rate which is the average latency of reading or writing large blocks per request by a database from hard disks"
         DB_IO_TM_SM             "The cumulative latency of reading or writing small blocks by a database from hard disks"
         DB_IO_TM_SM_RQ          "The rate which is the average latency of reading or writing small blocks per request by a  database from hard disks"
         DB_IO_UTIL_LG           "Percentage of disk resources utilized by large requests from this database"
         DB_IO_UTIL_SM           "Percentage of disk resources utilized by small requests from this database"
         DB_IO_WT_LG             "IORM wait time for large IO requests issued to hard disks by a database"
         DB_IO_WT_LG_RQ          "Average IORM wait time per request for large IO requests issued to hard disks by a database"
         DB_IO_WT_SM             "IORM wait time for small IO requests issued to hard disks by a database"
         DB_IO_WT_SM_RQ          "Average IORM wait time per request for small IO requests issued to hard disks by a database"
         DB_XRM_BY_ALLOCATED      "Number of megabytes allocated in XRMEM cache for this database"

Note the following additional details:

  • The database for the metric is specified by the metricObjectName attribute of the METRICCURRENT and METRICHISTORY objects.

  • For metrics that relate to I/O load (for example, DB_FD_IO_LOAD and DB_IO_LOAD), see the additional information relating to CD_IO_LOAD.

  • Starting with Oracle Exadata System Software release 19.1.0, if you configured ASM-scoped security for the Oracle Automatic Storage Management (Oracle ASM) cluster used by the database, then the database name is prefixed with the Oracle ASM cluster name.

  • For cumulative metrics, the metric value for a specific time period can be determined by subtracting values from different collectionTime periods.

  • For rate metrics, the time period for the metric value is over the previous minute.

  • In the metric descriptions, small I/O requests are less than or equal to 128 KB, and large I/O requests are larger than 128 KB.

  • All database cumulative metrics are reset to zero whenever a category, IORM, or any database resource plan is modified.

  • To list the database metric history for an inter-database plan, use the following CellCLI command:

    CellCLI> LIST METRICHISTORY WHERE objectType = 'IORM_DATABASE' AND metricValue != 0 ATTRIBUTES name, metricObjectName, metricValue, collectionTime
  • For multitenant container databases (CDBs), the database metric observations include all of the I/Os associated with the database, including all of the associated pluggable databases (PDBs). For example, the value for DB_FC_IO_BY_SEC includes the sum of the PDB_FC_IO_BY_SEC values for all of the PDBs hosted by the CDB.
  • Observations for Oracle ASM and all other databases not listed in the interdatabase plan are grouped together using _OTHER_DATABASE_ as the metricObjectName value.

6.3.8.3.2 Monitoring IORM with PDB Metrics

Pluggable Database (PDB) metrics provide information about the I/O load from each PDB hosted by a container database (CDB) listed in the IORM interdatabase plan.

PDB metrics are identified in the METRICCURRENT, METRICDEFINITION, and METRICHISTORY objects as having objectType=IORM_PLUGGABLE_DATABASE.

Example 6-9 Displaying PDB Metric Definitions

This example shows how to display the PDB metric definitions that are available in the Oracle Exadata System Software.

CellCLI> LIST METRICDEFINITION ATTRIBUTES NAME,DESCRIPTION WHERE OBJECTTYPE = IORM_PLUGGABLE_DATABASE
         PDB_FC_BY_ALLOCATED     "Number of megabytes allocated in flash cache for this pluggable database"
         PDB_FC_IO_BY_SEC        "Number of megabytes of I/O per second for this pluggable database to flash cache"
         PDB_FC_IO_RQ            "Number of IO requests issued by this pluggable database to flash cache"
         PDB_FC_IO_RQ_LG         "Number of large IO requests issued by this pluggable database to flash cache"
         PDB_FC_IO_RQ_LG_SEC     "Number of large IO requests issued by this pluggable database to flash cache per second"
         PDB_FC_IO_RQ_SEC        "Number of IO requests issued by this pluggable database to flash cache per second"
         PDB_FC_IO_RQ_SM         "Number of small IO requests issued by this pluggable database to flash cache"
         PDB_FC_IO_RQ_SM_SEC     "Number of small IO requests issued by this pluggable database to flash cache per second"
         PDB_FD_IO_BY_SEC        "Number of megabytes of I/O per second for this pluggable database to flash disks"
         PDB_FD_IO_LOAD          "Average I/O load from this pluggable database for flash disks"
         PDB_FD_IO_RQ_LG         "Number of large IO requests issued by this pluggable database to flash disks"
         PDB_FD_IO_RQ_LG_SEC     "Number of large IO requests issued by this pluggable database to flash disks per second"
         PDB_FD_IO_RQ_SM         "Number of small IO requests issued by this pluggable database to flash disks"
         PDB_FD_IO_RQ_SM_SEC     "Number of small IO requests issued by this pluggable database to flash disks per second"
         PDB_FD_IO_TM            "The cumulative latency of reading or writing blocks by this pluggable database from flash disks"
         PDB_FD_IO_TM_LG         "The cumulative latency of reading or writing large blocks by this pluggable database from flash disks"
         PDB_FD_IO_TM_LG_RQ      "The rate which is the average latency of reading or writing large blocks per request by this pluggable database from flash disks"
         PDB_FD_IO_TM_RQ         "The rate which is the average latency of reading or writing blocks per request by this pluggable database from flash disks"
         PDB_FD_IO_TM_SM         "The cumulative latency of reading or writing small blocks by this pluggable database from flash disks"
         PDB_FD_IO_TM_SM_RQ      "The rate which is the average latency of reading or writing small blocks per request by this pluggable database from flash disks"
         PDB_FD_IO_UTIL          "Percentage of flash resources utilized by this pluggable database"
         PDB_FD_IO_UTIL_LG       "Percentage of flash resources utilized by large requests from this pluggable database"
         PDB_FD_IO_UTIL_SM       "Percentage of flash resources utilized by small requests from this pluggable database"
         PDB_FD_IO_WT_LG         "IORM wait time for large IO requests issued to flash disks by this pluggable database"
         PDB_FD_IO_WT_LG_RQ      "Average IORM wait time per request for large IO requests issued to flash disks by this pluggable database"
         PDB_FD_IO_WT_SM         "IORM wait time for small IO requests issued to flash disks by this pluggable database"
         PDB_FD_IO_WT_SM_RQ      "Average IORM wait time per request for small IO requests issued to flash disks by this pluggable database"
         PDB_IO_BY_SEC           "Number of megabytes of I/O per second for this pluggable database to hard disks"
         PDB_IO_LOAD             "Average I/O load from this pluggable database for hard disks"
         PDB_IO_RQ_LG            "Number of large IO requests issued by this pluggable database to hard disks"
         PDB_IO_RQ_LG_SEC        "Number of large IO requests issued by this pluggable database to hard disks per second"
         PDB_IO_RQ_SM            "Number of small IO requests issued by this pluggable database to hard disks"
         PDB_IO_RQ_SM_SEC        "Number of small IO requests issued by this pluggable database to hard disks per second"
         PDB_IO_TM_LG            "The cumulative latency of reading or writing large blocks by this pluggable database from hard disks"
         PDB_IO_TM_LG_RQ         "The rate which is the average latency of reading or writing large blocks per request by this pluggable database from hard disks"
         PDB_IO_TM_SM            "The cumulative latency of reading or writing small blocks by this pluggable database from hard disks"
         PDB_IO_TM_SM_RQ         "The rate which is the average latency of reading or writing small blocks per request by this pluggable database from hard disks"
         PDB_IO_UTIL_LG          "Percentage of disk resources utilized by large requests from this pluggable database"
         PDB_IO_UTIL_SM          "Percentage of disk resources utilized by small requests from this pluggable database"
         PDB_IO_WT_LG            "IORM wait time for large IO requests issued to hard disks by this pluggable database"
         PDB_IO_WT_LG_RQ         "Average IORM wait time per request for large IO requests issued to hard disks by this pluggable database"
         PDB_IO_WT_SM            "IORM wait time for small IO requests issued to hard disks by this pluggable database"
         PDB_IO_WT_SM_RQ         "Average IORM wait time per request for small IO requests issued to hard disks by this pluggable database"
         PDB_XRM_BY_ALLOCATED    "Number of megabytes allocated in XRMEM cache for this pluggable database"

Note the following additional details:

  • The PDB for the metric is specified by the metricObjectName attribute of the METRICCURRENT and METRICHISTORY objects. The name is formed by concatenating the CDB name and the PDB name.

  • For metrics that relate to I/O load (for example, PDB_FD_IO_LOAD and PDB_IO_LOAD), see the additional information relating to CD_IO_LOAD.

  • Starting with Oracle Exadata System Software release 19.1.0, if you configured ASM-scoped security for the Oracle Automatic Storage Management (Oracle ASM) cluster used by the database, then the database name is prefixed with the Oracle ASM cluster name.

  • For cumulative metrics, the metric value for a specific time period can be determined by subtracting values from different collectionTime periods.

  • For rate metrics, the time period for the metric value is over the previous minute.

  • In the metric descriptions, small I/O requests are less than or equal to 128 KB, and large I/O requests are larger than 128 KB.
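
For example, to evaluate a cumulative PDB metric such as PDB_FC_IO_RQ over a period of time, you can list its history and subtract the earliest value from the latest value for each PDB. The following command is a sketch that follows the LIST conventions used elsewhere in this chapter; the timestamp is illustrative only.

CellCLI> LIST METRICHISTORY PDB_FC_IO_RQ WHERE collectionTime > '2024-05-01T08:00:00-07:00' ATTRIBUTES metricObjectName, metricValue, collectionTime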

6.3.8.3.3 Monitoring IORM with Consumer Group Metrics

Consumer group metrics provide information about the I/O load from each consumer group specified in a database resource plan.

Consumer group metrics are identified in the METRICCURRENT, METRICDEFINITION, and METRICHISTORY objects as having objectType=IORM_CONSUMER_GROUP.

Example 6-10 Displaying Consumer Group Metric Definitions

This example shows how to display the consumer group metric definitions that are available in the Oracle Exadata System Software.

CellCLI> LIST METRICDEFINITION ATTRIBUTES NAME,DESCRIPTION WHERE OBJECTTYPE = IORM_CONSUMER_GROUP
         CG_FC_IO_BY_SEC         "Number of megabytes of I/O per second for this consumer group to flash cache"
         CG_FC_IO_RQ             "Number of IO requests issued by a consumer group to flash cache"
         CG_FC_IO_RQ_LG          "Number of large IO requests issued by a consumer group to flash cache"
         CG_FC_IO_RQ_LG_SEC      "Number of large IO requests issued by a consumer group to flash cache per second"
         CG_FC_IO_RQ_SEC         "Number of IO requests issued by a consumer group to flash cache per second"
         CG_FC_IO_RQ_SM          "Number of small IO requests issued by a consumer group to flash cache"
         CG_FC_IO_RQ_SM_SEC      "Number of small IO requests issued by a consumer group to flash cache per second"
         CG_FD_IO_BY_SEC         "Number of megabytes of I/O per second for this consumer group to flash disks"
         CG_FD_IO_LOAD           "Average I/O load from this consumer group for flash disks"
         CG_FD_IO_RQ_LG          "Number of large IO requests issued by a consumer group to flash disks"
         CG_FD_IO_RQ_LG_SEC      "Number of large IO requests issued by a consumer group to flash disks per second"
         CG_FD_IO_RQ_SM          "Number of small IO requests issued by a consumer group to flash disks"
         CG_FD_IO_RQ_SM_SEC      "Number of small IO requests issued by a consumer group to flash disks per second"
         CG_FD_IO_TM             "The cumulative latency of reading or writing blocks by a consumer group from flash disks"
         CG_FD_IO_TM_LG          "The cumulative latency of reading or writing large blocks by a consumer group from flash disks"
         CG_FD_IO_TM_LG_RQ       "The rate which is the average latency of reading or writing large blocks per request by a consumer group from flash disks"
         CG_FD_IO_TM_RQ          "The rate which is the average latency of reading or writing blocks per request by a consumer group from flash disks"
         CG_FD_IO_TM_SM          "The cumulative latency of reading or writing small blocks by a consumer group from flash disks"
         CG_FD_IO_TM_SM_RQ       "The rate which is the average latency of reading or writing small blocks per request by a consumer group from flash disks"
         CG_FD_IO_UTIL           "Percentage of flash resources utilized by this consumer group"
         CG_FD_IO_UTIL_LG        "Percentage of flash resources utilized by large requests from this consumer group"
         CG_FD_IO_UTIL_SM        "Percentage of flash resources utilized by small requests from this consumer group"
         CG_FD_IO_WT_LG          "IORM wait time for large IO requests issued to flashdisks by a consumer group"
         CG_FD_IO_WT_LG_RQ       "Average IORM wait time per request for large IO requests issued to flash disks by a consumer group"
         CG_FD_IO_WT_SM          "IORM wait time for small IO requests issued to flashdisks by a consumer group"
         CG_FD_IO_WT_SM_RQ       "Average IORM wait time per request for small IO requests issued to flash disks by a consumer group"
         CG_IO_BY_SEC            "Number of megabytes of I/O per second for this consumer group to hard disks"
         CG_IO_LOAD              "Average I/O load from this consumer group for hard disks"
         CG_IO_RQ_LG             "Number of large IO requests issued by a consumer group to hard disks"
         CG_IO_RQ_LG_SEC         "Number of large IO requests issued by a consumer group to hard disks per second"
         CG_IO_RQ_SM             "Number of small IO requests issued by a consumer group to hard disks"
         CG_IO_RQ_SM_SEC         "Number of small IO requests issued by a consumer group to hard disks per second"
         CG_IO_TM_LG             "The cumulative latency of reading or writing large blocks by a consumer group from hard disks"
         CG_IO_TM_LG_RQ          "The rate which is the average latency of reading or writing large blocks per request by a consumer group from hard disks"
         CG_IO_TM_SM             "The cumulative latency of reading or writing small blocks by a consumer group from hard disks"
         CG_IO_TM_SM_RQ          "The rate which is the average latency of reading or writing small blocks per request by a consumer group from hard disks"
         CG_IO_UTIL_LG           "Percentage of disk resources utilized by large requests from this consumer group"
         CG_IO_UTIL_SM           "Percentage of disk resources utilized by small requests from this consumer group"
         CG_IO_WT_LG             "IORM wait time for large IO requests issued to hard disks by a consumer group"
         CG_IO_WT_LG_RQ          "Average IORM wait time per request for large IO requests issued to hard disks by a consumer group"
         CG_IO_WT_SM             "IORM wait time for small IO requests issued to hard disks by a consumer group"
         CG_IO_WT_SM_RQ          "Average IORM wait time per request for small IO requests issued to hard disks by a consumer group"

Note the following additional details:

  • The consumer group and database for the metric are specified by the metricObjectName attribute of the METRICCURRENT and METRICHISTORY objects. The name is formed by the database name followed by a period (.) and the consumer group name. For example, for a database named PRODUCTIONDB and a consumer group named OLTP, the metricObjectName would be PRODUCTIONDB.OLTP.

  • For metrics that relate to I/O load (for example, CG_FD_IO_LOAD and CG_IO_LOAD), see the additional information relating to CD_IO_LOAD.

  • Starting with Oracle Exadata System Software release 19.1.0, if you configured ASM-scoped security for the Oracle Automatic Storage Management (Oracle ASM) cluster used by the database, then the database name is prefixed with the Oracle ASM cluster name.

  • For cumulative metrics, the metric value for a specific time period can be determined by subtracting values from different collectionTime periods.

  • For rate metrics, the time period for the metric value is over the previous minute.

  • In the metric descriptions, small I/O requests are less than or equal to 128 KB, and large I/O requests are larger than 128 KB.

  • All consumer group cumulative metrics are reset to zero whenever a category, IORM, or any database resource plan is modified.

  • To list the current metrics for consumer groups, use the following CellCLI command:

    CellCLI> LIST METRICCURRENT WHERE objectType = 'IORM_CONSUMER_GROUP' AND metricValue != 0 ATTRIBUTES name, metricObjectName, metricValue, collectionTime
  • For Oracle ASM and all other databases, metrics are only provided for the BACKGROUND and OTHER consumer groups. The BACKGROUND consumer groups are:

    • _ORACLE_BACKGROUND_GROUP_: High-priority I/O requests from Oracle Database background processes

    • _ORACLE_MEDPRIBG_GROUP_: Medium-priority I/O requests from Oracle Database background processes

    • _ORACLE_LOWPRIBG_GROUP_: Low-priority I/O requests from Oracle Database background processes
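
For example, to check the current IORM wait time per request for each consumer group on a cell, a command along the following lines can be used. This is a sketch based on the LIST conventions shown above; adjust the name filter as needed for your environment.

CellCLI> LIST METRICCURRENT WHERE name LIKE 'CG_IO_WT_.*_RQ' AND metricValue != 0 ATTRIBUTES name, metricObjectName, metricValue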

6.3.8.3.4 Monitoring IORM with Category Metrics

Category metrics provide information about the I/O load from each category specified in the current IORM category plan.

Category metrics are identified in the METRICCURRENT, METRICDEFINITION, and METRICHISTORY objects as having objectType=IORM_CATEGORY.

The category for the metric is specified by the metricObjectName attribute of the METRICCURRENT and METRICHISTORY objects.

For cumulative metrics, the metric value for a specific time period can be determined by subtracting values from different collectionTime periods. For rate metrics, the time period for the metric value is over the previous minute.

Example 6-11 Displaying Category Metric Definitions

This example shows how to display the category metric definitions that are available in the Oracle Exadata System Software.

CellCLI> LIST METRICDEFINITION ATTRIBUTES NAME,DESCRIPTION WHERE OBJECTTYPE = IORM_CATEGORY
         CT_FC_IO_BY_SEC         "Number of megabytes of I/O per second for this category to flash cache"
         CT_FC_IO_RQ             "Number of IO requests issued by an IORM category to flash cache"
         CT_FC_IO_RQ_LG          "Number of large IO requests issued by an IORM category to flash cache"
         CT_FC_IO_RQ_LG_SEC      "Number of large IO requests issued by an IORM category to flash cache per second"
         CT_FC_IO_RQ_SEC         "Number of IO requests issued by an IORM category to flash cache per second"
         CT_FC_IO_RQ_SM          "Number of small IO requests issued by an IORM category to flash cache"
         CT_FC_IO_RQ_SM_SEC      "Number of small IO requests issued by an IORM category to flash cache per second"
         CT_FD_IO_BY_SEC         "Number of megabytes of I/O per second for this category to flash disks"
         CT_FD_IO_LOAD           "Average I/O load from this category for flash disks"
         CT_FD_IO_RQ_LG          "Number of large IO requests issued by an IORM category to flash disks"
         CT_FD_IO_RQ_LG_SEC      "Number of large IO requests issued by an IORM category to flash disks per second"
         CT_FD_IO_RQ_SM          "Number of small IO requests issued by an IORM category to flash disks"
         CT_FD_IO_RQ_SM_SEC      "Number of small IO requests issued by an IORM category to flash disks per second"
         CT_FD_IO_TM             "The cumulative latency of reading or writing blocks for this category from flash disks"
         CT_FD_IO_TM_LG          "The cumulative latency of reading or writing large blocks for this category from flash disks"
         CT_FD_IO_TM_LG_RQ       "The rate which is the average latency of reading or writing large blocks per request for this category from flash disks"
         CT_FD_IO_TM_RQ          "The rate which is the average latency of reading or writing blocks per request for this category from flash disks"
         CT_FD_IO_TM_SM          "The cumulative latency of reading or writing small blocks for this category from flash disks"
         CT_FD_IO_TM_SM_RQ       "The rate which is the average latency of reading or writing small blocks per request for this category from flash disks"
         CT_FD_IO_UTIL           "Percentage of flash resources utilized by this category"
         CT_FD_IO_UTIL_LG        "Percentage of flash resources utilized by large requests from this category"
         CT_FD_IO_UTIL_SM        "Percentage of flash resources utilized by small requests from this category"
         CT_FD_IO_WT_LG          "IORM wait time for large IO requests issued to flash disks by an IORM category"
         CT_FD_IO_WT_LG_RQ       "Average IORM wait time per request for large IO requests issued to flash disks by an IORM category"
         CT_FD_IO_WT_SM          "IORM wait time for small IO requests issued to flash disks by an IORM category"
         CT_FD_IO_WT_SM_RQ       "Average IORM wait time per request for small IO requests issued to flash disks by an IORM category"
         CT_IO_BY_SEC            "Number of megabytes of I/O per second for this category to hard disks"
         CT_IO_LOAD              "Average I/O load from this category for hard disks"
         CT_IO_RQ_LG             "Number of large IO requests issued by an IORM category to hard disks"
         CT_IO_RQ_LG_SEC         "Number of large IO requests issued by an IORM category to hard disks per second"
         CT_IO_RQ_SM             "Number of small IO requests issued by an IORM category to hard disks"
         CT_IO_RQ_SM_SEC         "Number of small IO requests issued by an IORM category to hard disks per second"
         CT_IO_TM_LG             "The cumulative latency of reading or writing large blocks for this category from hard disks"
         CT_IO_TM_LG_RQ          "The rate which is the average latency of reading or writing large blocks per request for this category from hard disks"
         CT_IO_TM_SM             "The cumulative latency of reading or writing small blocks for this category from hard disks"
         CT_IO_TM_SM_RQ          "The rate which is the average latency of reading or writing small blocks per request for this category from hard disks"
         CT_IO_UTIL_LG           "Percentage of disk resources utilized by large requests from this category"
         CT_IO_UTIL_SM           "Percentage of disk resources utilized by small requests from this category"
         CT_IO_WT_LG             "IORM wait time for large IO requests issued to hard disks by an IORM category"
         CT_IO_WT_LG_RQ          "Average IORM wait time per request for large IO requests issued to hard disks by an IORM category"
         CT_IO_WT_SM             "IORM wait time for small IO requests issued to hard disks by an IORM category"
         CT_IO_WT_SM_RQ          "Average IORM wait time per request for small IO requests issued to hard disks by an IORM category"

Note the following additional details:

  • In the metric descriptions, small I/O requests are less than or equal to 128 KB, and large I/O requests are larger than 128 KB.

  • The unit of measurement for the wait metrics is milliseconds. The wait metrics have metric names containing IO_WT, such as CT_IO_WT_SM and CT_FD_IO_WT_LG.

  • For metrics that relate to I/O load (for example, CT_FD_IO_LOAD and CT_IO_LOAD), see the additional information relating to CD_IO_LOAD.

  • All category cumulative metrics are reset to zero whenever a category, IORM, or any database resource plan is modified.

  • To list the category metric history for an interdatabase plan, use the following CellCLI command:

    CellCLI> LIST METRICHISTORY WHERE objectType = 'IORM_CATEGORY' AND metricValue != 0 ATTRIBUTES name, metricObjectName, metricValue, collectionTime
  • Category metrics are also provided for the following internally-generated and automatically-managed categories:

    • _ASM_: Oracle ASM-related I/Os

    • _ORACLE_BG_CATEGORY_: High-priority I/Os issued by Oracle Database background processes

    • _ORACLE_MEDPRIBG_CATEGORY_: Medium-priority I/Os issued by Oracle Database background processes

    • _ORACLE_LOWPRIBG_CATEGORY_: Low-priority I/Os issued by Oracle Database background processes
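
For example, to see the current hard disk throughput for each category, including the internally-generated categories listed above, you might use a command like the following. This is a sketch based on the LIST conventions used in this chapter.

CellCLI> LIST METRICCURRENT CT_IO_BY_SEC ATTRIBUTES metricObjectName, metricValue, collectionTime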

6.3.8.3.5 Monitoring IORM Utilization

You can use metrics to monitor IORM utilization.

When OLTP and DSS workloads share Oracle Exadata Storage Servers, IORM determines whether to optimize for low latency or high throughput. To optimize for low latency, the concurrency of large I/O requests is reduced to ensure that I/O bandwidth is not saturated. To optimize for high throughput, each Oracle Exadata Storage Server must handle many concurrent large I/O requests, allowing the storage to be fully utilized while applying optimization algorithms. However, when a cell has many concurrent large I/O requests, average latency may increase because each I/O is queued behind other I/Os.

The utilization metrics for I/O requests from a database, pluggable database (PDB), or consumer group correspond to the amount of time that the database, PDB, or consumer group utilized the storage server. Large I/O requests utilize the storage server more than small I/O requests. The following are the utilization metrics for determining IORM optimization:

  • CG_IO_UTIL_LG
  • CG_IO_UTIL_SM
  • PDB_IO_UTIL_LG
  • PDB_IO_UTIL_SM
  • CT_IO_UTIL_LG
  • CT_IO_UTIL_SM
  • DB_IO_UTIL_LG
  • DB_IO_UTIL_SM

By comparing the I/O resource consumption with the I/O resource allocations, the database administrator can determine if IORM should be tuned for latency or throughput, or if a balanced approach is optimal. The IORM metric, IORM_MODE, shows the mode for IORM. The metric value ranges between 1 and 3. The following are the definitions for the values:

  • 1 means the cell IORM objective is set to low_latency.
  • 2 means the cell IORM objective is set to balanced.
  • 3 means the cell IORM objective is set to high_throughput.

A value between 1 and 2, or between 2 and 3, indicates that the IORM objective changed during the metric period, with the precise value indicating proximity to a given objective. Such a value can also indicate a constantly changing mix of workloads.

Note:

If the current plan has the IORM attribute objective set to BASIC, then IORM_MODE has no meaning and should be ignored.
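
For example, to review how the IORM mode has varied over recent collection intervals on a cell, you can list the IORM_MODE metric history. The following command is a sketch that follows the LIST syntax used elsewhere in this chapter.

CellCLI> LIST METRICHISTORY IORM_MODE ATTRIBUTES metricObjectName, metricValue, collectionTime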

6.3.8.4 What to Look For When Monitoring I/O Resource Management (IORM)

I/O Latency

Issues with IORM typically result in increased I/O latency. This is usually characterized by higher latency in the cell single block physical read database wait event, and in some cases the cell smart table scan database wait event. If these database wait events are significant, and there are no corresponding latencies in the storage servers associated with the flash or disk devices, then this may indicate that IORM is throttling the workload.

To confirm that IORM throttling is occurring:

  • In the Top Databases by Requests - Details section of the AWR report, review the Queue Time columns, which show the average amount of time that IORM spent throttling I/O requests for the database.
  • Review the cell metrics for IORM wait times, such as DB_IO_WT_SM_RQ, DB_IO_WT_LG_RQ, PDB_IO_WT_SM_RQ, PDB_IO_WT_LG_RQ, and CG_IO_WT_SM_RQ, as in the sketch following this list.
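
The following command is a sketch, based on the LIST conventions used in this chapter, that displays the database-level IORM wait time per request metrics with nonzero values. Similar commands apply to the corresponding PDB_ and CG_ metrics.

CellCLI> LIST METRICCURRENT WHERE name LIKE 'DB_IO_WT_.*_RQ' AND metricValue != 0 ATTRIBUTES name, metricObjectName, metricValue, collectionTime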

You can use IORM database statistics and the AWR report to understand the I/O workload as a whole. You can use Exadata metrics to further understand I/O consumption by each category, database, or consumer group. By analyzing statistics and metrics, you can understand which category, database, pluggable database (PDB), or consumer group is not using its resource allocation and which is exceeding its resource allocation.

If the wait times are small or zero, then the plan allocation is sufficient. If the wait times are large, then the plan allocation is insufficient. If the wait times due to high I/O latency result in unacceptable performance, then the IORM plan can be adjusted to give a larger allocation, or additional storage servers may be required to deliver the required I/O resources.

Using IOSTAT

Device utilization and I/O service time statistics monitored using iostat (which is collected by ExaWatcher) are unreliable.

The following is stated in the Linux man page for iostat:

svctm - The average service time (in milliseconds) for I/O requests that were issued to the device. Warning! Do not trust this field any more. This field will be removed in a future sysstat version.

Since the utilization computation depends upon the I/O service time, it is also unreliable.

To monitor the actual I/O utilization for a cell disk, database, pluggable database or consumer group, use the corresponding IORM metrics.
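
For example, to view the current overall utilization reported by IORM for each cell disk, a command like the following can be used. This is a sketch following the LIST conventions used in this chapter.

CellCLI> LIST METRICCURRENT CD_IO_UTIL ATTRIBUTES metricObjectName, metricValue, collectionTime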

6.3.9 Monitoring Cell Disk I/O

When database performance issues are related to I/O load on the Exadata storage servers, typically there will be increased latencies in the I/O-related wait events, and increased database time in the User I/O or System I/O wait classes. If the increased database latencies are due to the performance of the Exadata storage servers, then the increased latencies will also be visible in the cell-side statistics. If you have comprehensive baseline statistics for periods when the system is performing well, then you can compare the baseline statistics with other observations to identify differences and explain the situation.

For example, if Oracle Database reports increased cell single block physical read latencies, you can then check the statistics from the storage servers and compare them to a baseline to determine if the cause is increased latency on flash devices, more disk I/O, or some other cause. If the statistics show an increase in disk I/O requests, that may be related to a change in Exadata Smart Flash Cache, which would prompt you to review the Exadata Smart Flash Cache statistics. On the other hand, if the statistics show a latency increase for small reads from flash, then this may be caused by a change in the I/O pattern, and understanding the type of I/O that has increased (small reads, small writes, large reads, or large writes) can help to drive further investigation.

Typically, in an Exadata environment, an even workload distribution is expected across all cells and all disks. If one cell or one disk is doing more work than the others, then you should investigate further, because that cell or disk has the potential to slow down the entire system.

6.3.9.1 Monitoring Cell Disk I/O Using AWR

The following sections in the Automatic Workload Repository (AWR) report are particularly useful for understanding I/O load on Exadata:

Often, to better understand the characteristics of the I/O load, the statistics from these sections can be correlated with other sections of the AWR report.

Disk Activity

The Disk Activity section provides a high-level summary for potential sources of disk activity. The Disk Activity section is located in the AWR report under Exadata Statistics > Performance Summary.

Figure 6-26 AWR Report: Disk Activity

The image shows an example of the Disk Activity section in the AWR report.

High I/O load or a substantial change in the pattern of disk activity may prompt further investigation. Possible causes include:

  • Redo log writes — result in disk writes when redo is written to disk. When using Exadata Smart Flash Log, note that redo is written to both Exadata Smart Flash Log and the online redo log file. Also, Oracle Exadata System Software release 20.1 adds a further optimization, known as Smart Flash Log Write-Back, that uses Exadata Smart Flash Cache in Write-Back mode instead of disk storage. For further details, review Database Redo Activity and Smart Flash Log in the AWR report.
  • Smart Scans — result in disk reads for requests that are not satisfied using Exadata Smart Flash Cache. These are typically large reads. For further details, review Smart IO in the AWR report.
  • Flash Cache misses — result in disk reads when requested data does not exist in Exadata Smart Flash Cache. These are typically small reads. For further details, review Flash Cache Misses in the AWR report.
  • Flash Cache read skips — result in disk reads when requested data is not eligible for Exadata Smart Flash Cache. For further details, review Flash Cache User Reads - Skips in the AWR report.
  • Flash Cache write skips or Flash Cache LW rejections — result in disk writes when data is not eligible for Exadata Smart Flash Cache. For further details, review Flash Cache User Writes - Skips and Flash Cache User Writes - Large Write Rejects in the AWR report.
  • Disk writer writes — result in disk writes when data from Exadata Smart Flash Cache in Write-Back mode is persisted to disk. For further details, review Flash Cache Internal Writes in the AWR report.
  • Scrub IO — occurs when Oracle Exadata System Software automatically inspects and repairs the hard disks. Scrub I/O is performed periodically when the hard disks are idle, and mostly results in large disk reads, which should be throttled automatically if the disk becomes I/O bound.

The specific causes listed in this section are subject to the version of Oracle Database being used.

Exadata Resource Statistics

The Exadata Resource Statistics section contains many statistics and is organized into several sub-sections. Primarily, the statistics enumerate the I/O occurring on the storage servers using information from the storage server operating system (OS) and Oracle Exadata System Software. From the OS, it includes statistics relating to I/Os per second (IOPS), throughput, utilization, service time and wait time. These statistics are equivalent to the statistics shown by the iostat command. From the Oracle Exadata System Software, it includes statistics relating to IOPS, throughput, and latency, which are also broken down by small reads, small writes, large reads, and large writes. These statistics are based on the cell disk metrics.

The statistics are aggregated by device type, and then by cell or disk. The device type aggregation ensures comparison across the same device type, as different device types are expected to have different performance characteristics. The statistics are presented in two ways: first, to enable outlier analysis, and second, to show the 'Top N' cells or disks for a specific statistic.

The outlier analysis presentation allows you to quickly see the statistics aggregated across all storage servers, by cell, and by disk. The display also includes the statistical mean, standard deviation, and normal range. The normal range is based on the mean and standard deviation, not the observed low and high values. For cells, the normal range is the range of values that are plus or minus one standard deviation from the mean. For disks, the normal range is the range of values that are plus or minus three standard deviations from the mean. If there are cells or disks that fall outside the normal range, then they are reported as outliers. This simple outlier analysis is designed to highlight areas for further investigation. However, based on the number of cells or disks in use, and the value of the standard deviation, the outlier analysis may not identify outliers in all cases.

The 'Top N' presentation simply shows the top ranked cells or disks for a specific statistic. This presentation enables you to identify cells or disks that perform more or less work than the others. By using this presentation, you can potentially identify outliers that are not identified by the outlier analysis. Also highlighted in these sections are cells or disks that exceed the expected maximum IOPS for the device or the expected maximum throughput for the device.

The following list outlines the sub-sections in the Exadata Resource Statistics section of the AWR report:

  • Exadata Outlier Summary — displays a summary of outliers identified in the various outlier sub-sections.
  • Exadata OS Statistics Outliers — contains outlier sub-sections for cells and disks based on OS statistics, including IOPS, throughput (MB/s), utilization percentage, service time, wait time, and CPU utilization per cell.
  • Exadata Cell Server Statistics — contains outlier sub-sections for cells and disks based on cell disk metrics, including IOPS, throughput (MB/s), and latency. The statistics are further categorized by I/O type; that is, small read, small write, large read, or large write.
  • Exadata Outlier Details — displays detailed information for the identified outliers, along with other statistics related to the outlier.
  • Exadata OS Statistics Top — contains 'Top N' sub-sections for cells and disks based on OS statistics, including IOPS, latency, and CPU utilization.
  • Exadata Cell Server Statistics Top — contains 'Top N' sub-sections for cells and disks based on cell disk metrics, including IOPS, throughput (MB/s), and latency. The statistics are further categorized by I/O type; that is, small read, small write, large read, or large write.

The following example shows two of the Exadata Cell Server Statistics outlier sub-sections in the AWR report. The example highlights that the hard disk IOPS exceed the expected maximum, and that a specific disk is performing substantially more small reads than the other disks.

Figure 6-27 AWR Report: Exadata Cell Server IOPS Statistics Outliers

The image shows an example of the Exadata Cell Server IOPS Statistics - Outlier Cells and Exadata Cell Server IOPS Statistics - Outlier Disks sections in the AWR report.

Exadata IO Reasons

When the database sends an I/O request to Exadata, the request is tagged with information that includes the reason for the I/O. This information is aggregated in the Exadata IO Reasons sections of the AWR report and allows you to understand the reasons for performing I/O.

The AWR report contains sub-sections that display the Top IO Reasons by Requests and the Top IO Reasons by MB (throughput). Later versions of the AWR report further break down the Top IO Reasons by Requests to categorize read requests from flash, write requests to flash, read requests from disk, and write requests to disk.

The following example shows Top IO Reasons by Requests. The example output is typical of a well-performing system, with a high proportion of I/O that is associated with Smart Scan, and similar I/O profiles across all of the storage servers.

Figure 6-28 AWR Report: Top IO Reasons by Request

The image shows an example of the Top IO Reasons by Request section in the AWR report.

Internal IO Reasons

If Internal IO is among the Top IO Reasons reported, then the AWR report will include a section that summarizes the Internal IO Reasons:

Figure 6-29 AWR Report: Internal IO Reasons

The image shows an example of the Internal IO Reasons section in the AWR report.

Possible causes of internal I/O include:

  • Disk Writer reads — result in flash reads when the disk writer reads from Exadata Smart Flash Cache in Write-Back mode to persist data to disk. For further details, review Flash Cache Internal Reads in the AWR report.
  • Disk Writer writes — result in disk writes when the disk writer persists data to disk from Exadata Smart Flash Cache in Write-Back mode. For further details, review Flash Cache Internal Writes in the AWR report.
  • Population — results in flash writes when requested data is read into Exadata Smart Flash Cache. When the data is read from disk, it also populates Exadata Smart Flash Cache. This is often correlated with flash cache misses. For further details, review Flash Cache User Reads and Flash Cache Internal Writes in the AWR report.
  • Metadata — results in flash writes when new data is written to Exadata Smart Flash Cache in Write-Back mode. This is often due to first writes. For further details, review Flash Cache User Writes in the AWR report.

The specific causes listed in this section are subject to the version of Oracle Database being used.

6.3.9.2 Monitoring Cell Disk I/O Using Database Statistics and Wait Events

The following table describes various database statistics that are useful for monitoring cell disk I/O. The statistics are available in various dynamic performance views, including V$SYSSTAT, and may be displayed in the Global Activity Statistics or Instance Activity Statistics section of an AWR report.

Statistic Description
physical read IO requests Number of physical read requests issued by the database.
physical read requests optimized Total number of read requests satisfied by using Exadata Smart Flash Cache, and read requests avoided by using storage index or columnar cache.
physical read total bytes Total amount of I/O bytes for reads issued by the database, whether or not the read was offloaded to the storage servers.
physical read total bytes optimized Total number of bytes read from Exadata Smart Flash Cache, and bytes avoided by using storage index or columnar cache.
physical read total IO requests Total number of read requests issued by the database for all activity including application, backup, recovery, and other utilities.
physical write IO requests Number of physical write requests issued by the database.
physical write total bytes Total number of IO bytes for writes issued by the database for all activity.
physical write total bytes optimized Total number of bytes written to Exadata Smart Flash Cache. These bytes are synchronized to disk in a lazy manner.
physical write total IO requests Total number of write requests issued by the database for all activity, including application, backup, recovery and other utilities.

Additional database I/O statistics are also contained in the following V$IOSTAT views, which are also used in the IOStat sections of the AWR report.

  • V$IOSTAT_FILE — displays information about disk I/O statistics for database files, including data files, temp files, and other types of database files.
  • V$IOSTAT_FUNCTION — displays disk I/O statistics for database functions, such as the LGWR and DBWR.
  • V$IOSTAT_FUNCTION_DETAIL — displays disk I/O statistics for database functions, broken down by file type.

The following table describes database wait events that are useful for monitoring cell disk I/O. The wait events are visible in various dynamic performance views, including V$SESSION, V$SYSTEM_EVENT and V$SESSION_EVENT, and may be displayed in the Wait Event sections of the AWR report.

Wait Event Description
cell list of blocks physical read

This wait event occurs during recovery or during buffer pre-fetching (rather than performing multiple single-block reads). It is used to monitor database blocks that must be changed as part of recovery and are read in parallel for the database.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event is equivalent to db file parallel read for a cell.

cell list of blocks read request

This is a placeholder wait event associated with cell list of blocks physical read, which is visible only during the wait period. After the wait event ends, the placeholder is typically converted to cell list of blocks physical read.

cell multiblock physical read

This wait event represents the time taken to perform all the I/Os for a multi-block database read.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event is equivalent to db file scattered read for a cell.

cell multiblock read request

This is a placeholder wait event associated with cell multiblock physical read, which is visible only during the wait period. After the wait event ends, the placeholder is typically converted to cell multiblock physical read.

cell single block physical read

This wait event represents the time taken to perform a single block database I/O.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event is equivalent to db file sequential read for a cell.

Commencing with the May 2022 Oracle Database release updates (versions 19.15.0.0.220419, 21.6.0.0.220419, and later), this wait event no longer includes I/O from Exadata Smart Flash Cache or database I/O using a Remote Direct Memory Access (RDMA) read.

cell single block physical read: flash cache

This wait event represents the time taken to perform a single block database I/O from Exadata Smart Flash Cache.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data (not the cache location), which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event was introduced in the May 2022 Oracle Database release updates and is present in Oracle Database versions 19.15.0.0.220419, 21.6.0.0.220419, and later. Previously, cell single block physical read included these waits.

cell single block physical read: xrmem cache

cell single block physical read: pmem cache

This wait event represents the time taken to perform a single block database I/O from XRMEM cache.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data (not the cache location), which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

cell single block physical read: pmem cache was introduced in the May 2022 Oracle Database release updates and is present in Oracle Database versions 19.15.0.0.220419, 21.6.0.0.220419, and later. Previously, cell single block physical read included these waits.

The wait event is named cell single block physical read: xrmem cache in Oracle Database versions that support Oracle Exadata System Software release 23.1.0 and later.

cell single block physical read: RDMA

This wait event represents the time taken to perform a single block database I/O using a Remote Direct Memory Access (RDMA) read.

In V$SESSION, records associated with this event include additional parameters:

  • P1 contains the relevant storage server hash number, which corresponds to V$CELL.CELL_HASHVAL.

  • P2 identifies the disk hash number for the grid disk that contains the data, which corresponds to V$ASM_DISK.HASH_VALUE.

  • P3 specifies the number of bytes processed during the I/O read operation.

This wait event was introduced in the May 2022 Oracle Database release updates and is present in Oracle Database versions 19.15.0.0.220419, 21.6.0.0.220419, and later. Previously, cell single block physical read included these waits.

cell single block read request

This is a placeholder wait event associated with a single block database I/O that is visible only during the wait period. After the wait event ends, the placeholder is converted to the appropriate wait event, which is typically one of the cell single block physical read events.

cell interconnect retransmit during physical read

This wait event appears during retransmission for an I/O of a single-block or multiblock read. The cell hash number in the P1 column in the V$SESSION_WAIT view is the same cell identified for cell single block physical read and cell multiblock physical read. The P2 column contains the subnet number to the cell, and the P3 column contains the number of bytes processed during the I/O read operation.

The availability of a specific statistic or wait event is subject to the version of Oracle Database being used.

6.3.9.3 Monitoring Cell Disk I/O Using Exadata Metrics

Cell disk metrics provide information about the I/O load for cell disks, such as the number of large blocks read from a cell disk.

Cell disk metrics are identified in the METRICCURRENT, METRICDEFINITION, and METRICHISTORY objects as having objectType=CELLDISK.

The cell disk for the metric is specified by the metricObjectName attribute of the METRICCURRENT and METRICHISTORY objects.

For cumulative metrics, the metric value for a specific time period is determined by subtracting values from different collectionTime periods. For rate metrics, the time period for the metric value is over the previous minute.

Example 6-12 Displaying Cell Disk Metric Definitions

This example shows how to display the cell disk metric definitions that are available in the Oracle Exadata System Software.

CellCLI> LIST METRICDEFINITION ATTRIBUTES NAME,DESCRIPTION WHERE OBJECTTYPE = CELLDISK
         CD_BY_FC_DIRTY          "Number of unflushed megabytes cached in FLASHCACHE on a cell disk"
         CD_IO_BY_R_LG           "Number of megabytes read in large blocks from a cell disk"
         CD_IO_BY_R_LG_SEC       "Number of megabytes read in large blocks per second from a cell disk"
         CD_IO_BY_R_SCRUB        "Number of megabytes read from a cell disk by the scrubbing job"
         CD_IO_BY_R_SCRUB_SEC    "Number of megabytes read per second from a cell disk by the scrubbing job"
         CD_IO_BY_R_SM           "Number of megabytes read in small blocks from a cell disk"
         CD_IO_BY_R_SM_SEC       "Number of megabytes read in small blocks per second from a cell disk"
         CD_IO_BY_W_LG           "Number of megabytes written in large blocks to a cell disk"
         CD_IO_BY_W_LG_SEC       "Number of megabytes written in large blocks per second to a cell disk"
         CD_IO_BY_W_SM           "Number of megabytes written in small blocks to a cell disk"
         CD_IO_BY_W_SM_SEC       "Number of megabytes written in small blocks per second to a cell disk"
         CD_IO_ERRS              "Number of IO errors on a cell disk"
         CD_IO_ERRS_MIN          "Number of IO errors on a cell disk per minute"
         CD_IO_ERRS_SCRUB        "Number of IO errors hit by the scrubbing job on a cell disk"
         CD_IO_LOAD              "Average I/O load for the cell disk"
         CD_IO_RQ_R_LG           "Number of requests to read large blocks from a cell disk"
         CD_IO_RQ_R_LG_SEC       "Number of requests to read large blocks per second from a cell disk"
         CD_IO_RQ_R_SCRUB        "Number of requests to read from a cell disk by the scrubbing job"
         CD_IO_RQ_R_SCRUB_SEC    "Number of requests to read per second from a cell disk by the scrubbing job"
         CD_IO_RQ_R_SM           "Number of requests to read small blocks from a cell disk"
         CD_IO_RQ_R_SM_SEC       "Number of requests to read small blocks per second from a cell disk"
         CD_IO_RQ_W_LG           "Number of requests to write large blocks to a cell disk"
         CD_IO_RQ_W_LG_SEC       "Number of requests to write large blocks per second to a cell disk"
         CD_IO_RQ_W_SM           "Number of requests to write small blocks to a cell disk"
         CD_IO_RQ_W_SM_SEC       "Number of requests to write small blocks per second to a cell disk"
         CD_IO_ST_RQ             "Average service time per request for small IO requests to a cell disk"
         CD_IO_TM_R_LG           "Cumulative latency of reading large blocks from a cell disk"
         CD_IO_TM_R_LG_RQ        "Average latency of reading large blocks per request to a cell disk"
         CD_IO_TM_R_SM           "Cumulative latency of reading small blocks from a cell disk"
         CD_IO_TM_R_SM_RQ        "Average latency of reading small blocks per request from a cell disk"
         CD_IO_TM_W_LG           "Cumulative latency of writing large blocks to a cell disk"
         CD_IO_TM_W_LG_RQ        "Average latency of writing large blocks per request to a cell disk"
         CD_IO_TM_W_SM           "Cumulative latency of writing small blocks to a cell disk"
         CD_IO_TM_W_SM_RQ        "Average latency of writing small blocks per request to a cell disk"
         CD_IO_UTIL              "Percentage of disk resources utilized for the cell disk"
         CD_IO_UTIL_LG           "Percentage of disk resources utilized by large requests for the cell disk"
         CD_IO_UTIL_SM           "Percentage of disk resources utilized by small requests for the cell disk"

Note the following additional details:

  • CD_IO_LOAD represents the average I/O load for the cell disk.

    I/O load specifies the length of the disk queue. It is similar to iostat's avgqu-sz, but I/O load is a weighted value depending on the type of disk:

    • For hard disks, a large I/O has three times the weight of a small I/O.
    • For flash disks, large and small I/Os have the same weight.

    Corresponding metrics are also available for each database (DB_IO_LOAD), pluggable database (PDB) (PDB_IO_LOAD), IORM category (CT_IO_LOAD), and consumer group (CG_IO_LOAD).

  • CD_IO_ST_RQ cannot be used for flash devices, as explained in Using IOSTAT.

  • CD_IO_UTIL is similar to %util of iostat for hard disks, but not for flash devices as mentioned in Using IOSTAT. For flash devices, it is a percentage of the maximum I/O bandwidth for your system, as specified in the product data sheet.

    Because this metric is computed by IORM, it is also available per database, PDB, and consumer group.

  • The unit of measurement for the latency metrics is microseconds. The latency metrics have metric names starting with CD_IO_TM_.
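
For example, to compare the recent I/O load across the cell disks on a storage server, you can list the CD_IO_LOAD history for a time window and look for disks with consistently higher values. This is a sketch based on the LIST syntax used in this chapter; the timestamp is illustrative only.

CellCLI> LIST METRICHISTORY CD_IO_LOAD WHERE collectionTime > '2024-05-01T08:00:00-07:00' ATTRIBUTES metricObjectName, metricValue, collectionTime
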
6.3.9.4 What to Look For When Monitoring Cell Disk I/O

Imbalances

In an Exadata environment, an even load distribution is expected across all cells or disks. However, there are situations that may cause an imbalance.

A load imbalance may be due to characteristics of the workload, such as:

  • Repeated small-table scans — This is often caused by a table on the right side of a nested loop join. Since the small table may only reside on a few disks or cells, repeated access means reading from a small set of devices, which may be flash devices if the data resides in Exadata Smart Flash Cache. To address the imbalance, you can identify affected SQL statements and review their execution plans. You can also consider the physical organization of the affected segments.

  • Repeated control file reads — Control file reads may be caused by queries against some database dynamic performance views. The control file is small and may only reside on a few disks or cells, so repeated access means reading from a small set of devices, usually flash. Repeated control file reads are evident in the IOStat by File Type section of the AWR report and using the control file sequential read wait event.

    To understand the cause of repeated control file reads, you can use the Exadata IO Reasons section of the AWR report to identify cells servicing many control file reads and correlate this with statistics from the Top Databases section to identify databases issuing the control file reads. You can also review Active Session History (ASH) to identify SQL statements that are experiencing waits for control file sequential read.

    Beware that control file reads may account for a disproportionate amount of small I/O on a lightly-loaded system. In this case, you can safely ignore the imbalance.

  • Repeated reads of a small segment, such as a LOB segment — To address this kind of imbalance, review the database segment statistics, or examine ASH to identify the responsible SQL commands. Then, examine the SQL and the surrounding application logic to see if the repeated reads can be avoided.

  • Repeated ASM metadata access — This is often caused by database queries related to space usage, which require access to ASM metadata. The ASM metadata is small and may only reside on a few disks or cells, so repeated access may show up as an imbalance. This can show up in the Exadata IO Reasons sections of the AWR report with reasons that are prefixed with ASM, such as ASM CACHE IO. You can use the Exadata IO Reasons section to identify affected cells and correlate this with statistics from the Top Databases sections to confirm ASM as the source of the I/O. To address the imbalance, review the need for and frequency of the ASM space usage queries that access the metadata.

Apart from workload-related causes, imbalances may also be caused by:

  • Uneven data distribution — Hot spots may occur when some disks contain more or less data than others. To check for balanced data distribution, you can query the V$ASM_DISK_IOSTAT view before and after running a SQL query that contains large full-table scans. In this case, the statistics in the read column and the read_bytes column should be approximately the same for all disks in the disk group. You can also check for balanced data distribution by using the script available in My Oracle Support document 367445.1.
  • Asymmetric grid disk configurations — If the grid disks are sized differently on different cells, then different amounts of data may reside on each cell, resulting in imbalance due to uneven data distribution. For this reason, symmetric configuration is recommended across all storage servers in a system.
  • Failures or recovery from failures — Some processing cannot spread evenly across the remaining disks or cells. For example, if a flash card fails, the cell containing the card has less available flash cache space, and this may show up as an imbalance when the flash cache statistics are compared with the other cells.

An imbalance often shows up in different statistics across the AWR report. By correlating different sections in the AWR report you may gain a deeper understanding of the imbalance.
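
To compare a suspect statistic across all storage servers, you can run the same CellCLI command on every cell with dcli. The following sketch assumes a dcli group file named cell_group that lists the storage server host names; the chosen metric is illustrative only.

$ dcli -g cell_group cellcli -e "LIST METRICCURRENT CD_IO_RQ_R_SM_SEC ATTRIBUTES metricObjectName, metricValue"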

High I/O Load

When database performance issues are related to I/O load on the Exadata storage servers, typically there will be increased latencies in the I/O-related wait events, and increased database time in the User I/O or System I/O wait classes. When dealing with high I/O load, first understand the composition of the I/O load. Based on the composition of the I/O load, you may be directed to different statistics to gain a deeper understanding.

In the AWR report, a good starting point is the Disk Activity section, which provides a high-level summary for potential sources of disk activity. The Disk Activity section is located in the AWR report under Exadata Statistics > Performance Summary. See Disk Activity.

Internal I/O

In the AWR report, if internal I/O is reported among the Top IO Reasons, then the AWR report also includes a section that summarizes the Internal IO Reasons. Use this section to understand the composition of the internal I/O. Based on the composition of the internal I/O load, you may be directed to different statistics to gain a deeper understanding. See Internal IO Reasons.

6.3.10 Monitoring Grid Disks

Grid disk metrics provide information about the I/O load for grid disks, such as the number of large blocks read from a grid disk.

Grid disk metrics are identified in the METRICCURRENT, METRICDEFINITION, and METRICHISTORY objects as having objectType=GRIDDISK.

The grid disk for the metric is specified by the metricObjectName attribute of the METRICCURRENT and METRICHISTORY objects.

For cumulative metrics, the metric value for a specific time period is determined by subtracting values from different collectionTime periods. For rate metrics, the time period for the metric value is over the previous minute.

Example 6-13 Displaying Grid Disk Metric Definitions

This example shows how to display the grid disk metric definitions that are available in the Oracle Exadata System Software.

CellCLI> LIST METRICDEFINITION ATTRIBUTES NAME,DESCRIPTION WHERE OBJECTTYPE = GRIDDISK
         GD_BY_FC_DIRTY                  "Number of unflushed megabytes cached in FLASHCACHE for a grid disk"
         GD_IO_BY_R_LG                   "Number of megabytes read in large blocks from a grid disk"
         GD_IO_BY_R_LG_SEC               "Number of megabytes read in large blocks per second from a grid disk"
         GD_IO_BY_R_SCRUB                "Number of megabytes read from a grid disk by the scrubbing job"
         GD_IO_BY_R_SCRUB_SEC            "Number of megabytes read per second from a grid disk by the scrubbing job"
         GD_IO_BY_R_SM                   "Number of megabytes read in small blocks from a grid disk"
         GD_IO_BY_R_SM_SEC               "Number of megabytes read in small blocks per second from a grid disk"
         GD_IO_BY_W_LG                   "Number of megabytes written in large blocks to a grid disk"
         GD_IO_BY_W_LG_SEC               "Number of megabytes written in large blocks per second to a grid disk"
         GD_IO_BY_W_SM                   "Number of megabytes written in small blocks to a grid disk"
         GD_IO_BY_W_SM_SEC               "Number of megabytes written in small blocks per second to a grid disk"
         GD_IO_ERRS                      "Number of IO errors on a grid disk"
         GD_IO_ERRS_MIN                  "Number of IO errors on a grid disk per minute"
         GD_IO_ERRS_SCRUB                "Number of IO errors hit by the scrubbing job on a grid disk"
         GD_IO_RQ_R_LG                   "Number of requests to read large blocks from a grid disk"
         GD_IO_RQ_R_LG_SEC               "Number of requests to read large blocks per second from a grid disk"
         GD_IO_RQ_R_SCRUB                "Number of requests to read from a grid disk by the scrubbing job"
         GD_IO_RQ_R_SCRUB_SEC            "Number of requests to read per second from a grid disk by the scrubbing job"
         GD_IO_RQ_R_SM                   "Number of requests to read small blocks from a grid disk"
         GD_IO_RQ_R_SM_SEC               "Number of requests to read small blocks per second from a grid disk"
         GD_IO_RQ_W_LG                   "Number of requests to write large blocks to a grid disk"
         GD_IO_RQ_W_LG_SEC               "Number of requests to write large blocks per second to a grid disk"
         GD_IO_RQ_W_SM                   "Number of requests to write small blocks to a grid disk"
         GD_IO_RQ_W_SM_SEC               "Number of requests to write small blocks per second to a grid disk"
         GD_NVM_READ_RETRIES             "Number of read retries for a non-volatile memory (NVM) grid disk"
         GD_SP_BY_ALLOCATED              "Allocated physical space for grid disk in bytes"
         GD_SP_IO_BY_PARTIAL             "Bytes returned by partial IOs"
         GD_SP_IO_BY_PARTIAL_SEC         "Bytes returned by partial IOs per second"
         GD_SP_IO_BY_REDIRECTED          "Sparse bytes redirected to original data block"
         GD_SP_IO_BY_REDIRECTED_SEC      "Sparse bytes redirected to original data block per second"
         GD_SP_IO_RQ_PARTIAL             "Number of IO requests that returned partial data"
         GD_SP_IO_RQ_PARTIAL_SEC         "Number of IO requests that returned partial data per second"
         GD_SP_PRCT_ALLOCATED            "Allocated physical space for grid disk by percentage"

Note the following additional details:

  • GD_SP_PRCT_ALLOCATED tracks the allocated physical space for the grid disk, and is presented as a percentage. By default, a warning alert is generated when space usage is more than 95%, and a critical alert is generated when space usage reaches 99%. To lower the space usage, you can increase the size of the grid disk, or drop some of the data on the grid disk. A monitoring sketch based on this metric follows this note.
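
The following Python sketch illustrates one way to act on this metric from a monitoring script. It runs CellCLI non-interactively (cellcli -e) to list the current GD_SP_PRCT_ALLOCATED values and reports any grid disk approaching the default warning threshold. The 90% reporting threshold and the simple two-column output parsing are assumptions for illustration only.

import subprocess

# List the current allocated-space percentage for every grid disk.
cmd = "LIST METRICCURRENT GD_SP_PRCT_ALLOCATED ATTRIBUTES metricObjectName, metricValue"
out = subprocess.run(["cellcli", "-e", cmd], capture_output=True, text=True, check=True).stdout

REPORT_THRESHOLD = 90.0  # report before the default 95% warning alert fires

for line in out.splitlines():
    fields = line.split()
    if len(fields) < 2:
        continue
    grid_disk, value = fields[0], fields[1]
    try:
        pct = float(value)
    except ValueError:
        continue  # skip headers or values that are not numeric
    if pct >= REPORT_THRESHOLD:
        print(f"Grid disk {grid_disk}: {pct:.1f}% of physical space allocated")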

6.3.11 Monitoring Host Interconnect Metrics

Host interconnect metrics provide information about the I/O transmission for hosts that access cell storage.

Host interconnect metrics are identified in the METRICCURRENT, METRICDEFINITION, and METRICHISTORY objects as having objectType=HOST_INTERCONNECT.

The host interconnect for the metric is specified by the metricObjectName attribute of the METRICCURRENT and METRICHISTORY objects. For cumulative metrics, the value for a specific time period can be determined by subtracting the values observed at two different collectionTime points. For rate metrics, each value reflects activity during the minute preceding the collection time.

Example 6-14 Displaying Host Interconnect Metric Definitions

This example shows how to display the host interconnect metric definitions that are available in the Oracle Exadata System Software.

CellCLI> LIST METRICDEFINITION ATTRIBUTES NAME,DESCRIPTION WHERE OBJECTTYPE = HOST_INTERCONNECT
         N_MB_DROP               "Number of megabytes dropped during transmission to a particular host"
         N_MB_DROP_SEC           "Number of megabytes dropped during transmission per second to a particular host"
         N_MB_RDMA_DROP          "Number of megabytes dropped during RDMA transmission to a particular host"
         N_MB_RDMA_DROP_SEC      "Number of megabytes dropped during RDMA transmission per second to a particular host"
         N_MB_RECEIVED           "Number of megabytes received from a particular host"
         N_MB_RECEIVED_SEC       "Number of megabytes per second received from a particular host"
         N_MB_RESENT             "Number of megabytes resent to a particular host"
         N_MB_RESENT_SEC         "Number of megabytes resent per second to a particular host"
         N_MB_SENT               "Number of megabytes transmitted to a particular host"
         N_MB_SENT_SEC           "Number of megabytes transmitted per second to a particular host"
         N_RDMA_RETRY_TM         "Latency of the retry actions during RDMA transmission to a particular host"

6.4 Using Exadata Monitoring Objects

6.4.1 Displaying Metrics

You can use the CellCLI LIST command to display and monitor metrics for cell objects.

You can display metric definitions, current metrics, and metric history using the LIST command.

6.4.1.1 Displaying Metric Definitions

Use the LIST METRICDEFINITION command to display the metric definitions for a storage server.

A metric definition listing shows the configuration of a metric.

Example 6-15 Displaying Metric Definitions

This example shows how to display attributes for the METRICDEFINITION object.

CellCLI> LIST METRICDEFINITION CL_CPUT DETAIL

CellCLI> LIST METRICDEFINITION WHERE objectType = 'GRIDDISK'

CellCLI> LIST METRICDEFINITION WHERE name LIKE 'CD_IO_RQ.*' -
         ATTRIBUTES name, metricType, description

6.4.1.2 Displaying Current Metrics

Use the LIST METRICCURRENT command to display the current metric values for a storage server.

A current metric listing shows a set of observations on the current value of an individual metric.

Example 6-16 Displaying Current Metric Values

This example shows how to display attributes for the METRICCURRENT object.

CellCLI> LIST METRICCURRENT CL_TEMP DETAIL

         name:                   CL_TEMP
         alertState:             normal
         collectionTime:         2009-12-17T15:32:25-08:00
         metricObjectName:       abcd2x3
         metricType:             Instantaneous
         metricValue:            48.0 C
         objectType:             CELL

CellCLI> LIST METRICCURRENT WHERE objectType = 'CELLDISK' AND                 - 
         metricValue != 0 ATTRIBUTES name, metricObjectName,                  -
         metricValue, collectionTime

         CD_IO_BY_R_LG   CD_00_abcd2x3    1.9 MB  2009-12-17T15:46:52-08:00
         CD_IO_BY_R_LG   CD_01_abcd2x3    1.0 MB  2009-12-17T15:46:52-08:00
         CD_IO_BY_R_LG   CD_02_abcd2x3    4.1 MB  2009-12-17T15:46:52-08:00
         CD_IO_BY_R_LG   CD_03_abcd2x3    9.5 MB  2009-12-17T15:46:52-08:00
         CD_IO_BY_R_LG   CD_04_abcd2x3    0.1 MB  2009-12-17T15:46:52-08:00
         CD_IO_BY_R_LG   CD_05_abcd2x3    0.4 MB  2009-12-17T15:46:52-08:00
         ...
6.4.1.3 Displaying Metric History

Use the LIST METRICHISTORY command to display the metric history for a storage server.

A metric history listing shows a collection of past individual observations of all metric values.

The retention period for most metric history files is specified by the metricHistoryDays cell attribute. The default retention period is 7 days. You can modify this setting with the CellCLI ALTER CELL command.

In addition to the metrics governed by the metricHistoryDays cell attribute, a subset of key metric observations are retained for up to one year. In all cases, historical metric observations are purged automatically if the server detects a storage space shortage.
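
As a small scripted illustration, the following Python sketch checks the current retention setting and then extends it by invoking CellCLI non-interactively. The 14-day value is only an example; confirm the exact attribute syntax for your release in the CellCLI reference.

import subprocess

def cellcli(command: str) -> str:
    """Run a CellCLI command non-interactively and return its output."""
    result = subprocess.run(["cellcli", "-e", command],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Display the current retention period for metric history.
print(cellcli("LIST CELL ATTRIBUTES metricHistoryDays"))

# Extend the retention period to 14 days (example value only).
cellcli("ALTER CELL metricHistoryDays=14")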

Example 6-17 Displaying Metric History Values

This example shows various LIST METRICHISTORY commands.

The following command displays observations in the metric history for the CD_IO_RQ_R_LG metric when the metric was in a critical alert state. The output contains all of the metric attributes.

CellCLI> LIST METRICHISTORY CD_IO_RQ_R_LG WHERE alertState='critical' DETAIL

The following command displays a subset of metric attributes for all metric observations that are associated with cell disks. The output is limited to observations with non-zero metric values that were observed after the specified time.

CellCLI> LIST METRICHISTORY WHERE objectType = 'CELLDISK' AND metricValue != 0   -
         AND collectionTime > '2009-08-12T09:10:51-07:00' ATTRIBUTES name,      -
         metricObjectName, metricValue, collectionTime

The following command displays all metric observations within a specified time period that are associated with the specified object types.

CellCLI> LIST METRICHISTORY WHERE objectType LIKE 'CELLDISK|FLASHCACHE|FLASHLOG|SMARTIO|IORM_DATABASE' -
         AND collectionTime > '2020-07-15T08:00:00-07:00' -
         AND collectionTime < '2020-07-15T09:00:00-07:00'

6.4.2 Monitoring Alerts

You can monitor and receive notifications for alerts.

Alerts represent events of importance occurring within the cell. Typically, alerts indicate that Exadata server functionality is compromised or in danger of failure.

To receive notifications, use the ALTER CELL command.

6.4.2.1 Displaying Alert Definitions

Use the LIST ALERTDEFINITION command to display the alert definitions for the storage server.

An alert definition exists for every alert that can be produced on a storage server.

Example 6-18 Listing Alert Definition Attributes

This example shows how to display a detailed list of attributes for the alert ADRAlert.

CellCLI> LIST ALERTDEFINITION ADRAlert DETAIL
         name:                   ADRAlert
         alertShortName:         ADR
         alertSource:            "Automatic Diagnostic Repository"
         description:            "Incident Alert"

Example 6-19 Listing Alert Definition Name and Description Attributes

You can display a list of specified attributes for an alert definition. This example shows how to display the alert name, metric name, and description. The metric name identifies the metric on which the alert is based. ADRAlert, HardwareAlert, Stateful_HardwareAlert, and Stateful_SoftwareAlert are not based on a metric, and therefore do not have metric names.

CellCLI> LIST ALERTDEFINITION ATTRIBUTES name, metricName, description 
    ADRAlert                                        "Incident Alert"
    HardwareAlert                                   "Hardware Alert"
    MetricAlert                                     "Threshold Alert"
    SoftwareAlert                                   "Software Alert"
    StatefulAlert_CD_IO_ERRS_MIN    CD_IO_ERRS_MIN  "Threshold Alert"
    StatefulAlert_CG_IO_RQ_LG       CG_IO_RQ_LG     "Threshold Alert"
    StatefulAlert_CG_IO_RQ_LG_SEC   CG_IO_RQ_LG_SEC "Threshold Alert"
    StatefulAlert_CG_IO_RQ_SM       CG_IO_RQ_SM     "Threshold Alert"
    StatefulAlert_CG_IO_RQ_SM_SEC   CG_IO_RQ_SM_SEC "Threshold Alert"
    StatefulAlert_CG_IO_WT_LG       CG_IO_WT_LG     "Threshold Alert"
    StatefulAlert_CG_IO_WT_LG_RQ    CG_IO_WT_LG_RQ  "Threshold Alert"
    StatefulAlert_CG_IO_WT_SM       CG_IO_WT_SM     "Threshold Alert"
    StatefulAlert_CG_IO_WT_SM_RQ    CG_IO_WT_SM_RQ  "Threshold Alert"
    StatefulAlert_CL_FSUT           CL_FSUT         "Threshold Alert"
    StatefulAlert_CL_MEMUT          CL_MEMUT        "Threshold Alert"
    StatefulAlert_CT_IO_RQ_LG       CT_IO_RQ_LG     "Threshold Alert"
    StatefulAlert_CT_IO_RQ_LG_SEC   CT_IO_RQ_LG_SEC "Threshold Alert"
    StatefulAlert_CT_IO_RQ_SM       CT_IO_RQ_SM     "Threshold Alert"
    StatefulAlert_CT_IO_RQ_SM_SEC   CT_IO_RQ_SM_SEC "Threshold Alert"
    StatefulAlert_CT_IO_WT_LG       CT_IO_WT_LG     "Threshold Alert"
    StatefulAlert_CT_IO_WT_LG_RQ    CT_IO_WT_LG_RQ  "Threshold Alert"
    StatefulAlert_CT_IO_WT_SM       CT_IO_WT_SM     "Threshold Alert"
    StatefulAlert_CT_IO_WT_SM_RQ    CT_IO_WT_SM_RQ  "Threshold Alert"
    StatefulAlert_DB_IO_RQ_LG       DB_IO_RQ_LG     "Threshold Alert"
    StatefulAlert_DB_IO_RQ_LG_SEC   DB_IO_RQ_LG_SEC "Threshold Alert"
    StatefulAlert_DB_IO_RQ_SM       DB_IO_RQ_SM     "Threshold Alert"
    StatefulAlert_DB_IO_RQ_SM_SEC   DB_IO_RQ_SM_SEC "Threshold Alert"
    StatefulAlert_DB_IO_WT_LG       DB_IO_WT_LG     "Threshold Alert"
    StatefulAlert_DB_IO_WT_LG_RQ    DB_IO_WT_LG_RQ  "Threshold Alert"
    StatefulAlert_DB_IO_WT_SM       DB_IO_WT_SM     "Threshold Alert"
    StatefulAlert_DB_IO_WT_SM_RQ    DB_IO_WT_SM_RQ  "Threshold Alert"
    StatefulAlert_GD_IO_ERRS_MIN    GD_IO_ERRS_MIN  "Threshold Alert"
    Stateful_HardwareAlert                          "Hardware Stateful Alert"
    Stateful_SoftwareAlert                          "Software Stateful Alert"
6.4.2.2 Receiving Alert Notifications

Administrators for Oracle Exadata System Software can receive alert notifications by email or by Simple Network Management Protocol (SNMP) trap alerts.

By using the ALTER CELL command, you can set the alert notification attributes to configure Oracle Exadata Storage Server to send notification email messages or SNMP trap alerts.

Use of SNMP alerts allows Oracle Exadata Storage Servers to be monitored by a management application, such as Oracle Enterprise Manager Cloud Control.

Note:

The SNMP alerts conform to a MIB (management information base) which is included with each installation of Oracle Exadata System Software. The MIB file on Oracle Exadata Storage Server is available at /opt/oracle/cell/cellsrv/deploy/config/cell_alert.mib.
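
The following Python sketch shows how a configuration script might apply typical notification settings through CellCLI. The attribute names and values used here (smtpServer, smtpFromAddr, smtpToAddr, notificationMethod, notificationPolicy, and snmpSubscriber) are common ALTER CELL notification settings, but they are shown as assumptions for illustration; verify the exact attributes and SNMP subscriber syntax against the ALTER CELL documentation for your release.

import subprocess

# Assumed notification settings; adjust hosts, addresses, and ports for your site.
alter_cell = (
    "ALTER CELL "
    "smtpServer='mail.example.com', "
    "smtpFromAddr='exadata-alerts@example.com', "
    "smtpToAddr='dba-team@example.com', "
    "notificationMethod='mail,snmp', "
    "notificationPolicy='critical,warning,clear', "
    "snmpSubscriber=((host='emcc.example.com',port=162))"
)

subprocess.run(["cellcli", "-e", alter_cell], check=True)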

6.4.2.3 Displaying Alert History

Use the LIST ALERTHISTORY command to display the alert history for a cell.

Alert history entries are retained for a maximum of 100 days. If the number of alert history entries exceeds 500, then entries are retained for only 7 days. When a stateful alert is cleared, meaning that the underlying metric, hardware, or software condition has returned to normal, the alert is retained for either 100 days or 7 days, depending on the number of alert history entries. Stateful alerts that are not cleared are retained regardless of their age.

Example 6-20 Listing Alert History Attributes

This example shows how to display a detailed list of attributes for alert history entries where the severity attribute is set to critical and the examinedBy attribute has not been set.

CellCLI> LIST ALERTHISTORY WHERE severity = 'critical' AND examinedBy = '' DETAIL
6.4.2.4 Modifying Alert History

Use the ALTER ALERTHISTORY command to update the alert history for the cell.

Example 6-21 Altering Alert History Attributes

This example shows how to set the examinedBy attribute to the user ID of the administrator who examined the alert.

CellCLI> ALTER ALERTHISTORY 1671443714 examinedBy="jdoe"

The examinedBy attribute is the only ALERTHISTORY attribute that can be modified.
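
Building on Examples 6-20 and 6-21, the following Python sketch lists critical alerts that have not yet been examined and records an administrator ID against each one. It assumes the default tabular LIST output, in which the alert name is the first column, and the administrator ID is a placeholder.

import subprocess

def cellcli(command: str) -> str:
    """Run a CellCLI command non-interactively and return its output."""
    return subprocess.run(["cellcli", "-e", command],
                          capture_output=True, text=True, check=True).stdout

ADMIN_ID = "jdoe"  # placeholder administrator ID

# Find critical alerts that nobody has examined yet (see Example 6-20).
listing = cellcli("LIST ALERTHISTORY WHERE severity = 'critical' AND examinedBy = ''")

for line in listing.splitlines():
    fields = line.split()
    if not fields:
        continue
    alert_name = fields[0]
    # Record who examined the alert (see Example 6-21).
    cellcli(f'ALTER ALERTHISTORY {alert_name} examinedBy="{ADMIN_ID}"')
    print(f"Marked alert {alert_name} as examined by {ADMIN_ID}")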

6.4.3 Displaying Active Requests

Use the LIST ACTIVEREQUEST command to display the active requests for the storage server.

To view the ACTIVEREQUEST attributes, use the DESCRIBE ACTIVEREQUEST command.

Example 6-22 Listing Active Request Attributes

This example shows how to display a detailed list of attributes for a specified request I/O type.

CellCLI> LIST ACTIVEREQUEST WHERE IoType = 'predicate pushing' DETAIL

6.4.4 Using Real-Time Insight

You can use the Real-Time Insight feature to enable real-time monitoring of your Exadata systems using an external metric collection platform.

6.4.4.1 Using Fine-Grained Metrics

Traditionally, Exadata metric collections occur at 1-minute intervals. However, real-time monitoring requires more timely metrics. Commencing with Oracle Exadata System Software 22.1.0, you can configure fine-grained metric collection.

Fine-grained metric collection is the foundation for the Real-Time Insight feature. Fine-grained metric collection works in conjunction with an external metric collection platform, where the fine-grained metric observations are collected and processed for visualization. Fine-grained metric observations reside only in server memory and are not persisted on the server. Consequently, no fine-grained metric history is maintained on each server.

6.4.4.1.1 Controlling Fine-Grained Metric Collection

The metricFGCollIntvlInSec attribute controls fine-grained metric collection.

  • To enable fine-grained metric collection, set the metricFGCollIntvlInSec attribute to a value between 1 and 60 seconds.

    For example:

    CellCLI> ALTER CELL metricFGCollIntvlInSec=1

    The metricFGCollIntvlInSec setting is related to the automatic upload frequency specified in the metricStreamIntvlInSec attribute. When automatic metric upload and fine-grained collection are both enabled (metricStreamIntvlInSec>0 and metricFGCollIntvlInSec>0), metricStreamIntvlInSec must be between 5 and 30 times metricFGCollIntvlInSec. For example, if metricStreamIntvlInSec=60, then metricFGCollIntvlInSec must be between 2 and 12.

  • To disable fine-grained metric collection on a server, set metricFGCollIntvlInSec=0.

    For example:

    CellCLI> ALTER CELL metricFGCollIntvlInSec=0

    Fine-grained metric collection can be disabled only when automatic metric upload is disabled (metricStreamIntvlInSec=0) or the automatic upload frequency is between 5 and 30 minutes (metricStreamIntvlInSec is between 300 and 1800).

6.4.4.1.2 Customizing Fine-Grained Metric Collection

By default, a set of key performance metrics is automatically enabled for fine-grained collection. However, you can customize fine-grained metric collection by enabling or disabling specific metrics.

  • To enable a metric for fine-grained collection, use the ALTER METRICDEFINITION command and specify finegrained=enabled. For example:

    CellCLI> ALTER METRICDEFINITION N_NIC_KB_TRANS_SEC finegrained=enabled
    CellCLI> ALTER METRICDEFINITION N_MB_SENT,N_MB_RECEIVED finegrained=enabled
    CellCLI> ALTER METRICDEFINITION finegrained=enabled WHERE name LIKE 'N_NIC.*'
  • To disable a metric for fine-grained collection, use the ALTER METRICDEFINITION command and specify finegrained=disabled. For example:

    CellCLI> ALTER METRICDEFINITION N_MB_SENT finegrained=disabled
  • At any time, to view the metrics enabled for fine-grained collection, use the following command:

    CellCLI> LIST METRICDEFINITION WHERE finegrained=enabled
  • At any time, to view all of the metric definition details, including each metric description and whether the metric is enabled for fine-grained collection, use the following command:

    CellCLI> LIST METRICDEFINITION DETAIL
6.4.4.2 Uploading Metric Observations to a Collection Platform

You can enable an Exadata server to automatically upload (push) metric observations to an external metric collection platform.

6.4.4.2.1 Controlling the Automatic Metric Upload Frequency

The metricStreamIntvlInSec attribute sets the upload interval (in seconds) for automatic uploads to the metric streaming endpoints specified by the metricStreamEndPoint attribute.

  • To enable automatic metric uploads, set the metricStreamIntvlInSec attribute to a non-zero value.

    For example:

    CellCLI> ALTER CELL metricStreamIntvlInSec=25

    The metricStreamIntvlInSec setting is related to the fine-grained collection frequency specified in the metricFGCollIntvlInSec attribute (a small validation sketch follows this list):

    • When automatic metric upload and fine-grained collection are both enabled (metricStreamIntvlInSec>0 and metricFGCollIntvlInSec>0), metricStreamIntvlInSec must be between 5 and 30 times metricFGCollIntvlInSec. For example, if metricFGCollIntvlInSec is set to 5, then metricStreamIntvlInSec must be between 25 and 150.

    • When automatic metric upload is enabled and fine-grained collection is disabled (metricStreamIntvlInSec>0 and metricFGCollIntvlInSec=0), the automatic upload frequency must be between 5 and 30 minutes (metricStreamIntvlInSec must be between 300 and 1800).

  • To disable automatic metric uploads, set metricStreamIntvlInSec=0.

    For example:

    CellCLI> ALTER CELL metricStreamIntvlInSec=0
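
The following Python sketch, provided only as an illustration of the rules above, checks whether a proposed combination of metricStreamIntvlInSec and metricFGCollIntvlInSec satisfies the documented constraints before you apply the values with ALTER CELL.

def settings_valid(stream_sec: int, fg_sec: int) -> bool:
    """Validate a proposed metricStreamIntvlInSec / metricFGCollIntvlInSec pair."""
    if not (fg_sec == 0 or 1 <= fg_sec <= 60):
        return False  # fine-grained interval must be 0 (disabled) or 1-60 seconds
    if stream_sec == 0:
        return True   # automatic upload disabled; no further constraint
    if fg_sec > 0:
        # Upload interval must be 5 to 30 times the fine-grained interval.
        return 5 * fg_sec <= stream_sec <= 30 * fg_sec
    # Upload enabled while fine-grained collection is disabled:
    # upload interval must be between 5 and 30 minutes.
    return 300 <= stream_sec <= 1800

print(settings_valid(25, 5))    # True: 25 is between 5*5=25 and 30*5=150
print(settings_valid(200, 5))   # False: 200 exceeds 30*5=150
print(settings_valid(120, 0))   # False: upload-only mode requires 300-1800 seconds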
6.4.4.2.2 Customizing the Metric Stream

By default, a set of key performance metrics is automatically enabled for streaming. However, you can customize the metric stream by enabling or disabling specific metrics.

  • To include a metric in the metric stream, use the ALTER METRICDEFINITION command and specify streaming=enabled. For example:

    CellCLI> ALTER METRICDEFINITION N_NIC_KB_TRANS_SEC streaming=enabled
    CellCLI> ALTER METRICDEFINITION N_MB_SENT,N_MB_RECEIVED streaming=enabled
    CellCLI> ALTER METRICDEFINITION streaming=enabled WHERE name LIKE 'N_NIC.*'
  • To remove a metric from the metric stream, use the ALTER METRICDEFINITION command and specify streaming=disabled. For example:

    CellCLI> ALTER METRICDEFINITION N_MB_SENT streaming=disabled
  • At any time, to view the metrics that are included in the metric stream, use the following command:

    CellCLI> LIST METRICDEFINITION WHERE streaming=enabled
  • At any time, to view all of the metric definition details, including each metric description and whether the metric is included in the metric stream, use the following command:

    CellCLI> LIST METRICDEFINITION DETAIL
6.4.4.2.3 Configuring the Endpoints for Automatic Metric Upload

You can automatically upload (push) the metric stream to one or more collection endpoints by setting the metricStreamEndPoint attribute as follows:

metricStreamEndPoint[+]=((host="endpoint-URL"[,type="stream-format"][,token="authentication-token"][,{httpProxy|httpsProxy}="proxy-server"])
                          [,(host="endpoint-URL"[,type="stream-format"][,token="authentication-token"][,{httpProxy|httpsProxy}="proxy-server"])]...)

In the metricStreamEndPoint definition:

  • host: Specifies the URL for the collection endpoint. The URL can use HTTP or HTTPS.

  • type: Optionally specifies the format of the stream. Supported values are:

    • json: Provides the stream in a JSON format

    • plaintext: Provides the stream in a plain text format

    The default value is json.

  • token: Optionally specifies the authentication token for the collection endpoint. Consult the metric collection platform for details about generating the token.

  • httpProxy or httpsProxy: Optionally specifies a proxy server to facilitate network connectivity to the collection endpoint. A proxy server is required if a firewall resides between the Exadata system and the collection endpoint.

You can use the optional += operator to add collection endpoints to an existing metricStreamEndPoint definition. Otherwise, the = operator overwrites the previous attribute value.

Example 6-23 Setting Up a JSON Stream

This example shows how to set up a JSON stream. In the example command, the host and token values come from the collection platform, and network connectivity is assumed to go through the specified proxy server.

CellCLI> ALTER CELL metricStreamEndPoint=((host="https://ingest.stream.example.com/v2/datapoint",type="json",token="wcfA_**********Z58QpKg",httpProxy="www-proxy.example.com:80"))

Example 6-24 Adding a Plain Text Endpoint

This example shows how to add a plain text endpoint to the existing metricStreamEndPoint setting. In the example command, the host and token values come from the collection platform. The example also assumes that the collection platform resides within the corporate network, so no proxy server is required.

CellCLI> ALTER CELL metricStreamEndPoint+=((host="http://idbsrv.example.com:8086/api/v2/write?org=Exadata&bucket=Metrics&precision=ns",type="plaintext",token="6unif**********rOXwtfkG0gWGENyePd6uN6OLR_deTZL4IuG9VTfDWwvpB-QvJcCcFs_NVjmpsyANz0Q8psA=="))
6.4.4.3 Downloading Metric Observations from an Exadata Server

You can download (pull) the metric stream from an Exadata server by using the provided REST endpoint. This includes all metrics enabled for streaming (streaming=enabled), regardless of whether the system is enabled for fine-grained metric collection or configured for automatic metric upload.

On each storage server, the REST endpoint URL is:

https://server-name/metricstream/list?stream=true

On each database server, the endpoint uses port 7879. Consequently, the URL is:

https://server-name:7879/metricstream/list?stream=true

For maximum efficiency, the download interval should be a multiple of the metric collection interval. If fine-grained metric collection is enabled (metricFGCollIntvlInSec>0) on the server, then coordinate the download frequency with the fine-grained collection interval. Otherwise, coordinate the download frequency with the standard 1-minute collection interval.

To facilitate access to the metric stream, use a dedicated user account that has access only to the stream. The following CellCLI command sequence configures such an account, which you can then use to authenticate to the REST endpoint. In the command sequence, substitute your own user and role names.

CREATE ROLE metric_collector_role
GRANT PRIVILEGE LIST ON METRICSTREAM ALL ATTRIBUTES WITH ALL OPTIONS TO ROLE metric_collector_role
CREATE USER metric_collector PASSWORD=<password>
GRANT ROLE metric_collector_role TO USER metric_collector
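
As a hedged sketch only, the following Python code pulls the metric stream from a storage server REST endpoint by using the dedicated account created above. It assumes that the endpoint accepts HTTP Basic authentication for CellCLI users and that certificate verification is relaxed for a self-signed server certificate; adjust both assumptions for your environment, and substitute your own server name and credentials.

import requests

SERVER = "celadm09.example.com"  # example storage server name
URL = f"https://{SERVER}/metricstream/list?stream=true"

# Credentials for the dedicated metric_collector user created above.
# HTTP Basic authentication is an assumption; confirm the supported
# authentication scheme for your software release.
auth = ("metric_collector", "example-password")

with requests.get(URL, auth=auth, stream=True, verify=False, timeout=30) as response:
    response.raise_for_status()
    # Each line of the stream is one metric observation; the format is
    # described in "Understanding the Metric Stream Format".
    for line in response.iter_lines(decode_unicode=True):
        if line:
            print(line)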
6.4.4.4 Tagging Metric Observations

On each Exadata database server and storage server, you can define a set of metric tags, which are included in every observation in the metric stream. These tags can help you to organize and group observations generated by numerous Exadata servers.

You can configure metric tags by setting the metricStreamTags attribute to a valid JSON string containing tag and value pairs as follows:

metricStreamTags='{"tag1":"value1"[,"tag2":"value2"]...}'

For example:

CellCLI> ALTER CELL metricStreamTags='{"application":"personnel","department":"HR"}'
6.4.4.5 Understanding the Metric Stream Format

Real-time metric observations contain a core set of attributes. However, the format of the metric stream depends on the mode of access.

If you are automatically uploading the metric stream to a metric collection platform, you can specify the metric stream format in the metric endpoint configuration. The available formats are JSON or plain text.

Following is an example of the JSON format:

json: {
    "gauge": [{
            "metric": "OS_NET_RX_BY_SEC",
            "value": "0.0012989044189453125",
            "timestamp": 1652473286000,
            "unit": "MB/sec",
            "dimensions": {
                "server": "celadm09.example.com",
                "objectName": "eth0",
                "nodeType": "STORAGE",
                "fleet": "example-fleet",
                "pod": "dbm01",
                "cluster": "c01"
            }
        }, {
            "metric": "SIO_IO_RD_FC_HD_SEC",
            "value": "0.0",
            "timestamp": 1652473286000,
            "unit": "MB/sec",
            "dimensions": {
                "server": "celadm09.example.com",
                "objectName": "SMARTIO",
                "nodeType": "STORAGE",
                "fleet": "example-fleet",
                "pod": "dbm01",
                "cluster": "c01"
            }
        }
    ]
}
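
To show how a collection endpoint might consume this payload, the following Python sketch parses the JSON structure above and prints one line per observation. The payload variable stands in for the request body received from the Exadata server and repeats part of the example above.

import json

# 'payload' stands in for the JSON body uploaded by the Exadata server.
payload = '''{
    "gauge": [{
        "metric": "OS_NET_RX_BY_SEC",
        "value": "0.0012989044189453125",
        "timestamp": 1652473286000,
        "unit": "MB/sec",
        "dimensions": {
            "server": "celadm09.example.com",
            "objectName": "eth0",
            "nodeType": "STORAGE",
            "fleet": "example-fleet",
            "pod": "dbm01",
            "cluster": "c01"
        }
    }]
}'''

for observation in json.loads(payload)["gauge"]:
    dims = observation["dimensions"]
    print(f'{observation["timestamp"]} {dims["server"]} {dims["objectName"]} '
          f'{observation["metric"]} = {observation["value"]} {observation["unit"]}')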

The plain text format contains essentially the same information as the JSON stream. However, with the plain text format, each metric observation is presented on a separate line. Following is an example of the plain text format:

metrics,name=OS_NET_RX_BY_SEC,objectName=eth0,server=celadm09.example.com,unit=MB/sec,nodeType=STORAGE,fleet=example-fleet,pod=dbm01,cluster=c01 value=9.441184615324398E-4 1652473456000000000
metrics,name=OS_NET_RX_BY_SEC,objectName=eth0,server=celadm09.example.com,unit=MB/sec,nodeType=STORAGE,fleet=example-fleet,pod=dbm01,cluster=c01 value=0.002647613311980988 1652473457000000000

If you are downloading the metric stream by using the provided REST endpoint, the data is presented in a format similar to the plain text upload format, where each metric observation is presented on a separate line. Following is an example of the download format:

DS_CPUT{objectName="dbadm05",unit="%",server="dbadm05.example.com",nodeType="KVMHOST",fleet="example-fleet",pod="dbm01",cluster="c01"} 23.10906363831155 1652485449597
DS_MEMUT{objectName="dbadm05",unit="%",server="dbadm05.example.com",nodeType="KVMHOST",fleet="example-fleet",pod="dbm01",cluster="c01"} 99 1652485449597
DS_MEMUT_MS{objectName="dbadm05",unit="%",server="dbadm05.example.com",nodeType="KVMHOST",fleet="example-fleet",pod="dbm01",cluster="c01"} 0.12396045794483294 1652485449597

The following list describes the attributes contained in the metric stream (a parsing sketch for the download format follows the list):

  • The metric name is identified as follows:

    • In the JSON upload format, the metric name follows the metric tag.

    • In the plain text upload format, the metric name is the value following name=.

    • In the download format, the metric name is the first element on each line, preceding the left brace ({).

  • The metric value is located as follows:

    • In the JSON upload format, the metric value follows the value tag.

    • In the plain text upload format, the metric value follows value=.

    • In the download format, the metric value is the second-to-last element on each line, following the right brace (}).

  • The time of the metric observation is located as follows:

    • In the JSON upload format, the timestamp follows the timestamp tag. The timestamp is expressed as the number of milliseconds (1 x 10^-3 sec) since January 1, 1970, 00:00:00 GMT.

    • In the plain text upload format, the timestamp is the last element on each line. The timestamp is expressed as the number of nanoseconds (1 x 10^-9 sec) since January 1, 1970, 00:00:00 GMT.

    • In the download format, the timestamp is the last element on each line. The timestamp is expressed as the number of milliseconds (1 x 10^-3 sec) since January 1, 1970, 00:00:00 GMT.

  • The unit value describes the unit of measurement for the metric observation.

  • The server value contains the name of the Exadata server that generated the metric observation.

  • The objectName value contains the name of the Exadata object associated with the metric.

  • The nodeType value contains the type of the Exadata server that generated the metric observation.

  • The fleet, pod, and cluster attributes are examples of user-defined metric tags, which you can use to organize and group observations generated by numerous Exadata servers. You can tag metrics by setting the metricStreamTags CELL attribute.
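
For reference, the following Python sketch parses one line of the download format into its metric name, labels, value, and timestamp. It assumes the line layout shown in the download example: a metric name, a braced list of label="value" pairs, and then the value and millisecond timestamp.

import re

# One line taken from the download format example above.
line = ('DS_CPUT{objectName="dbadm05",unit="%",server="dbadm05.example.com",'
        'nodeType="KVMHOST",fleet="example-fleet",pod="dbm01",cluster="c01"} '
        '23.10906363831155 1652485449597')

# Expected layout: name{label="value",...} value timestamp
match = re.match(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)\s+(?P<ts>\d+)$', line)
if match:
    labels = dict(re.findall(r'(\w+)="([^"]*)"', match.group("labels")))
    print(match.group("name"), float(match.group("value")),
          int(match.group("ts")), labels)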