6.3.1.4 What to Look For When Monitoring Exadata Smart Flash Cache
Increased Read Latencies
Possible issues related to Exadata Smart Flash Cache tend to be visible in the database as increased read latencies, specifically in the cell single block physical read wait event. The increased latency usually occurs because reads are issued against the hard disks instead of being satisfied by Exadata Smart Flash Cache.
Read requests may not be satisfied from Exadata Smart Flash Cache when:
- The required data is not in Exadata Smart Flash Cache.
  In this case, requests are recorded as flash cache misses, which are visible as increased misses in the Flash Cache User Reads section of the AWR report. This is also typically accompanied by increased population writes, which are visible in the Flash Cache Internal Writes section of the AWR report.
  On the database, you may see a reduction in the number of cell flash cache read hits compared with the physical read IO requests or physical read total IO requests statistics.
- The data is not eligible for Exadata Smart Flash Cache.
  To maximize cache efficiency, when an I/O request is sent from the database to the storage servers, it includes a hint that indicates whether or not the data should be cached in Exadata Smart Flash Cache.
  If data is not eligible for caching, the corresponding I/O is visible in the Flash Cache User Reads - Skips section of the AWR report, along with the reason why the read was not eligible. Possible reasons include:
  - The grid disk caching policy is set to none.
  - The database segment is configured with the CELL_FLASH_CACHE NONE storage option.
  - The I/O type and situation preclude caching. For example, the I/O is related to an RMAN backup operation and the hard disks are not busy. If this is a common occurrence, it is evident in the Top IO Reasons section of the AWR report.
In some cases, the cell single block physical read latency may increase with no apparent difference in Exadata Smart Flash Cache behavior. This can be caused by increased I/O load, especially on hard disks.
Occasional long latencies in cell single block physical read, usually visible as a long tail in the cell single block physical read histogram, may simply indicate that the data was read from hard disk. They do not necessarily indicate a problem with Exadata Smart Flash Cache.
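When cell single block physical read latency rises, it helps to check whether Exadata Smart Flash Cache behavior actually changed. One approach is to compare the ratio of cell flash cache read hits to physical read IO requests across two periods, for example two AWR reports. The Python sketch below assumes you have already extracted those two statistic values for each period. The function names and the 10% relative tolerance are hypothetical choices for illustration, not Oracle-defined thresholds.

```python
def flash_hit_ratio(flash_read_hits: int, physical_read_io_requests: int) -> float:
    """Fraction of physical read I/O requests satisfied by Exadata Smart Flash Cache."""
    if physical_read_io_requests <= 0:
        return 0.0
    return flash_read_hits / physical_read_io_requests


def hit_ratio_dropped(baseline: tuple[int, int], current: tuple[int, int],
                      tolerance: float = 0.10) -> bool:
    """Return True if the flash hit ratio fell by more than `tolerance` (relative).

    Each argument is (cell flash cache read hits, physical read IO requests)
    for one period. The 10% default tolerance is an arbitrary assumption.
    """
    before = flash_hit_ratio(*baseline)
    after = flash_hit_ratio(*current)
    if before == 0.0:
        return False  # nothing to compare against
    return (before - after) / before > tolerance


# Hypothetical statistic values for two periods.
baseline = (950_000, 1_000_000)  # 95% of read requests hit flash
current = (700_000, 1_000_000)   # 70% of read requests hit flash
print(hit_ratio_dropped(baseline, current))  # True: likely more flash cache misses
```

Suppose the ratio stays stable while latency rises. In that case, the cause is more likely increased I/O load, especially on hard disks, than a change in flash cache behavior.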
Skipped Writes
When using Exadata Smart Flash Cache in Write-Back mode, most database write requests are absorbed by the cache. However, some write requests may skip the cache. The most common reason is that the request writes a large amount of data that will be read only once, such as temp sorts, or data that is not expected to be read at all in the foreseeable future, such as backups and archiving.
Another common reason for skipping large writes is Disk Not Busy, which typically means that there is no benefit to using Exadata Smart Flash Cache because the hard disks have sufficient capacity to handle the write requests.
If skipping Exadata Smart Flash Cache for large writes is causing a performance issue, it is typically visible in the database as long latencies for the direct path write or direct path write temp wait events.
The reasons for rejecting large writes are visible in the Flash Cache User Writes - Large Write Rejections section of the AWR report.
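The triage described above can be sketched in Python. Skipped large writes only warrant attention when direct path write (or direct path write temp) latency is elevated. In that case, the rejection reasons from the Flash Cache User Writes - Large Write Rejections section indicate where to look. This sketch assumes you have copied the reason counts into a dictionary by hand. The latency threshold and the reason labels other than Disk Not Busy are hypothetical placeholders. The sketch also chooses to exclude Disk Not Busy, because that reason typically means skipping the cache costs nothing.

```python
from collections import Counter

# Hypothetical latency threshold in milliseconds; pick one that fits your baseline.
LATENCY_THRESHOLD_MS = 20.0


def review_large_write_skips(rejections: dict[str, int],
                             direct_path_write_avg_ms: float,
                             threshold_ms: float = LATENCY_THRESHOLD_MS) -> list[str]:
    """Return large-write rejection reasons worth investigating, most frequent first.

    rejections maps reason -> count, as read from the
    "Flash Cache User Writes - Large Write Rejections" AWR section.
    direct_path_write_avg_ms is the average direct path write (or
    direct path write temp) latency for the same period.
    """
    # Skipped large writes matter only if the database sees slow direct path writes.
    if direct_path_write_avg_ms < threshold_ms:
        return []
    counts = Counter(rejections)
    # "Disk Not Busy" means the hard disks can absorb the writes, so skipping
    # the cache typically brings no penalty; leave it out of the report.
    counts.pop("Disk Not Busy", None)
    return [reason for reason, n in counts.most_common() if n > 0]


# Hypothetical AWR values; "Reason A" and "Reason B" are placeholder labels.
sample = {"Disk Not Busy": 5_000, "Reason A": 1_200, "Reason B": 300}
print(review_large_write_skips(sample, direct_path_write_avg_ms=35.0))
# ['Reason A', 'Reason B']
print(review_large_write_skips(sample, direct_path_write_avg_ms=2.0))
# []
```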
Database Working Set Size
The database working set refers to the subset of most commonly accessed information in the database. In most cases, the database working set is fairly stable. However, if for any reason the working set does not fit in Exadata Smart Flash Cache, you may see the following symptoms:
- Increased cache misses, indicating that data is not in Exadata Smart Flash Cache. This is visible in the Flash Cache User Reads section of the AWR report, or in the FC_IO_RQ_R_MISS_SEC cell metric.
- Increased population activity to populate data not in Exadata Smart Flash Cache. This is visible in the Flash Cache Internal Writes section of the AWR report, or in the FC_IO_[BY|RQ]_W_POPULATE_SEC cell metrics.
- Increased disk writer activity, indicating that dirty cachelines have to be written to disk so that they can be reused to cache other data. Disk writer activity is visible in the Flash Cache Internal Reads section of the AWR report, or in the FC_IO_[RQ|BY]_[R|W]_DISK_WRITER_SEC cell metrics.
- Increased first writes, indicating that new data is being written to Exadata Smart Flash Cache. A large number of first writes with few overwrites means that mostly new data, rather than already-cached data, is being written. This is visible in the Flash Cache User Writes section of the AWR report, or in the FC_IO_[RQ|BY]_W_FIRST_SEC cell metrics.
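The list above names metrics in a bracketed shorthand. For example, FC_IO_[BY|RQ]_W_POPULATE_SEC stands for both FC_IO_BY_W_POPULATE_SEC and FC_IO_RQ_W_POPULATE_SEC. The Python sketch below first expands that shorthand into concrete metric names. It then flags each working-set symptom whose metrics grew noticeably between a baseline period and a current period. It assumes you have already collected the metric values from the cells into dictionaries. The 1.5x growth factor and the function names are hypothetical choices, not Oracle guidance.

```python
import itertools
import re


def expand_metric_pattern(pattern: str) -> list[str]:
    """Expand shorthand such as FC_IO_[BY|RQ]_W_POPULATE_SEC into concrete names."""
    # Split into literal text and [A|B] alternation groups (groups are kept).
    parts = re.split(r"(\[[^\]]+\])", pattern)
    choices = [p[1:-1].split("|") if p.startswith("[") else [p] for p in parts]
    return ["".join(combo) for combo in itertools.product(*choices)]


# Symptom -> metric shorthand, taken from the list above.
WORKING_SET_SYMPTOMS = {
    "cache misses": "FC_IO_RQ_R_MISS_SEC",
    "population writes": "FC_IO_[BY|RQ]_W_POPULATE_SEC",
    "disk writer activity": "FC_IO_[RQ|BY]_[R|W]_DISK_WRITER_SEC",
    "first writes": "FC_IO_[RQ|BY]_W_FIRST_SEC",
}


def working_set_symptoms(baseline: dict[str, float], current: dict[str, float],
                         growth: float = 1.5) -> list[str]:
    """Return symptoms with at least one metric that grew by `growth`x or more.

    baseline/current map metric name -> value; metrics missing from either
    period, or zero in the baseline, are skipped. 1.5x is an arbitrary assumption.
    """
    found = []
    for symptom, pattern in WORKING_SET_SYMPTOMS.items():
        for name in expand_metric_pattern(pattern):
            before, after = baseline.get(name), current.get(name)
            if before and after is not None and after >= growth * before:
                found.append(symptom)
                break  # one growing metric is enough to report the symptom
    return found


print(expand_metric_pattern("FC_IO_[BY|RQ]_W_POPULATE_SEC"))
# ['FC_IO_BY_W_POPULATE_SEC', 'FC_IO_RQ_W_POPULATE_SEC']
base = {"FC_IO_RQ_R_MISS_SEC": 100.0, "FC_IO_BY_W_POPULATE_SEC": 50.0}
now = {"FC_IO_RQ_R_MISS_SEC": 400.0, "FC_IO_BY_W_POPULATE_SEC": 55.0}
print(working_set_symptoms(base, now))  # ['cache misses']
```

Several symptoms appearing together, such as misses, population writes, and disk writer activity, are more suggestive of a working set that exceeds the cache than any single metric on its own.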
In this case:
- Review the database access patterns for tuning opportunities that reduce the amount of data being accessed.
- Consider increasing the number of available storage servers to deliver more space for Exadata Smart Flash Cache.
- Review your I/O Resource Management (IORM) quotas for Exadata Smart Flash Cache and allocate space where it is most required.
- Consider using Extreme Flash storage servers to eliminate the disk I/Os.
Miscellaneous Issues
In some cases, increased cell single block physical read latency may not be due to cell performance. It may instead be caused by something else along the I/O path, such as network contention or contention for database server CPU resources.
The histograms for cell single block physical read and small reads are available in the Single Block Reads and Small Read Histogram - Detail sections of the AWR report under Exadata Statistics > Performance Summary.
The cell single block physical read histogram shows latencies measured by Oracle Database, while the small read histograms show latencies measured in the storage server.
A histogram with a significant number of occasional long latencies is said to have a long tail. When the histograms for both cell single block physical read and small reads have long tails, this indicates slow read times in the storage server, which warrants further investigation of the other I/O performance statistics. See Monitoring Cell Disk I/O.
If the cell single block physical read histogram has a long tail that is not present in the small read histograms, the cause is generally not in the storage server but rather something else in the I/O path, such as bottlenecks in the network or contention for compute node CPU.
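The comparison between the two histograms can be sketched in Python. It assumes each histogram has been transcribed from the AWR report into a dictionary that maps a bucket's lower bound in milliseconds to its wait count. A bucket is treated as part of the tail when its lower bound is at or above a threshold. The 8 ms threshold, the 1% tail limit, and the bucket boundaries in the example are hypothetical values chosen for illustration.

```python
def tail_fraction(histogram: dict[float, int], threshold_ms: float) -> float:
    """Fraction of waits in buckets whose lower bound is >= threshold_ms.

    histogram maps bucket lower bound (ms) -> wait count.
    """
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    slow = sum(n for lower, n in histogram.items() if lower >= threshold_ms)
    return slow / total


def locate_long_tail(db_hist: dict[float, int], cell_hist: dict[float, int],
                     threshold_ms: float = 8.0, tail_limit: float = 0.01) -> str:
    """Classify where a long tail in read latency most likely originates.

    db_hist: cell single block physical read histogram (database-measured).
    cell_hist: small read histogram (storage-server-measured).
    """
    db_tail = tail_fraction(db_hist, threshold_ms) > tail_limit
    cell_tail = tail_fraction(cell_hist, threshold_ms) > tail_limit
    if db_tail and cell_tail:
        return "storage server: review the other cell disk I/O statistics"
    if db_tail:
        return "outside the storage server: check network and compute node CPU"
    return "no significant long tail in database-measured latency"


# Hypothetical histograms (bucket lower bound in ms -> wait count).
db = {0.0: 9000, 1.0: 800, 8.0: 150, 32.0: 50}
cell = {0.0: 9500, 1.0: 480, 8.0: 15, 32.0: 5}
print(locate_long_tail(db, cell))
# outside the storage server: check network and compute node CPU
```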