6.3.4.4 What to Look For When Monitoring Exadata Smart Flash Log
General Performance
Performance issues related to redo logging typically exhibit high latency for the log file sync wait event in the Oracle Database user foreground processes, with corresponding high latency for log file parallel write in the Oracle Database log writer (LGWR) process. Because of the performance-critical nature of redo log writes, occasional long latencies for log file parallel write may cause fluctuations in database performance, even if the average log file parallel write wait time is acceptable.
If any of these symptoms occur, they may indicate an issue with Exadata Smart Flash Log performance.
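The gap between an acceptable average and occasional long latencies can be illustrated with a small calculation. The following is a minimal sketch using made-up latency samples; the values and the 8 ms threshold are assumptions chosen for illustration and do not come from an AWR report.

```python
import statistics

# Hypothetical log file parallel write latencies in milliseconds.
# 98 fast writes plus two slow outliers (illustrative values only).
latencies_ms = [0.4] * 98 + [30.0, 50.0]

# The 8 ms cut-off for a "slow" write is an arbitrary assumption.
SLOW_THRESHOLD_MS = 8.0

mean_ms = statistics.mean(latencies_ms)
worst_ms = max(latencies_ms)
slow_count = sum(1 for value in latencies_ms if value > SLOW_THRESHOLD_MS)

print(f"mean={mean_ms:.2f} ms, worst={worst_ms:.1f} ms, slow writes={slow_count}")
```

In this sample the mean stays close to 1.2 ms, yet two writes took 30 ms or longer. Any foreground session waiting on log file sync during those two writes would see the full delay, which is why the average alone can hide the problem.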
Redo Write Histograms
The log file parallel write wait event indicates the amount of time that the database waits on a redo log write. The log file parallel write histogram shows the number of times the redo write completed within a specified time range. Similarly, the redo log write completions statistic indicates the amount of time that the storage server spends processing redo write requests, and the redo log write completions histogram shows the number of times the redo write request was completed within a specified time range. Both histograms are shown in the Redo Write Histogram section of the AWR report.
A histogram with a significant number of occasional long latencies is said to have a long tail. When both of the histograms in the Redo Write Histogram section of the AWR report have long tails, then this is an indication of slow write times on the storage server, which would warrant further investigation of the other I/O performance statistics. See Monitoring Cell Disk I/O.
If the log file parallel write histogram has a long tail that is not present in the redo log write completions histogram, then the cause is generally not in the storage server, but rather something else in the I/O path, such as bottlenecks in the network or contention for compute node CPU.
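The comparison of the two histograms can be sketched as a simple decision rule. This is an illustrative sketch only: the function names, the bucket layout (bucket upper bound in milliseconds mapped to a write count), the 8 ms threshold, and the 1% tail limit are all assumptions, and the sample counts are invented rather than taken from an AWR report.

```python
def tail_fraction(histogram, threshold_ms):
    """Return the fraction of writes in buckets above threshold_ms.

    histogram maps a bucket upper bound (ms) to the number of writes
    that completed within that bucket.
    """
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    slow = sum(count for upper_ms, count in histogram.items()
               if upper_ms > threshold_ms)
    return slow / total


def locate_long_tail(db_hist, cell_hist, threshold_ms=8, tail_limit=0.01):
    """Apply the comparison described above to the two histograms.

    db_hist   : log file parallel write histogram (database side)
    cell_hist : redo log write completions histogram (storage server side)
    threshold_ms and tail_limit are assumed values, not Oracle defaults.
    """
    db_tail = tail_fraction(db_hist, threshold_ms) > tail_limit
    cell_tail = tail_fraction(cell_hist, threshold_ms) > tail_limit
    if db_tail:
        # Long tail on both sides points at the storage server; a long
        # tail on the database side only points elsewhere in the I/O path.
        return "storage server" if cell_tail else "I/O path outside the storage server"
    return "no long tail in log file parallel write"


# Invented sample counts for illustration.
db_hist = {1: 9000, 2: 700, 4: 150, 8: 50, 16: 120, 32: 80}
cell_hist = {1: 9800, 2: 250, 4: 40, 8: 8, 16: 2, 32: 0}

print(locate_long_tail(db_hist, cell_hist))
```

With these sample counts, about 2% of database-side writes fall above 8 ms while almost none do on the storage server side, so the sketch reports the I/O path outside the storage server as the place to look.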
Skipping
Increased redo write latencies can also occur when the Exadata Smart Flash Log is skipped, and the redo write goes only to disk. Both the AWR report and the storage server metrics show the number of redo log writes that skipped Exadata Smart Flash Log. Skipping may occur when Exadata Smart Flash Log contains too much data that has not yet been written to disk.
There are a few factors that can cause redo writes to skip Exadata Smart Flash Log:
- Flash disks with high write latencies.
This can be observed in various IO Latency tables located in the Exadata Resource Statistics section of the AWR report, and in the Exadata CELLDISK metrics. This can also be identified by checking the FL_FLASH_ONLY_OUTLIERS metric. If the metric value is high, this indicates a flash disk performance issue.
- Hard disks with high latencies or high utilization.
Prior to Oracle Exadata System Software release 20.1, redo log writes are written to both Exadata Smart Flash Log and hard disk. If the hard disks experience high latencies or high utilization, redo log write performance can be impacted.
This can be observed in various IO Latency tables located in the Exadata Resource Statistics section of the AWR report, and in the Exadata CELLDISK metrics. This can also be identified by checking the Outliers columns in the Flash Log section of the AWR report, or the FL_PREVENTED_OUTLIERS storage server metric. A large number of prevented outliers may indicate that the hard disk writes are taking a long time. In this case, although Exadata Smart Flash Log prevents outliers, overall throughput may be limited due to the queue of redo log data that must be written to disk.
Oracle Exadata System Software release 20.1 adds a further optimization, known as Smart Flash Log Write-Back, that uses Exadata Smart Flash Cache in Write-Back mode instead of disk storage, thereby eliminating the hard disks as a potential performance bottleneck. Depending on the system workload, this feature can improve overall log write throughput by up to 250%.
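The two outlier metrics named above can be combined into a first-pass triage. The following is a minimal sketch, assuming the metric values have already been collected into a dictionary keyed by metric name; the function name, the dictionary layout, and the threshold of 100 are hypothetical, since the text above says only that a "high" value is significant.

```python
def triage_flash_log_outliers(metrics, high_threshold=100):
    """Suggest where to look based on the two outlier metrics.

    metrics        : dict of metric name -> value (assumed to be collected
                     elsewhere; the layout is hypothetical)
    high_threshold : assumed cut-off for a "high" value, not an Oracle default
    """
    findings = []
    if metrics.get("FL_FLASH_ONLY_OUTLIERS", 0) > high_threshold:
        findings.append("check flash disk write latency")
    if metrics.get("FL_PREVENTED_OUTLIERS", 0) > high_threshold:
        findings.append("check hard disk latency and utilization")
    return findings


# Invented sample values for illustration.
sample = {"FL_FLASH_ONLY_OUTLIERS": 3, "FL_PREVENTED_OUTLIERS": 4200}
print(triage_flash_log_outliers(sample))
```

A suitable threshold depends on the workload and the collection interval, so the value should be calibrated against a period of known good performance rather than taken from this sketch.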