9.2 About System Performance Tuning

9.2.1 About Performance Problems
9.2.2 Monitoring Usage of System Resources
9.2.3 Using the Graphical System Monitor
9.2.4 About OSWatcher Black Box

Performance issues can be caused by any of a system's components, software or hardware, and by their interaction. Many performance diagnostic utilities are available for Oracle Linux, including tools that monitor and analyze resource usage by different hardware components and tracing tools for diagnosing performance issues in multiple processes and their threads.

9.2.1 About Performance Problems

Many performance issues are the result of configuration errors. You can avoid such errors by using a validated configuration that has been pre-tested for the supported software, hardware, storage, drivers, and networking components. A validated configuration incorporates the best practices for Oracle Linux deployment and has undergone real-world testing of the complete stack. Oracle publishes more than 100 validated configurations, which are freely available for download. You should also refer to the release notes for recommendations on setting kernel parameters.

A typical problem involves out-of-memory errors and generally poor performance when running Oracle Database. A likely cause is that the system is not configured to use the HugePages feature for the System Global Area (SGA). With HugePages, you can set the page size to between 2MB and 256MB, which reduces the total number of pages that the kernel needs to manage. The memory associated with HugePages cannot be swapped out, which forces the SGA to remain resident in memory.
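To see whether HugePages are configured, you can check the HugePages counters in /proc/meminfo. The following is a minimal sketch rather than a supported tool. The function name hugepages_summary is hypothetical, and the optional file argument exists only so that you can run the function against a saved copy of /proc/meminfo.

```shell
#!/bin/bash
# hugepages_summary (hypothetical helper): print the HugePages counters from
# /proc/meminfo, or from a saved copy passed as the first argument.
hugepages_summary() {
    local meminfo="${1:-/proc/meminfo}"
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free  = $2 }
        /^Hugepagesize:/    { size  = $2 }   # reported in kB
        END { printf "total=%d free=%d size_kb=%d\n", total, free, size }
    ' "$meminfo"
}
# Example:
#   hugepages_summary
```

If total is 0, no HugePages are reserved and the SGA cannot use them.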

The following utilities allow you to collect information about system resource usage and errors, and can help you to identify performance problems caused by overloaded disks, network, memory, or CPUs:

dmesg

Displays the contents of the kernel ring buffer, which can contain errors about system resource usage. Provided by the util-linux-ng package.

dstat

Displays statistics about system resource usage. Provided by the dstat package.

free

Displays the amount of free and used memory in the system. Provided by the procps package.

iostat

Reports I/O statistics. Provided by the sysstat package.

iotop

Monitors disk and swap I/O on a per-process basis. Provided by the iotop package.

ip

Reports network interface statistics and errors. Provided by the iproute package.

mpstat

Reports processor-related statistics. Provided by the sysstat package.

sar

Reports information about system activity. Provided by the sysstat package.

ss

Reports socket statistics. Provided by the iproute package.

top

Provides a dynamic real-time view of the tasks that are running on a system. Provided by the procps package.

uptime

Displays the system load averages for the past 1, 5, and 15 minutes. Provided by the procps package.

vmstat

Reports virtual memory statistics. Provided by the procps package.

Many of these utilities provide overlapping functionality. For more information, see the individual manual page for the utility.
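If you are not sure which of these utilities are installed, a short loop can report them. This is a sketch only. The function name report_tools is hypothetical, and the package names are the ones given in the list above, which might differ on other releases.

```shell
#!/bin/bash
# report_tools (hypothetical helper): report whether each monitoring utility
# listed above is on the PATH, and name the package that provides it.
report_tools() {
    local entry tool pkg
    for entry in dmesg:util-linux-ng dstat:dstat free:procps iostat:sysstat \
                 iotop:iotop ip:iproute mpstat:sysstat sar:sysstat ss:iproute \
                 top:procps uptime:procps vmstat:procps; do
        tool="${entry%%:*}"   # text before the colon
        pkg="${entry#*:}"     # text after the colon
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing (provided by the $pkg package)"
        fi
    done
}
# Example:
#   report_tools
```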

See Section 5.2.3, “Parameters that Control System Performance” for a list of kernel parameters that affect system performance.

9.2.2 Monitoring Usage of System Resources

Collect and monitor system resource usage regularly so that you have a continuous record of the system's behavior. Establish a baseline of acceptable measurements under typical operating conditions. You can then use the baseline as a reference point to make it easier to identify memory shortages, spikes in resource usage, and other problems when they occur. Monitoring system performance also allows you to plan for future growth and to see how configuration changes might affect future performance.

To run a monitoring command every interval seconds in real time and watch its output change, use the watch command. For example, the following command runs the mpstat command every interval seconds:

# watch -n interval mpstat

Alternatively, many of the commands allow you to specify the sampling interval in seconds, for example:

# mpstat interval

If installed, the sar command records statistics every 10 minutes while the system is running and retains this information for every day of the current month. The following command displays all the statistics that sar recorded for day DD of the current month:

# sar -A -f /var/log/sa/saDD
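The DD in the file name is the two-digit day of the month. The following sketch uses a hypothetical helper named sa_file_for_day to build the path for a given day. It assumes the /var/log/sa location shown above.

```shell
#!/bin/bash
# sa_file_for_day (hypothetical helper): print the path of the sar data file
# for a given day of the current month, zero-padded to two digits (saDD).
sa_file_for_day() {
    # 10# forces base 10 so that days such as "08" are not read as octal
    printf '/var/log/sa/sa%02d\n' "$((10#$1))"
}
# Example: display all statistics recorded so far today
#   sar -A -f "$(sa_file_for_day "$(date +%d)")"
```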

To run the sar command as a background process and collect data in a file that you can display later by using the -f option, enter:

# sar -o datafile interval count >/dev/null 2>&1 &

where count is the number of samples to record.

Oracle OSWatcher Black Box (OSWbb) and OSWbb analyzer (OSWbba) are useful tools for collecting and analysing performance statistics. For more information, see Section 9.2.4, “About OSWatcher Black Box”.

9.2.2.1 Monitoring CPU Usage

The uptime, mpstat, sar, dstat, and top utilities allow you to monitor CPU usage. When a system's CPU cores are all occupied executing the code of processes, other processes must wait until a CPU core becomes free or the scheduler switches a CPU core to run their code. If too many processes are queued too often, this can represent a bottleneck in the performance of the system.

The commands mpstat -P ALL and sar -u -P ALL display CPU usage statistics for each CPU core and averaged across all CPU cores.

The %idle value shows the percentage of time that a CPU was not running system code or process code. If the value of %idle is near 0% most of the time on all CPU cores, the system is CPU-bound for the workload that it is running. The percentage of time spent running system code (%system or %sys) should not usually exceed 30%, especially if %idle is close to 0%.
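The following sketch shows what %idle and %sys measure. It computes both values from two samples of the aggregate cpu line in /proc/stat, the kernel counters that mpstat and sar read. The function name cpu_idle_sys is hypothetical, and the calculation is a simplified approximation of what mpstat reports. It uses the first eight counters (user through steal) as the total and ignores the guest fields, which the kernel already includes in user time.

```shell
#!/bin/bash
# cpu_idle_sys (hypothetical helper): given two samples of the aggregate
# "cpu" line from /proc/stat, print %idle and %sys for the interval between them.
# Fields after "cpu": user nice system idle iowait irq softirq steal guest guest_nice
cpu_idle_sys() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i; next }
        {
            total = 0
            for (i = 2; i <= 9; i++) { delta[i] = $i - prev[i]; total += delta[i] }
            if (total <= 0) { print "no CPU time elapsed between samples"; exit 1 }
            # delta[5] is idle time and delta[4] is system time
            printf "idle=%.1f sys=%.1f\n", 100 * delta[5] / total, 100 * delta[4] / total
        }'
}
# Example: sample the counters one second apart
#   s1=$(grep '^cpu ' /proc/stat); sleep 1; s2=$(grep '^cpu ' /proc/stat)
#   cpu_idle_sys "$s1" "$s2"
```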

The system load average represents the number of processes that are running on CPU cores, waiting to run, or waiting for disk I/O activity to complete, averaged over a period of time. On a busy system, the load average reported by uptime or sar -q should usually not be greater than twice the number of CPU cores over periods as long as 5 or 15 minutes. If the load average exceeds four times the number of CPU cores for long periods, the system is overloaded.
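The rules of thumb in the preceding paragraph can be expressed as a small check. This sketch uses a hypothetical function named classify_load. The text above does not name the band between two and four times the number of cores, so the sketch labels it "elevated". A single reading means little, so compare results against your baseline over time.

```shell
#!/bin/bash
# classify_load (hypothetical helper): compare a load average with the
# number of CPU cores by using the rules of thumb given above:
#   up to 2 x cores      -> acceptable for a busy system
#   over 2 x, up to 4 x  -> elevated (label chosen for this sketch)
#   over 4 x cores       -> overloaded if sustained
classify_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load > 4 * cores)      print "overloaded"
        else if (load > 2 * cores) print "elevated"
        else                       print "acceptable"
    }'
}
# Example: check the 15-minute load average (third field of /proc/loadavg)
#   classify_load "$(cut -d' ' -f3 /proc/loadavg)" "$(nproc)"
```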

In addition to load averages (ldavg-*), the sar -q command reports the number of processes currently waiting to run (the run-queue size, runq-sz) and the total number of processes (plist-sz). The value of runq-sz also provides an indication of CPU saturation.

Determine the system's average load under normal conditions, when users and applications do not experience problems with system responsiveness, and then look for deviations from this baseline over time. A dramatic rise in the load average can indicate a serious performance problem.

A combination of sustained large load average or large run queue size and low %idle can indicate that the system has insufficient CPU capacity for the workload. When CPU usage is high, use a command such as dstat or top to determine which processes are most likely to be responsible. For example, the following dstat command shows which processes are using CPUs, memory, and block I/O most intensively:

# dstat --top-cpu --top-mem --top-bio

The top command provides a real-time display of CPU activity. By default, top lists the most CPU-intensive processes on the system. In its upper section, top displays general information including the load averages over the past 1, 5 and 15 minutes, the number of running and sleeping processes (tasks), and total CPU and memory usage. In its lower section, top displays a list of processes, including the process ID number (PID), the process owner, CPU usage, memory usage, running time, and the command name. By default, the list is sorted by CPU usage, with the top consumer of CPU listed first. Type f to select which fields top displays, o to change the order of the fields, or O to change the sort field. For example, entering On sorts the list on the percentage memory usage field (%MEM).

9.2.2.2 Monitoring Memory Usage

The sar -r command reports memory utilization statistics, including %memused, which is the percentage of physical memory in use.

sar -B reports memory paging statistics, including pgscank/s, which is the number of memory pages scanned by the kswapd daemon per second, and pgscand/s, which is the number of memory pages scanned directly per second.

sar -W reports swapping statistics, including pswpin/s and pswpout/s, which are the numbers of pages swapped in and out per second.

If %memused is near 100% and the scan rate is continuously over 200 pages per second, the system has a memory shortage.
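The following sketch applies this rule to one sample of sar output. The function name memory_shortage is hypothetical. The sketch also makes two interpretations of its own: "near 100%" is taken to mean 95% or more, and the scan rate is taken to be the sum of pgscank/s and pgscand/s. The rule concerns a scan rate that is continuously high, so apply the check to every interval over a period rather than to a single sample.

```shell
#!/bin/bash
# memory_shortage (hypothetical helper): apply the rule of thumb above to one
# sample of sar output.
#   $1 = %memused (sar -r)   $2 = pgscank/s (sar -B)   $3 = pgscand/s (sar -B)
# Assumptions: "near 100%" means 95% or more, and the scan rate is
# pgscank/s + pgscand/s.
memory_shortage() {
    awk -v used="$1" -v scank="$2" -v scand="$3" 'BEGIN {
        if (used >= 95 && scank + scand > 200) print "shortage"
        else                                   print "ok"
    }'
}
```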

Once a system runs out of real or physical memory and starts using swap space, its performance deteriorates dramatically. If you run out of swap space, your programs or the entire operating system are likely to crash. If free or top indicates that little swap space remains available, this also shows that you are running low on memory.

The output from the dmesg command might include notification of any problems with physical memory that were detected at boot time.

9.2.2.3 Monitoring Block I/O Usage

The iostat command monitors the loading of block I/O devices by observing the time that the devices are active relative to the average data transfer rates. You can use this information to adjust the system configuration to balance the I/O loading across disks and host adapters.

iostat -x 1 reports extended statistics about block I/O activity at one-second intervals, including %util, which is the percentage of elapsed time during which I/O requests were issued to a device, and avgqu-sz, which is the average queue length of I/O requests that were issued to that device. If %util approaches 100% or avgqu-sz is greater than 1, device saturation is occurring.

You can also use the sar -d command to report on block I/O activity, including values for %util and avgqu-sz.
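To pick out saturated devices from iostat -x output, you can filter it with awk. In this sketch, the function name flag_saturated_devices and the 90% cut-off that stands in for "approaches 100%" are assumptions. The script finds columns by their header names because column positions differ between sysstat versions. It also accepts aqu-sz, the name that newer sysstat releases use for avgqu-sz.

```shell
#!/bin/bash
# flag_saturated_devices (hypothetical helper): read iostat -x output and
# print the name of each device whose %util is 90 or more or whose average
# queue length is greater than 1. The 90% cut-off is an assumption.
flag_saturated_devices() {
    awk '
        /^avg-cpu/  { util = 0; qsz = 0; next }   # CPU report: not a device table
        $1 ~ /^Device/ {
            util = 0; qsz = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "%util") util = i
                if ($i == "avgqu-sz" || $i == "aqu-sz") qsz = i
            }
            next
        }
        util && NF >= util {
            if ($util + 0 >= 90 || (qsz && $qsz + 0 > 1)) print $1
        }
    ' "$@"
}
# Example: three extended reports at one-second intervals
# (the first report covers the time since boot)
#   iostat -x 1 3 | flag_saturated_devices
```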

The iotop utility can help you identify which processes are responsible for excessive disk I/O. iotop has a user interface similar to that of top. In its upper section, iotop displays the total disk input and output usage in bytes per second. In its lower section, iotop displays I/O information for each process, including disk read and write rates in bytes per second, the percentage of time spent swapping in pages from disk or waiting on I/O, and the command name. Use the left and right arrow keys to change the sort field, and press A to toggle the I/O units between bytes per second and total number of bytes, or O to toggle between displaying all processes or only those processes that are performing I/O.

9.2.2.4 Monitoring File System Usage

The sar -v command reports the number of unused cache entries in the directory cache (dentunusd) and the numbers of in-use file handles (file-nr), inode handlers (inode-nr), and pseudo terminals (pty-nr).

iostat -n reports I/O statistics for each NFS file system that is mounted.
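The file-nr value that sar -v reports is based on /proc/sys/fs/file-nr. That file contains three numbers: allocated file handles, allocated but unused file handles, and the system-wide maximum. The following sketch uses a hypothetical function named file_handle_usage to read the file directly and report usage as a percentage of the maximum.

```shell
#!/bin/bash
# file_handle_usage (hypothetical helper): summarize /proc/sys/fs/file-nr
# (or a saved copy). Fields: allocated, allocated but unused, maximum.
file_handle_usage() {
    local path="${1:-/proc/sys/fs/file-nr}"
    awk '{
        used = $1 - $2   # handles actually in use
        printf "%d of %d file handles in use (%.1f%%)\n", used, $3, 100 * used / $3
    }' "$path"
}
# Example:
#   file_handle_usage
```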

9.2.2.5 Monitoring Network Usage

The ip -s link command displays network statistics and errors for all network devices, including the numbers of bytes transmitted (TX) and received (RX). The dropped and overrun fields provide an indicator of network interface saturation.

The ss -s command displays summary statistics for each protocol.
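The per-interface error and drop counters are also available in /proc/net/dev, which is convenient in scripts. The following sketch uses the hypothetical function name net_drop_report to list the interfaces whose receive or transmit error or drop counters are nonzero. It does not report the overrun counter shown by ip -s link. The counters are cumulative since boot, so compare two readings taken some time apart to see whether they are still increasing.

```shell
#!/bin/bash
# net_drop_report (hypothetical helper): print interfaces whose receive or
# transmit error or drop counters in /proc/net/dev (or a saved copy) are nonzero.
net_drop_report() {
    local path="${1:-/proc/net/dev}"
    awk 'NR > 2 {
        sub(/:/, " ")   # split "eth0:123" into "eth0 123" on older kernels
        # After the name: 8 RX fields (bytes packets errs drop ...), then 8 TX fields
        if ($4 + $5 + $12 + $13 > 0)
            printf "%s rx_errs=%d rx_drop=%d tx_errs=%d tx_drop=%d\n", $1, $4, $5, $12, $13
    }' "$path"
}
# Example:
#   net_drop_report
```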

9.2.3 Using the Graphical System Monitor

The GNOME desktop environment includes a graphical system monitor that allows you to display information about the system configuration, running processes, resource usage, and file systems.

To display the System Monitor, use the following command:

# gnome-system-monitor

The Resources tab displays:

  • CPU usage history in graphical form and the current CPU usage as a percentage.

  • Memory and swap usage history in graphical form and the current memory and swap usage.

  • Network usage history in graphical form, the current network usage for reception and transmission, and the total amount of data received and transmitted.

To display the System Monitor Manual, press F1 or select Help > Contents.

9.2.4 About OSWatcher Black Box

Oracle OSWatcher Black Box (OSWbb) collects and archives operating system and network metrics that you can use to diagnose performance issues. OSWbb operates as a set of background processes on the server and gathers data on a regular basis, invoking such Unix utilities as vmstat, mpstat, netstat, iostat, and top.

OSWbb is particularly useful for Oracle RAC (Real Application Clusters) and Oracle Grid Infrastructure configurations. The RAC-DDT (Diagnostic Data Tool) script file includes OSWbb, but does not install it by default.

9.2.4.1 Installing OSWbb

To install OSWbb:

  1. Log on to My Oracle Support (MOS) at http://support.oracle.com.

  2. Download OSWatcher from the link listed in Doc ID 301137.1 at https://support.oracle.com/epmos/faces/DocumentDisplay?id=301137.1.

  3. Copy the file to the directory where you want to install OSWbb, and run the following command:

    # tar xvf oswbbVERS.tar

    VERS represents the version number of OSWatcher, for example 730 for OSWatcher 7.30.

    Extracting the tar file creates a directory named oswbb, which contains all the directories and files that are associated with OSWbb, including the startOSWbb.sh script.

  4. To enable the collection of iostat information for NFS volumes, edit the OSWatcher.sh script in the oswbb directory, and set the value of nfs_collect to 1:

    nfs_collect=1
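You can also make this change from the command line with a sed substitution. This sketch assumes that OSWatcher.sh contains a line that begins with nfs_collect=. The function name enable_nfs_collect is hypothetical, and the function keeps a backup of the original script with a .orig suffix.

```shell
#!/bin/bash
# enable_nfs_collect (hypothetical helper): set nfs_collect=1 in the given
# OSWatcher.sh script and keep a backup copy with a .orig suffix.
# Assumption: the script contains a line that begins with "nfs_collect=".
enable_nfs_collect() {
    local script="${1:-oswbb/OSWatcher.sh}"
    if ! grep -q '^nfs_collect=' "$script"; then
        echo "no nfs_collect setting found in $script" >&2
        return 1
    fi
    sed -i.orig 's/^nfs_collect=.*/nfs_collect=1/' "$script"
}
# Example, from the directory that contains oswbb:
#   enable_nfs_collect oswbb/OSWatcher.sh
```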

9.2.4.2 Running OSWbb

To start OSWbb, run the startOSWbb.sh script from the oswbb directory.

# ./startOSWbb.sh [frequency duration]

The optional frequency and duration arguments specify how often, in seconds, OSWbb collects data and the number of hours of archived data that OSWbb retains. The default values are 30 seconds and 48 hours. The following example starts OSWbb so that it records data at intervals of 60 seconds and keeps the most recent 12 hours of archived data:

# ./startOSWbb.sh 60 12
...
Testing for discovery of OS Utilities...
VMSTAT found on your system.
IOSTAT found on your system.
MPSTAT found on your system.
IFCONFIG found on your system.
NETSTAT found on your system.
TOP found on your system.

Testing for discovery of OS CPU COUNT
oswbb is looking for the CPU COUNT on your system
CPU COUNT will be used by oswbba to automatically look for cpu problems

CPU COUNT found on your system.
CPU COUNT = 4

Discovery completed.

Starting OSWatcher Black Box v7.3.0  on date and time
With SnapshotInterval = 60
With ArchiveInterval = 12
...
Data is stored in directory: OSWbba_archive

Starting Data Collection...

oswbb heartbeat: date and time
oswbb heartbeat: date and time + 60 seconds
...

OSWbba_archive is the path of the archive directory that contains the OSWbb log files.

To stop OSWbb, run the stopOSWbb.sh script from the oswbb directory.

# ./stopOSWbb.sh

OSWbb collects data in the following directories under the oswbb/archive directory:

Directory      Description
-------------  -------------------------------------------------------------
oswiostat      Contains output from the iostat utility.
oswmeminfo     Contains a listing of the contents of /proc/meminfo.
oswmpstat      Contains output from the mpstat utility.
oswnetstat     Contains output from the netstat utility.
oswprvtnet     If you have enabled private network tracing for RAC, contains
               information about the status of the private networks.
oswps          Contains output from the ps utility.
oswslabinfo    Contains a listing of the contents of /proc/slabinfo.
oswtop         Contains output from the top utility.
oswvmstat      Contains output from the vmstat utility.

OSWbb stores data in hourly archive files named system_name_utility_name_timestamp.dat. Each entry in a file is preceded by a timestamp.
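To look at the most recent data for one utility, you can list the newest file in the corresponding directory. The following sketch uses a hypothetical function named latest_osw_file and assumes the directory names listed above and the .dat suffix.

```shell
#!/bin/bash
# latest_osw_file (hypothetical helper): print the most recently modified
# archive file for one utility, for example the newest file in archive/oswvmstat.
latest_osw_file() {
    local archive="$1" utility="$2"
    # ls -t lists newest first; the archive file names contain no spaces
    ls -t "$archive/osw$utility"/*.dat 2>/dev/null | head -n 1
}
# Example:
#   less "$(latest_osw_file oswbb/archive vmstat)"
```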

9.2.4.3 Analysing OSWbb Archived Files

From release v4.0.0, you can use the OSWbb analyzer (OSWbba) to provide information on system slowdowns, system hangs and other performance problems, and also to graph data collected from iostat, netstat, and vmstat. OSWbba requires that you have installed Java version 1.4.2 or higher on your system. You can use yum to install Java, or you can download a Java RPM for Linux from http://www.java.com.

Use the following command to run OSWbba from the oswbb directory:

# java -jar oswbba.jar -i OSWbba_archive

OSWbba_archive is the path of the archive directory that contains the OSWbb log files.

You can use OSWbba to display the following types of performance graph:

  • Process run, wait and block queues.

  • CPU time spent running in system, user, and idle mode.

  • Context switches and interrupts.

  • Free memory and available swap.

  • Reads per second, writes per second, service time for I/O requests, and percentage utilization of bandwidth for a specified block device.

You can also use OSWbba to save the analysis to a report file. The report lists instances of system slowdown, spikes in run queue length, or memory shortage, describes probable causes, and suggests how to improve performance.

# java -jar oswbba.jar -i OSWbba_archive -A

For more information about OSWbb and OSWbba, refer to the OSWatcher Black Box User Guide (Article ID 301137.1) and the OSWatcher Black Box Analyzer User Guide (Article ID 461053.1), which are available from My Oracle Support (MOS) at http://support.oracle.com.