1 Monitoring the System and Optimizing Performance

Performance issues can originate in software, in hardware, or in the interactions between them. Oracle Linux provides many performance diagnostic utilities, including tools that monitor and analyze the resource usage of different hardware components, and tracing tools for diagnosing performance issues across processes and their threads.

Many performance issues are the result of configuration errors. You can avoid these errors by using a validated configuration that has been pretested for the enabled software, hardware, storage, drivers, and networking components. A validated configuration incorporates best practices for an Oracle Linux deployment and has undergone real-world testing of the complete stack. Oracle publishes many validated configurations, which are available for download. See the release notes for the release that you're running for extra recommendations on kernel parameter settings.

System Performance and Monitoring Utilities

Different Oracle Linux utilities are available that enable you to collect information about system resource usage and errors, and help you to identify performance problems that are caused by overloaded disks, network, memory, or CPUs.

The following packages, which provide system performance and monitoring utilities, are installed by default. If a package isn't already installed, you can install it as required.

  • util-linux provides dmesg, which displays the contents of the kernel ring buffer. The buffer can contain errors about system resource usage.
  • procps-ng provides these utilities:
    • free: Displays the amount of free and used memory in the system.
    • top: Provides a dynamic real-time view of the tasks that are running on a system.
    • uptime: Displays the system load averages for the past 1, 5, and 15 minutes.
    • vmstat: Reports virtual memory statistics.
  • iproute provides these utilities:
    • ip: Reports network interface statistics and errors.
    • ss: Reports socket statistics.

Tip:

To verify if a utility is available in the system, check if the utility's package is installed. For example, for the dmesg utility, you can type:

dnf info util-linux
Installed Packages
Name         : util-linux
Version      : version-number
...

You can install the following packages to take advantage of extra utilities.

  • sysstat provides these utilities:
    • iostat: Reports I/O statistics.
    • mpstat: Reports processor-related statistics.
    • sar: Reports information about system activity.
  • iotop provides iotop, which monitors disk and swap I/O on a per-process basis.
  • nfs-utils provides nfsiostat, which reports I/O statistics for NFS mounts.

Many of these utilities provide overlapping functionality. For more information, see the individual manual page for the utility.
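The package check from the earlier tip can be wrapped in a small helper so that you don't need to remember which package provides which utility. The following is a minimal sketch, not an official tool: the function names pkg_for_utility and check_utility are hypothetical, the mapping covers only the utilities listed in this section, and the check assumes that the rpm command is available.

```shell
# Map a utility name to the package that provides it, using only the
# utility-to-package pairs listed in this section.
pkg_for_utility() {
    case "$1" in
        dmesg)                  echo "util-linux" ;;
        free|top|uptime|vmstat) echo "procps-ng" ;;
        ip|ss)                  echo "iproute" ;;
        iostat|mpstat|sar)      echo "sysstat" ;;
        iotop)                  echo "iotop" ;;
        nfsiostat)              echo "nfs-utils" ;;
        *)                      echo "unknown utility: $1" >&2; return 1 ;;
    esac
}

# Report whether the package for a utility is installed (requires rpm).
check_utility() {
    pkg=$(pkg_for_utility "$1") || return 1
    if rpm -q "$pkg" >/dev/null 2>&1; then
        echo "$1: $pkg is installed"
    else
        echo "$1: $pkg is not installed"
    fi
}
```

For example, running check_utility iostat reports whether the sysstat package is installed.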

For a hands-on tutorial and associated video content on many of these utilities, see Monitor system resources on Oracle Linux.

Monitoring the Usage of System Resources

You can monitor system performance by collecting information about system resources. For a better assessment, first establish a baseline of acceptable measurements under typical operating conditions. You can then use that baseline as a reference point to more easily identify memory shortages, spikes in resource usage, and other problems when they occur. You can also use performance monitoring to plan for future growth and decide how configuration changes might affect future performance.

For more information about monitoring the use of resources in the system, see also Working With OSWatcher Black Box and Working With Performance Co-Pilot.

Monitoring Usage in Real Time

To rerun a monitoring command at a fixed interval and watch the output change in real time, use the watch command. For example, you can run the mpstat command every second by using the following command:

sudo watch -n 1 mpstat

That command displays a single line of statistics that's refreshed every second, for example:

hh:mm:ss  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
hh:mm:ss  all    1.44    0.02    0.80    0.01    0.07    0.05    0.06    0.00    0.00   97.56

Alternatively, many of the commands enable you to specify the sampling interval in seconds, for example:

sudo mpstat 1

The command displays the same information as the previous command, except that the information is in a running list where a new line of information is generated every second, for example:

hh:mm:ss  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
hh:mm:ss  all    0.00    0.00    0.00    0.00    0.50    0.50    0.00    0.00    0.00   99.01
hh:mm:ss  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
...

To stop real-time monitoring, press Ctrl-c.

Monitoring Usage Through Logs

The sar utility records statistics every 10 minutes while the system is running and retains that information for every day of the current month. The information is stored in /var/log/sa/saDD.

To display the contents of the sar log of a specific day of the current month, type:

sudo sar -A -f /var/log/sa/saDD               
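Because DD in the log name is the two-digit day of the month, a small helper can build the path for you. The following is a sketch under that assumption; the function name sa_file_for_day is hypothetical, and with no argument it uses the current day.

```shell
# Build the path of the sar log for a given day of the month.
# With no argument, use today's day of the month.
sa_file_for_day() {
    day=${1:-$(date +%d)}
    # Strip one leading zero so that printf doesn't read 08 or 09 as octal.
    day=${day#0}
    printf '/var/log/sa/sa%02d\n' "$day"
}
```

For example, sudo sar -u -f "$(sa_file_for_day 5)" displays the CPU statistics that were recorded on the fifth day of the month, if that log exists.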

You can also use the sar command to create a customized log that contains a record of specific information that you want to monitor. Use the following syntax:

sudo sar -o datafile seconds count >/dev/null 2>&1 &

In the previous command, datafile is the full path to the customized log where you want to store the information, seconds is the sampling interval in seconds, and count is the number of samples to record. With this command, the sar process runs in the background and collects the data.

To display the results of this monitoring, you would type:

sudo sar -A -f datafile
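When you plan a customized log, the count argument follows from how long you want to record and how often you want to sample. The following sketch shows the arithmetic; the function name sar_sample_count is hypothetical, and the result is rounded up so that the whole period is covered.

```shell
# Number of samples needed to cover a duration at a given interval.
# Both arguments are whole numbers of seconds.
sar_sample_count() {
    duration=$1
    interval=$2
    if [ "$interval" -le 0 ]; then
        echo "interval must be a positive number of seconds" >&2
        return 1
    fi
    # Round up so that a partial final interval is still sampled.
    echo $(( (duration + interval - 1) / interval ))
}
```

For example, to record one hour of data at 10 second intervals, you might pass 10 as seconds and the output of sar_sample_count 3600 10 as count.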

Monitoring CPU Usage

When every CPU core is busy, other processes must wait until a core becomes free or until the scheduler switches a core to run their code. If too many processes are queued too often, the CPU becomes a system performance bottleneck.

The following are some commands for monitoring CPU usage. For the different options and arguments that you can use with these commands, see their respective manual pages.

  • uptime
  • mpstat
  • sar
  • top

The following examples show how you can use these commands to obtain statistics on the system's CPU usage:

  • Review CPU usage statistics for each CPU core and averaged across all the CPU cores.

    • mpstat -P ALL
      05:10:01 PM  CPU    %usr   %nice    %sys   %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
      05:10:01 PM  all    0.76    0.02    0.70      0.02    0.08    0.05    0.06    0.00    0.00   98.32
      05:10:01 PM    0    0.75    0.01    0.68      0.00    0.09    0.03    0.07    0.00    0.00   98.37
      05:10:01 PM    1    0.76    0.03    0.72      0.03    0.06    0.06    0.06    0.00    0.00   98.27
      
    • sar -u -P ALL
      03:00:01 PM  CPU  %user   %nice   %system   %iowait  %steal   %idle
      03:10:01 PM  all  8.30    0.00    2.20      0.22     0.10     89.18
      03:10:01 PM    0  8.22    0.00    2.64      0.31     0.09     88.74
      03:10:01 PM    1  8.39    0.00    1.77      0.13     0.10     89.61
      ...

    The output of these commands includes data under the %idle heading, which shows the percentage of time that a CPU isn't running system or process code. If the percentage is often 0% on all CPU cores, then the system is CPU-bound for the workload that's running.

    We recommend that the percentage of time to run system code, which is reported under the %system or %sys heading, not exceed 30%, especially if %idle is close to 0%.

  • Review information about system load average.

    • uptime
      21:25:34 up  6:28,  1 user,  load average: 0.02, 0.10, 0.04
    • sar -q
      14:57:55     LINUX RESTART	(2 CPU)
      
      03:00:01 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
      03:10:01 PM         0       214      0.19      0.30      0.22         0
      03:20:01 PM         0       212      0.00      0.05      0.10         0
      ...
      Average:      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
      Average:            1       212      0.12      0.08      0.05         0

    The system load average consists of the combination of the number of processes that are running on CPU cores, waiting to run, and waiting for disk I/O activity to complete, which are then averaged over time.

    The uptime command shows the information in a single line, while the sar syntax displays the information in columns under ldavg-* headings. The sar syntax also shows the number of processes waiting to run and the total number of processes, which are reported under the runq-sz and plist-sz headings.

    On a busy system, we recommend that the load average typically not be greater than two times the number of CPU cores over a period of 5 or 15 minutes. If the load average exceeds four times the number of CPU cores for long periods, then the system is overloaded.

    For a better assessment of the system load, find the system's average load under normal loads where users and applications don't experience problems with system responsiveness. Then, look for deviations from this benchmark over time. A dramatic rise in the load average can indicate a serious performance problem.

  • Review a real-time listing of CPU activity.

    top
    top - 22:13:34 up  7:16,  1 user,  load average: 0.00, 0.02, 0.01
    Tasks: 149 total,   1 running, 148 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  0.2 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.2 hi,  0.0 si,  0.0 st
    MiB Mem :  14705.5 total,  11738.9 free,    753.2 used,   2213.4 buff/cache
    MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  13588.9 avail Mem 
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                     
      34781 root      20   0   11384   7100   1700 S   1.0   0.0   0:00.03 pidstat                     
       2014 root      20   0   26216   3568   3056 S   0.7   0.0   0:04.78 OSWatcher                   
       1481 root      20   0  202628  38328  36792 S   0.3   0.3   0:02.94 sssd_nss
       ...

    By default, top lists the most CPU-intensive processes on the system. The output's upper section displays general information, including the load averages over the past 1, 5, and 15 minutes, the number of running and sleeping processes or tasks, and the total CPU and memory usage.

    The second table displays a list of processes, including the process ID number (PID), the process owner, CPU usage, memory usage, running time, and the command name. By default, the list is sorted by CPU usage, with the top consumer of CPU listed first.

    To stop the top process, press Ctrl-c.
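The %idle and %sys guidance from the first item in the previous list can be expressed as a simple check on a single sample. The following is a minimal sketch: the function name cpu_check is hypothetical, and the 5% cutoff for an idle percentage that's "close to 0%" is an illustrative assumption. A single sample isn't conclusive, because the guidance applies to readings that persist over time.

```shell
# Flag a CPU sample that matches the guidance in this section.
# Arguments: %sys (or %system) and %idle, as printed by mpstat or sar -u.
# The 5% cutoff for "close to 0%" idle is an illustrative assumption.
cpu_check() {
    awk -v sys="$1" -v idle="$2" 'BEGIN {
        if (idle <= 5 && sys > 30) print "cpu-bound, high system time"
        else if (idle <= 5)        print "cpu-bound"
        else if (sys > 30)         print "high system time"
        else                       print "ok"
    }'
}
```

For example, cpu_check 0.70 98.32 uses the values from the mpstat sample output and reports ok.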

Used together, these commands give you a picture of the system's CPU usage. For example, a sustained high load average or a large run queue size, combined with low %idle percentages, can indicate that the system has insufficient CPU capacity for the workload. When CPU usage is high, the top command can identify which processes are likely responsible.
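The load average guidance in this section compares the load with the number of CPU cores: up to two times the core count is acceptable on a busy system, and more than four times for long periods indicates overload. The following sketch applies those two thresholds to one reading. The function name classify_load and the label "high" for the range in between are assumptions made for illustration.

```shell
# Classify a load average against the number of CPU cores.
# Arguments: load average, number of CPU cores.
classify_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (cores <= 0) { print "invalid core count"; exit 1 }
        ratio = load / cores
        if (ratio > 4)      print "overloaded"
        else if (ratio > 2) print "high"
        else                print "ok"
    }'
}
```

For example, classify_load "$(cut -d ' ' -f 2 /proc/loadavg)" "$(nproc)" classifies the 5 minute load average, assuming that the value reported by nproc is a suitable core count for the system.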

Monitoring Memory Usage

The following sections describe how to monitor memory usage with the sar command and how to manage memory with the Adaptive Memory Management daemon.

Monitoring Memory Usage With the sar Command

To monitor memory usage, use the sar command with available options depending on the information you want to obtain. For a list of options, type sar --help.

  • sar -r: Reports memory usage statistics, such as free (kbmemfree), available (kbavail), and used (kbmemused) memory. The report also includes %memused, which is the percentage of physical memory in use.
  • sar -B: Reports memory paging statistics, such as page in (pgpgin/s) and page out (pgpgout/s), page faults and major faults, and so on. The report also includes pgscank/s, which is the number of memory pages scanned by the kswapd daemon each second, and pgscand/s, which is the number of memory pages scanned directly each second.
  • sar -W: Reports swapping statistics, including pswpin/s and pswpout/s, which are the numbers of pages swapped in and out each second.

If %memused is near 100% and the scan rate that's reported by sar -B (pgscank/s and pgscand/s) is continuously over 200 pages each second, the system has a memory shortage.
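The memory shortage rule combines two readings, so it can be checked with a short function. The following is a sketch only: the function name memory_shortage is hypothetical, and treating 95% or more as "near 100%" is an assumption. Because the rule requires the scan rate to be continuously high, apply the check to several consecutive samples before drawing a conclusion.

```shell
# Check one sample against the memory shortage rule.
# Arguments: %memused from sar -r, scan rate in pages per second from sar -B.
# Treating 95% or more as "near 100%" is an illustrative assumption.
memory_shortage() {
    awk -v used="$1" -v scan="$2" 'BEGIN {
        if (used >= 95 && scan > 200) print "memory shortage"
        else                          print "ok"
    }'
}
```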

When a system runs out of physical memory and starts using swap space, system performance deteriorates dramatically. If you run out of swap space, then some programs or even the entire OS are likely to malfunction. If the free or top commands indicate that little swap space remains available, then the system is running low on memory.
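In addition to the free and top commands, the remaining swap space can be read from the SwapTotal and SwapFree fields of /proc/meminfo. The following sketch computes the percentage of swap that's still free; the function name swap_free_percent is hypothetical, and the function reads text in /proc/meminfo format from standard input.

```shell
# Print the percentage of swap space that is still free, reading
# text in /proc/meminfo format from standard input.
swap_free_percent() {
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free = $2 }
        END {
            if (total > 0) printf "%.1f\n", 100 * free / total
            else           print "no swap configured"
        }'
}
```

For example, swap_free_percent < /proc/meminfo prints 100.0 on a system that isn't using any swap space.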

The output from the dmesg command might include notification of any problems with physical memory that were detected at boot time.

Using the Adaptive Memory Management Daemon

To manage memory usage, you can use the Adaptive Memory Management daemon, which is available beginning with UEK R6. This daemon is a user space service that monitors free memory on an Oracle Linux system and predicts memory fragmentation and usage. It can also automatically reclaim memory if the system memory becomes too fragmented or is at risk of being filled to capacity.

If the system memory becomes highly fragmented, adaptivemmd triggers the kernel to compact memory so that fragmented space can be reclaimed before it's reallocated. If the system is likely to exhaust the available memory, then watermarks are adjusted, which can trigger the kernel to free up new pages in memory.

To use this utility, do the following:

  1. Install the adaptivemm package.
    sudo dnf install -y adaptivemm
  2. Start the daemon service.
    sudo systemctl enable --now adaptivemmd

To see the different options that you can use with the adaptivemmd command, type:

sudo adaptivemmd -h

You can change the configuration options in /etc/sysconfig/adaptivemmd.

For more information, see the adaptivemmd(8) manual page.

Monitoring Block I/O Usage

The iostat command monitors the loading of block I/O devices by observing the time that the devices are active relative to the average data transfer rates. You can use this information to adjust the system configuration to balance the I/O loading across disks and host adapters. The following is a sample of the command output:

iostat
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.69    0.05    0.77    0.00    0.03   98.46

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               1.05         2.32        25.19     611410    6640659
dm-0              0.62         2.07        20.22     545716    5329660
dm-1              0.70         0.02         4.97       4632    1308788

The iostat -x command reports extended statistics about block I/O activity, including %util, which is the percentage of elapsed time during which I/O requests were issued to a device, and avgqu-sz, which is the average queue length of I/O requests that were issued to that device. Recent sysstat versions label the queue length column aqu-sz. To sample at one second intervals, add an interval argument, for example iostat -x 1. If %util approaches 100% or the average queue length is greater than 1, device saturation is occurring.

You can also use the sar -d command to report on block I/O activity, including values for %util and avgqu-sz.
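The saturation rule can be applied to iostat -x output with a short awk filter. The following is a sketch under stated assumptions: the function name saturated_devices is hypothetical, 90% is an illustrative cutoff for a %util value that "approaches 100%", and the queue length column is located by name as either avgqu-sz or aqu-sz, because the label depends on the sysstat version.

```shell
# Print the devices whose %util or average queue length suggests
# saturation. Reads iostat -x output from standard input.
# The 90% cutoff for %util is an illustrative assumption.
saturated_devices() {
    awk '
        /^Device/ {
            # Locate the columns by header name, because column positions
            # and the queue length label vary between sysstat versions.
            for (i = 1; i <= NF; i++) {
                if ($i == "%util") u = i
                if ($i == "aqu-sz" || $i == "avgqu-sz") q = i
            }
            next
        }
        u && q && NF >= u && ($u >= 90 || $q > 1) { print $1 }
    '
}
```

For example, iostat -x | saturated_devices prints nothing when no device exceeds either threshold.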

The iotop utility can help you identify which processes are responsible for excessive disk I/O. iotop has a similar user interface to top. In its upper section, iotop displays the total disk input and output usage in bytes per second. In its lower section, iotop displays I/O information for each process, including disk input and output usage in bytes per second, the percentage of time spent swapping in pages from disk or waiting on I/O, and the command name. The following is a sample command output:

sudo iotop
Total DISK READ :	0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:	0.00 B/s | Actual DISK WRITE:       0.00 B/s
    TID  PRIO  USER     DISK READ DISK WRITE>    COMMAND                                            
      1 be/4 root        0.00 B/s    0.00 B/s systemd --switched-root --system --deserialize 16
      2 be/4 root        0.00 B/s    0.00 B/s [kthreadd]
...

While you review the output, use the arrow keys to change the sort field, and press A to switch the I/O units between bytes each second and total number of bytes, or O to switch between displaying all processes or only those processes that are performing I/O.

Monitoring File System Usage

The sar -v command reports the number of unused cache entries in the directory cache (dentunusd) and the numbers of in-use file handles (file-nr), inode handlers (inode-nr), and pseudo terminals (pty-nr).

sar -v
12:00:01 AM dentunusd   file-nr  inode-nr    pty-nr
12:10:33 AM     80101      2944     73074         0
12:20:33 AM     79788      2944     72654         0

The nfsiostat command reports I/O statistics for each NFS file system that's mounted. If this command isn't available, install the nfs-utils package.

Monitoring Network Usage

The ip -s link command displays network statistics and errors for all network devices, including the numbers of bytes transmitted (TX) and received (RX). The dropped and overrun fields provide an indicator of network interface saturation, for example:

ip -s link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    RX: bytes  packets  errors  dropped overrun mcast   
    240        4        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    240        4        0       0       0       0       
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:60:95:d5 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    258187485  671730   0       0       0       17296   
    TX: bytes  packets  errors  dropped carrier collsns 
    13227598   130827   0       0       0       0
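To watch the dropped counters without reading the full ip -s link output, you can read per-interface counters from sysfs. The following sketch assumes that the kernel exposes rx_dropped and tx_dropped files under /sys/class/net/interface/statistics, as is usual on Linux, and that they track the same drops that ip -s link reports. The function name ifaces_with_drops is hypothetical, and the optional argument exists only so that the function can be pointed at another directory.

```shell
# Print interfaces whose receive or transmit drop counters are non-zero.
# The optional argument is the sysfs network directory.
ifaces_with_drops() {
    base=${1:-/sys/class/net}
    for dir in "$base"/*; do
        stats="$dir/statistics"
        # Skip entries that don't expose both counters.
        [ -r "$stats/rx_dropped" ] && [ -r "$stats/tx_dropped" ] || continue
        rx=$(cat "$stats/rx_dropped")
        tx=$(cat "$stats/tx_dropped")
        if [ "$rx" -gt 0 ] || [ "$tx" -gt 0 ]; then
            echo "${dir##*/} rx_dropped=$rx tx_dropped=$tx"
        fi
    done
}
```

Counters that keep increasing between two runs are a stronger indicator of saturation than a non-zero value alone, because the counters accumulate from the time the interface is created.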

The ss -s command displays summary statistics for each protocol, for example:

ss -s
Total: 193
TCP:   9 (estab 2, closed 0, orphaned 0, timewait 0)

Using the Graphical System Monitor

The GNOME desktop environment includes a graphical system monitor that you can use to display information about the system configuration, running processes, resource usage, and file systems.

To display the System Monitor, use the following command:

gnome-system-monitor

Selecting the Resources tab displays the following information:

  • CPU usage history in graphical form and the current CPU usage as a percentage.

  • Memory and swap usage history in graphical form and the current memory and swap usage.

  • Network usage history in graphical form, the current network usage for reception and transmission, and the total amount of data received and transmitted.

To display the System Monitor Manual, press F1 or select Help, then select Contents.