1 Monitoring the System and Optimizing Performance
Performance issues can be caused by several system components including software or hardware, and any related interactions. Many performance diagnostics utilities are available in Oracle Linux and include tools that monitor and analyze the resource usage of different hardware components, and also tracing tools for diagnosing performance issues in several processes and related threads.
Many performance issues are the result of configuration errors. You can avoid these errors by using a validated configuration that has been pretested for the enabled software, hardware, storage, drivers, and networking components. A validated configuration incorporates best practices for an Oracle Linux deployment and has undergone real-world testing of the complete stack. Oracle publishes many validated configurations, which are available for download. See the release notes for the release that you're running for extra recommendations on kernel parameter settings.
System Performance and Monitoring Utilities
Different Oracle Linux utilities are available that enable you to collect information about system resource usage and errors, and help you to identify performance problems that are caused by overloaded disks, network, memory, or CPUs.
The following packages, which provide system performance and monitoring utilities, are installed by default. If they're not already available, you can install the packages as required.
util-linux
Provides dmesg, which displays the contents of the kernel ring buffer. The buffer can contain errors about system resource usage.
procps-ng
Provides these utilities:
- free: Displays the amount of free and used memory in the system.
- top: Provides a dynamic real-time view of the tasks that are running on a system.
- uptime: Displays the system load averages for the past 1, 5, and 15 minutes.
- vmstat: Reports virtual memory statistics.
iproute
Provides these utilities:
- ip: Reports network interface statistics and errors.
- ss: Reports socket statistics.
Tip:
To verify if a utility is available in the system, check if the utility's package is installed. For example, for the dmesg utility, you can type:
dnf info util-linux
Installed Packages
Name : util-linux
Version : version-number
...
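As a complement to the package check, the following sketch reports which commands from a list can't be found on the PATH. The function name missing_tools is illustrative and isn't part of any Oracle Linux package. It relies only on the shell's command -v builtin.

```shell
# missing_tools NAME...
# Prints each NAME that is not available as a command on the PATH.
# Hypothetical helper for illustration only.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, running missing_tools dmesg free top uptime vmstat ip ss prints nothing when all the default utilities are present.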
You can install the following packages to take advantage of extra utilities.
sysstat
Provides these utilities:
- iostat: Reports I/O statistics.
- mpstat: Reports processor-related statistics.
- sar: Reports information about system activity.
iotop
Provides iotop, which monitors disk and swap I/O on a per-process basis.
nfs-utils
Provides nfsiostat, which reports I/O statistics for NFS mounts.
Many of these utilities provide overlapping functionality. For more information, see the individual manual page for the utility.
For a hands-on tutorial and associated video content on many of these utilities, see Monitor system resources on Oracle Linux.
Monitoring the Usage of System Resources
You can monitor system performance by collecting information about system resources. For a better assessment, first establish a baseline of acceptable measurements under typical operating conditions. You can then use that baseline as a reference point to more easily identify memory shortages, spikes in resource usage, and other problems when they occur. You can also use performance monitoring to plan for future growth and decide how configuration changes might affect future performance.
For more information about monitoring the use of resources in the system, see also Working With Performance Co-Pilot.
Monitoring Usage in Real Time
To rerun a monitoring command at a fixed interval and watch the output change in real time, use the watch command. For example, you can run the mpstat command every second by using the following command:
sudo watch -n 1 mpstat
That command generates a single-line output that changes information every second, for example:
hh:mm:ss CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
hh:mm:ss all 1.44 0.02 0.80 0.01 0.07 0.05 0.06 0.00 0.00 97.56
Alternatively, many of the commands enable you to specify the sampling interval in seconds, for example:
sudo mpstat 1
The command displays the same information as the previous command, except that the information is in a running list where a new line of information is generated every second, for example:
hh:mm:ss CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
hh:mm:ss all 0.00 0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.00 99.01
hh:mm:ss all 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
...
To stop real-time monitoring, press Ctrl-c.
Monitoring Usage Through Logs
The sar utility records statistics every 10 minutes while the system is running and retains that information for every day of the current month. The information is stored in /var/log/sa/saDD, where DD is the two-digit day of the month.
To display the contents of the sar log of a specific day of the current month, type:
sudo sar -A -f /var/log/sa/saDD
You can also use the sar command to create a customized log that contains a record of specific information that you want to monitor. Use the following syntax:
sudo sar -o datafile seconds count >/dev/null 2>&1 &
In the previous command, datafile is the full path to the customized log where you want to store the information, seconds is the interval between samples, and count is the number of samples to record. With this command, the sar process runs in the background and collects the data.
To display the results of this monitoring, you would type:
sudo sar -A -f datafile
Monitoring CPU Usage
When every CPU core is occupied with executing system processes, other processes must wait until a CPU core becomes free or until the scheduler switches a CPU core to run their code. Too many processes that are queued too often can represent a system performance bottleneck.
The following are some commands for monitoring CPU usage. For the different options and arguments that you can use with these commands, see their respective manual pages.
- uptime
- mpstat
- sar
- top
The following examples show how you can use these commands to obtain statistics on the system's CPU usage:
- Review CPU usage statistics for each CPU core and averaged across all the CPU cores.
mpstat -P ALL
05:10:01 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
05:10:01 PM all 0.76 0.02 0.70 0.02 0.08 0.05 0.06 0.00 0.00 98.32
05:10:01 PM 0 0.75 0.01 0.68 0.00 0.09 0.03 0.07 0.00 0.00 98.37
05:10:01 PM 1 0.76 0.03 0.72 0.03 0.06 0.06 0.06 0.00 0.00 98.27
sar -u -P ALL
03:00:01 PM CPU %user %nice %system %iowait %steal %idle
03:10:01 PM all 8.30 0.00 2.20 0.22 0.10 89.18
03:10:01 PM 0 8.22 0.00 2.64 0.31 0.09 88.74
03:10:01 PM 1 8.39 0.00 1.77 0.13 0.10 89.61
...
The output of these commands includes data under the %idle heading, which shows the percentage of time that a CPU isn't running system or process code. If the percentage is often 0% on all CPU cores, then the system is CPU-bound for the workload that's running.
We recommend that the percentage of time spent running system code, which is reported under the %system or %sys heading, not exceed 30%, especially if %idle is close to 0%.
- Review information about the system load average.
uptime
21:25:34 up 6:28, 1 user, load average: 0.02, 0.10, 0.04
sar -q
14:57:55 LINUX RESTART (2 CPU)
03:00:01 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 blocked
03:10:01 PM 0 214 0.19 0.30 0.22 0
03:20:01 PM 0 212 0.00 0.05 0.10 0
...
Average: runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 blocked
Average: 1 212 0.12 0.08 0.05 0
The system load average combines the number of processes that are running on CPU cores, waiting to run, and waiting for disk I/O activity to complete, averaged over time.
The uptime command shows the information in a single line, while the sar syntax displays the information in columns under ldavg-* headings. The sar output also shows the number of processes waiting to run and the total number of processes, which are reported under the runq-sz and plist-sz headings.
On a busy system, we recommend that the load average typically not be greater than two times the number of CPU cores over a period of 5 or 15 minutes. If the load average exceeds four times the number of CPU cores for long periods, then the system is overloaded.
For a better assessment of the system load, find the system's average load under normal conditions where users and applications don't experience problems with system responsiveness. Then, look for deviations from this baseline over time. A dramatic rise in the load average can indicate a serious performance problem.
- Review a real-time listing of CPU activity.
top
top - 22:13:34 up 7:16, 1 user, load average: 0.00, 0.02, 0.01
Tasks: 149 total, 1 running, 148 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.2 hi, 0.0 si, 0.0 st
MiB Mem : 14705.5 total, 11738.9 free, 753.2 used, 2213.4 buff/cache
MiB Swap: 4096.0 total, 4096.0 free, 0.0 used. 13588.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 118844 18244 11116 S 0.0 0.1 5:18.78 systemd
34781 root 20 0 11384 7100 1700 S 1.0 0.0 0:00.03 pidstat
1481 root 20 0 202628 38328 36792 S 0.3 0.3 0:02.94 sssd_nss
...
By default, top lists the most CPU-intensive processes on the system. The output's upper section displays general information including the load averages over the past 1, 5, and 15 minutes, the number of running and sleeping processes or tasks, and total CPU and memory usage.
The second table displays a list of processes, including the process ID number (PID), the process owner, CPU usage, memory usage, running time, and the command name. By default, the list is sorted by CPU usage, with the top consumer of CPU listed first.
To stop the top process, press Ctrl-c.
You can use all these commands together to build a picture of the system's CPU usage. For example, a sustained high load average or a large run queue size combined with low %idle percentages can indicate that the system has insufficient CPU capacity for the workload. When CPU usage is high, the top command can identify which processes are likely responsible.
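The load average guidelines described in this section (two times and four times the number of CPU cores) can be checked with a short script. The following is a minimal sketch rather than a supported tool. The function names and the labels ok, high, and overloaded are illustrative choices, and the sketch assumes that /proc/loadavg and the nproc command are available.

```shell
# classify_load LOAD CORES
# Compares a load average with the CPU core count by using the 2x and 4x
# guidelines. Prints "ok", "high", or "overloaded" (illustrative labels).
classify_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load > 4 * cores)      print "overloaded"
        else if (load > 2 * cores) print "high"
        else                       print "ok"
    }'
}

# check_system_load
# Reads the 5-minute load average (second field of /proc/loadavg) and the
# core count reported by nproc, then classifies the result.
check_system_load() {
    local load5 cores
    load5=$(awk '{ print $2 }' /proc/loadavg)
    cores=$(nproc)
    printf '5-minute load %s on %s cores: %s\n' \
        "$load5" "$cores" "$(classify_load "$load5" "$cores")"
}
```

A single sample isn't enough to conclude that a system is overloaded, because the guideline refers to long periods. Consider running check_system_load at intervals and comparing the results against the baseline for the system.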
Monitoring Memory Usage
The following are useful utilities to monitor memory usage:
Monitoring Memory Usage With the sar Command
To monitor memory usage, use the sar command with available options depending on the information you want to obtain. For a list of options, type sar --help.
- sar -r: Reports memory usage statistics, such as free (kbmemfree), available (kbavail), and used (kbmemused) memory. The report also includes %memused, which is the percentage of physical memory in use.
- sar -B: Reports memory paging statistics, such as page in (pgpgin/s) and page out (pgpgout/s) rates, page faults, and major faults. The report also includes pgscank/s, which is the number of memory pages scanned by the kswapd daemon each second, and pgscand/s, which is the number of memory pages scanned directly each second.
- sar -W: Reports swapping statistics, including pswpin/s and pswpout/s, which are the numbers of pages swapped in and out each second.
If %memused is near 100% and the scan rate is continuously over 200 pages each second, the system has a memory shortage.
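To look for intervals that exceed the scan rate guideline, you can filter the sar -B report. The following is a minimal sketch under two assumptions that aren't stated in this guide: that the scan rate means the sum of pgscank/s and pgscand/s, and that the column headings match those described earlier. The function name scan_rate_alerts is illustrative.

```shell
# scan_rate_alerts [LIMIT]
# Reads `sar -B` output on standard input and prints each data row whose
# combined pgscank/s + pgscand/s value exceeds LIMIT (default 200).
# Assumption: "scan rate" is the sum of the two columns.
scan_rate_alerts() {
    awk -v limit="${1:-200}" '
        # Header row: remember which columns hold the two scan rates.
        /pgscank\/s/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "pgscank/s") k = i
                if ($i == "pgscand/s") d = i
            }
            next
        }
        # Data rows: skip the Average row and short lines such as restart
        # markers, then compare the combined rate with the limit.
        k && d && NF >= d && $1 != "Average:" {
            if ($k + $d > limit) print $0
        }
    '
}
```

For example, sar -B | scan_rate_alerts lists the intervals from the current day's log that exceed 200 pages each second. Compare those intervals with the %memused values from sar -r before concluding that memory is short.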
When a system runs out of real or physical memory and starts using swap space, system performance deteriorates dramatically. If you run out of swap space then some programs or even the entire OS are likely to malfunction. If the free or top commands indicate that little swap space remains available, then the system is running low on memory.
The output from the dmesg command might include notification of any problems with physical memory that were detected at boot time.
Using the Adaptive Memory Management Daemon
To manage memory usage, you can use the Adaptive Memory Management daemon. This daemon is a user space service that monitors free memory on an Oracle Linux system and predicts memory fragmentation and usage. It can also automatically reclaim memory if the system memory becomes too fragmented or is at risk of being filled to capacity.
If the system memory becomes highly fragmented, adaptivemmd triggers the kernel to compact memory so that fragmented space can be reclaimed before it's reallocated. If the system is likely to exhaust the available memory, then watermarks are adjusted, and this can trigger the kernel to free up new pages in memory. Adaptive Memory Management is available in Unbreakable Enterprise Kernel Release 6 and later.
To use this utility, do the following:
- Install the adaptivemm package.
sudo dnf install -y adaptivemm
- Start the daemon service.
sudo systemctl enable --now adaptivemmd
To see the different options that you can use with the adaptivemmd command, type:
sudo adaptivemmd -h
You can change the configuration options in /etc/sysconfig/adaptivemmd.
For more information, see the adaptivemmd(8) manual page.
Monitoring Block I/O Usage
The iostat command monitors the loading of block I/O devices by observing the time that the devices are active relative to the average data transfer rates. You can use this information to adjust the system configuration to balance the I/O loading across disks and host adapters. The following is a sample of the command output:
iostat
avg-cpu: %user %nice %system %iowait %steal %idle
0.69 0.05 0.77 0.00 0.03 98.46
Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 1.05 2.32 25.19 611410 6640659
dm-0 0.62 2.07 20.22 545716 5329660
dm-1 0.70 0.02 4.97 4632 1308788
iostat -x reports extended statistics about block I/O activity. To repeat the report at one second intervals, add an interval argument, for example iostat -x 1. The extended statistics include %util, which is the percentage of elapsed time during which I/O requests were issued to a device, and avgqu-sz, which is the average queue length of I/O requests that were issued to that device. Later versions of sysstat report the queue length under the aqu-sz heading. If %util approaches 100% or avgqu-sz is greater than 1, device saturation is occurring.
You can also use the sar -d command to report on block I/O activity, including values for %util and avgqu-sz.
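To pick out devices that are close to saturation, you can filter the extended report on the %util column. The following is a minimal sketch. The function name busy_devices and the default threshold of 90 are illustrative assumptions, because this guide states only that %util approaching 100% indicates saturation. The sketch locates the column by its heading, so it doesn't depend on the column order.

```shell
# busy_devices [LIMIT]
# Reads `iostat -x` output on standard input and prints "device %util"
# for each device whose %util is at or above LIMIT (default 90, an
# illustrative threshold).
busy_devices() {
    awk -v limit="${1:-90}" '
        # Device header row: locate the %util column by name.
        $1 ~ /^Device/ {
            for (i = 1; i <= NF; i++) if ($i == "%util") u = i
            next
        }
        # A blank line ends the device table.
        NF == 0 { u = 0; next }
        # Device rows: report those at or above the limit.
        u && NF >= u && $u + 0 >= limit { print $1, $u }
    '
}
```

For example, iostat -x | busy_devices 80 lists devices whose utilization is 80% or higher in the report.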
The iotop utility can help you identify which processes are responsible for excessive disk I/O. iotop has a similar user interface to top. In its upper section, iotop displays the total disk input and output usage in bytes per second. In its lower section, iotop displays I/O information for each process, including disk input and output usage in bytes per second, the percentage of time spent swapping in pages from disk or waiting on I/O, and the command name. The following is a sample command output:
sudo iotop
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE> COMMAND
1 be/4 root 0.00 B/s 0.00 B/s systemd --switched-root --system --deserialize 16
2 be/4 root 0.00 B/s 0.00 B/s [kthreadd]
...
While you review the output, use the arrow keys to change the sort field, and press A to switch the I/O units between bytes each second and total number of bytes, or O to switch between displaying all processes or only those processes that are performing I/O.
Monitoring File System Usage
The sar -v command reports the number of unused cache entries in the directory cache (dentunusd) and the numbers of in-use file handles (file-nr), inode handlers (inode-nr), and pseudo terminals (pty-nr).
sar -v
12:00:01 AM dentunusd file-nr inode-nr pty-nr
12:10:33 AM 80101 2944 73074 0
12:20:33 AM 79788 2944 72654 0
nfsiostat reports I/O statistics for each NFS file system that's mounted. If this command isn't available, install the nfs-utils package.
Monitoring Network Usage
The ip -s link command displays network statistics and errors for all network devices, including the numbers of bytes transmitted (TX) and received (RX). The dropped and overrun fields provide an indicator of network interface saturation, for example:
ip -s link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
RX: bytes packets errors dropped overrun mcast
240 4 0 0 0 0
TX: bytes packets errors dropped carrier collsns
240 4 0 0 0 0
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 08:00:27:60:95:d5 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
258187485 671730 0 0 0 17296
TX: bytes packets errors dropped carrier collsns
13227598 130827 0 0 0 0
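To check the dropped and overrun counters across many interfaces, you can filter the command output. The following is a minimal sketch that assumes the layout shown in the sample above, where each RX: or TX: heading line is followed by one line of values. The function name link_drops is illustrative.

```shell
# link_drops
# Reads `ip -s link` output on standard input and prints one line for each
# interface and direction whose dropped or overrun counter is nonzero.
link_drops() {
    awk '
        # Interface line, for example "2: enp0s3: <BROADCAST,...> ..."
        /^[0-9]+: / { ifname = $2; sub(/:$/, "", ifname); next }
        # Counter heading line. The values follow on the next line, shifted
        # one field to the left because they have no "RX:" or "TX:" prefix.
        $1 == "RX:" || $1 == "TX:" {
            dir = substr($1, 1, 2)
            dcol = ocol = 0
            for (i = 2; i <= NF; i++) {
                if ($i == "dropped") dcol = i - 1
                if ($i == "overrun") ocol = i - 1
            }
            want = 1
            next
        }
        # Values line that follows a counter heading.
        want {
            want = 0
            dropped = dcol ? $dcol : 0
            overrun = ocol ? $ocol : 0
            if (dropped + overrun > 0)
                print ifname, dir, "dropped=" dropped, "overrun=" overrun
        }
    '
}
```

For example, ip -s link | link_drops prints nothing for the sample output above because every dropped and overrun counter is 0. The counters are cumulative, so compare two readings taken some time apart to see whether drops are still occurring.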
The ss -s command displays summary statistics for each protocol, for example:
ss -s
Total: 193
TCP: 9 (estab 2, closed 0, orphaned 0, timewait 0)
Using the Graphical System Monitor
The GNOME desktop environment includes a graphical system monitor that you can use to display information about the system configuration, running processes, resource usage, and file systems.
To display the System Monitor, use the following command:
gnome-system-monitor
Selecting the Resources tab displays the following information:
- CPU usage history in graphical form and the current CPU usage as a percentage.
- Memory and swap usage history in graphical form and the current memory and swap usage.
- Network usage history in graphical form, the current network usage for reception and transmission, and the total amount of data received and transmitted.
To display the System Monitor Manual, press F1 or select Help, then select Contents.