As an optional post-installation step, Oracle recommends that you also install and configure diagnostics tools on all Oracle VM Servers. These tools can be used to help debug and diagnose issues such as system crashes, hanging, unscheduled reboots, and OCFS2 cluster errors. The output from these tools can be used by Oracle Support and can significantly improve resolution and response times.
Obtaining a system memory dump, vmcore, can be very useful when attempting to diagnose and resolve the root cause of an issue. To get a useful vmcore dump, a kdump service configuration is required. See Section 1.4.2, “Manually Configuring kdump for Oracle VM Server” below for more information on this.
In addition, you can install netconsole, a utility allowing system console messages to be redirected across the network to another server. See the Oracle Support Document, How to Configure "netconsole" for Oracle VM Server 3.0, for information on how to install netconsole.
Additional information on using diagnostic tools is provided in the Oracle Linux documentation. See the chapter titled Support Diagnostic Tools in the Oracle Linux Administrator's Solutions Guide.
http://docs.oracle.com/cd/E37670_01/E37355/html/ol_diag.html
OSWatcher (oswbb) is a collection of shell scripts that collect and archive operating system and network metrics to diagnose performance issues with Oracle VM Server. OSWatcher operates as a set of background processes to gather data with standard UNIX utilities such as vmstat, netstat and iostat.
By default, OSWatcher is installed on Oracle VM Server and is enabled to run at boot. The following table describes the OSWatcher program and main configuration file:
Name | Description |
---|---|
| The main OSWatcher program. If required, you can configure certain parameters for statistics collection. However, you should do so only if Oracle Support advises you to change the default configuration. |
| This file defines the directory where OSWatcher log files are saved, the interval between statistics collection, and the maximum amount of time to retain archived statistics. Important: It is not possible to specify a limit to the data that the OSWatcher utility collects. For this reason, you should be careful when modifying the default configuration so that the OSWatcher utility does not use all available space on the system disk. |
To start, stop, and check the status of OSWatcher, use the following command:
# service oswatcher {start|stop|status|restart|reload|condrestart}
For detailed information on the data that OSWatcher collects and how to analyze the output, as well as for instructions on sending the data to Oracle Support, see the OSWatcher User Guide in the following directory on Oracle VM Server:
/usr/share/doc/oswatcher-x.x.x/
Although Oracle VM Server uses the UEK4 kernel, which is stable and fault-tolerant and should rarely encounter errors that crash the entire system, a system-wide error can still result in a kernel crash. Information about the state of the system at the time of a kernel crash is critical for debugging and resolving issues accurately. The kdump service captures the memory dump from dom0 and stores it on the filesystem. The service does not dump any system memory used by guest virtual machines, so the memory dump is specific to dom0 and the Xen hypervisor itself. The memory dump file that kdump generates is referred to as the vmcore file.
This section describes how to manually configure Oracle VM Server so that the kdump service is properly enabled and running, which allows you to set up the service after an installation. The Oracle VM Server installer also provides an option to enable kdump at installation time, in which case many of these steps are performed automatically. See Kdump Setting in the Oracle VM Installation and Upgrade Guide for more information.
By default, the required packages to enable the kdump service are included within the Oracle VM Server installation, but it is good practice to check that these are installed before continuing with any configuration work. You can do this by running the following command:
# rpm -qa | grep kexec-tools
If the kexec-tools package is not installed, you must install it manually.
Oracle VM Server uses GRUB2 to handle the boot process. In this step, you must configure GRUB2 to pass the crashkernel parameter to the Xen kernel at boot. This can be done by editing the /etc/default/grub file and appending the appropriate crashkernel parameter to the GRUB_CMDLINE_XEN variable.
The crashkernel parameter specifies the amount of memory reserved to load the crash kernel that generates the dump file, and the offset at which the crash kernel region begins in memory. The minimum amount of RAM that may be specified for a crash kernel is 512 MB, and this should be offset by 64 MB. The resulting configuration looks similar to the following:
GRUB_CMDLINE_XEN="dom0_mem=max:6144M allowsuperpage dom0_vcpus_pin dom0_max_vcpus=20 crashkernel=512M@64M"
This setting is sufficient for the vast majority of systems. However, on systems that use a significant number of large drivers, the crash kernel may need more memory. If you force a dump and it fails to generate a core file, try increasing the amount of memory allocated to the crash kernel.
While UEK4 supports the crashkernel=auto option, the Xen hypervisor does not. You must specify values for the RAM reservation and offset used for the crash kernel, or the kdump service is unable to run.
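The sizing rules above lend themselves to a quick check before rebooting. The following is a minimal sketch, not an Oracle-provided tool: the function names size_to_bytes and check_crashkernel are hypothetical, and the script assumes the SIZE@OFFSET form with K, M or G suffixes as used in the example configuration.

```shell
#!/bin/sh
# Hypothetical helper: convert a size such as 512M or 1G to bytes.
# Only plain digits with an optional K, M or G suffix are accepted.
size_to_bytes() {
    sz_num=${1%[KMGkmg]}
    case $sz_num in
        ''|*[!0-9]*) return 1 ;;
    esac
    case $1 in
        *[Kk]) echo $((sz_num * 1024)) ;;
        *[Mm]) echo $((sz_num * 1024 * 1024)) ;;
        *[Gg]) echo $((sz_num * 1024 * 1024 * 1024)) ;;
        *)     echo "$sz_num" ;;
    esac
}

# Hypothetical helper: print "SIZE_BYTES OFFSET_BYTES" for the crashkernel=
# option found in a command-line string. Fails when the option is missing,
# is not in SIZE@OFFSET form (for example "auto"), or reserves under 512 MB.
check_crashkernel() {
    ck_opt=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p' | tail -n 1)
    case $ck_opt in
        *@*) ;;
        *) echo "crashkernel must be SIZE@OFFSET, got '${ck_opt}'" >&2; return 1 ;;
    esac
    ck_size=$(size_to_bytes "${ck_opt%@*}") || return 1
    ck_off=$(size_to_bytes "${ck_opt#*@}") || return 1
    if [ "$ck_size" -lt $((512 * 1024 * 1024)) ]; then
        echo "crash kernel reservation is below 512 MB" >&2
        return 1
    fi
    echo "$ck_size $ck_off"
}
```

For the example value crashkernel=512M@64M, the function prints 536870912 67108864, that is, a 512 MB reservation starting at offset 0x4000000.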
When you have finished modifying /etc/default/grub, you must rebuild the system GRUB2 configuration that is used at boot time. This is done by running:
# grub2-mkconfig -o /boot/grub2/grub.cfg
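If you prefer to script the edit to /etc/default/grub rather than make it by hand, something along the following lines may work. This is a sketch under assumptions: append_xen_option is a hypothetical helper, it expects GRUB_CMDLINE_XEN to be a single double-quoted line, and it deliberately leaves an existing crashkernel setting alone so that you can review it yourself.

```shell
#!/bin/sh
# Hypothetical helper: append OPTION to GRUB_CMDLINE_XEN in FILE unless an
# option with the same key is already on that line. Keeps a FILE.bak copy.
append_xen_option() {
    grub_file=$1
    option=$2
    key=${option%%=*}
    if ! grep -q '^GRUB_CMDLINE_XEN="' "$grub_file"; then
        echo "no GRUB_CMDLINE_XEN line in ${grub_file}" >&2
        return 1
    fi
    if grep -q "^GRUB_CMDLINE_XEN=.*[\" ]${key}[= \"]" "$grub_file"; then
        echo "${key} is already set in ${grub_file}; edit it by hand" >&2
        return 0
    fi
    # Insert the option just before the closing double quote.
    sed -i.bak "s|^\(GRUB_CMDLINE_XEN=\"[^\"]*\)\"|\1 ${option}\"|" "$grub_file"
}

# Example use on a server, followed by the rebuild step described above:
#   append_xen_option /etc/default/grub crashkernel=512M@64M &&
#       grub2-mkconfig -o /boot/grub2/grub.cfg
```

Trying the helper on a copy of the file first is a cheap way to confirm that the resulting line matches what you expect.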
Kdump is able to store vmcore files in a variety of locations, including network accessible filesystems. By default, vmcore files are stored in /var/crash/, but this may not be appropriate depending on your disk partitioning and available space. The filesystem where the vmcore files are stored must have enough space to match the amount of memory available to Oracle VM Server for each dump.
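The space requirement can be checked with standard utilities. The sketch below assumes that MemTotal in /proc/meminfo is a reasonable stand-in for the memory available to Oracle VM Server (dom0); the function names are hypothetical.

```shell
#!/bin/sh
# Free space, in kB, on the filesystem that holds the given directory.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Memory visible to the running kernel, in kB (MemTotal from /proc/meminfo).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Succeed when the available space (kB, first argument) covers at least
# one dump of the given memory size (kB, second argument).
fits_one_dump() {
    [ "$1" -ge "$2" ]
}

# Example use on a server:
#   fits_one_dump "$(free_kb /var/crash)" "$(mem_total_kb)" ||
#       echo "not enough space in /var/crash for a full dump"
```

With dom0 limited to 6144 MB, as in the example GRUB2 configuration, the target filesystem would need roughly 6291456 kB free for each dump you intend to keep.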
Since the installation of Oracle VM Server only uses as much disk space as is required, a 'spare' partition is frequently available on a new installation. This partition is left available for use for hosting a local repository or for alternate use such as for hosting vmcore files generated by kdump. If you opt to use it for this purpose, you must first correctly identify and take note of the UUID of the partition and then format it with a usable filesystem.
The following steps serve as an illustration of how you might prepare the local spare partition.
Identify the partition that the installer left 'spare' after the installation. This is usually listed under /dev/mapper with a filename that starts with OVM_SYS_REPO_PART. If you can identify this device, you can format it with an ext4 filesystem:
# mkfs.ext4 /dev/mapper/OVM_SYS_REPO_PART_VBd64a21cf-db4a5ad5
If you don't have a partition mapped like this, you may need to use a utility such as blkid, parted, fdisk or gdisk to identify any free partitions on your available disk devices.
Obtain the UUID for the filesystem. You can do this by running the blkid command:
# blkid /dev/mapper/OVM_SYS_REPO_PART_VBd64a21cf-db4a5ad5
/dev/mapper/OVM_SYS_REPO_PART_VBd64a21cf-db4a5ad5: UUID="51216552-2807-4f17-ab27-b8135f69896d" TYPE="ext4"
Take note of the UUID as you will need to use this later when you configure kdump.
System configuration directing how the kdump service runs is defined in /etc/sysconfig/kdump, while specific kdump configuration variables are defined in /etc/kdump.conf. Changes may need to be made to either of these files depending on your environment. However, the default configuration should be sufficient to run kdump initially without any problems. The following list identifies potential configuration changes that you may wish to make:
On systems with a large amount of memory (for example, over 1 TB), it is advisable to disable the IO Memory Management Unit within the crash kernel for performance and stability reasons. This is achieved by editing /etc/sysconfig/kdump and appending the iommu=off kernel boot parameter to the KDUMP_COMMANDLINE_APPEND variable:
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 nr_cpus=1 reset_devices cgroup_disable=memory mce=off selinux=0 iommu=off"
If you intend to change the partition where the vmcore files are stored, for instance to use the spare partition on the server after installation, you must edit /etc/kdump.conf to provide the filesystem type and device location of the partition. If you followed the instructions above, it is preferable that you do this by specifying the UUID that you obtained for the partition using the blkid command. A line similar to the following should appear in the configuration:
ext4 UUID=51216552-2807-4f17-ab27-b8135f69896d
You may edit the default path where vmcore files are stored, but note that this path is relative to the partition that kdump is configured to use to store vmcores. If you have configured kdump to store vmcores on a separate filesystem, when you mount the filesystem, the vmcore files are located in the path specified by this directive on the mounted filesystem:
path /var/crash
If you are having issues obtaining a vmcore, or you are finding that the vmcore files produced by the makedumpfile utility are particularly large, you may reconfigure kdump to use the cp command to copy the vmcore in sparse mode. To do this, edit /etc/kdump.conf to comment out the line that sets the core_collector to use the makedumpfile utility, and uncomment the lines that enable the cp command:
# core_collector makedumpfile -EXd 1 --message-level 1 --non-cyclic
core_collector cp --sparse=always
extra_bins /bin/cp
Results with this approach vary, and the makedumpfile utility is generally recommended instead.
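To tie the dump target settings together, the following sketch derives the /etc/kdump.conf directives from a line of blkid output and shows where the dumps end up once the target filesystem is mounted. The three function names are hypothetical, and the directives are printed for review rather than written to the file.

```shell
#!/bin/sh
# Extract the UUID value from one line of blkid output.
uuid_from_blkid() {
    printf '%s\n' "$1" | sed -n 's/.*[[:space:]]UUID="\([^"]*\)".*/\1/p'
}

# Print the kdump.conf directives for a dump target: filesystem type,
# UUID, and the path relative to that filesystem (default /var/crash).
kdump_target_directives() {
    printf '%s UUID=%s\n' "$1" "$2"
    printf 'path %s\n' "${3:-/var/crash}"
}

# Directory where vmcores appear once the target filesystem is mounted at
# MOUNTPOINT, given the same relative path as in kdump.conf.
mounted_dump_dir() {
    printf '%s%s\n' "${1%/}" "${2:-/var/crash}"
}

# Example use on a server (DEVICE is whichever partition you prepared):
#   uuid=$(uuid_from_blkid "$(blkid "$DEVICE")")
#   kdump_target_directives ext4 "$uuid"
```

Because the path directive is relative to the dump filesystem, mounting that filesystem on /mnt with the default path places the dumps under /mnt/var/crash. The blkid options -s UUID -o value print the UUID on its own and can replace the sed expression.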
You can enable the kdump service to run at every boot by running the following command:
# chkconfig kdump on
You must restart the kdump service at this point to allow it to detect the changes that have been made to the kdump configuration and to determine whether a kdump crash kernel has been generated and is up to date. If the kernel image needs to be updated, kdump does this automatically. Otherwise, it restarts without any attempt to rebuild the crash kernel image:
# service kdump restart
Stopping kdump: [ OK ]
Detected change(s) the following file(s):
/etc/kdump.conf
Rebuilding /boot/initrd-4.1.12-25.el6uek.x86_64kdump.img
Starting kdump: [ OK ]
You can confirm that the kernel loaded for dom0 is correctly configured by running the following command and checking that the output shows your crashkernel parameter in use:
# xl dmesg|grep -i crashkernel
(XEN) Command line: placeholder dom0_mem=max:6144M allowsuperpage dom0_vcpus_pin dom0_max_vcpus=20 crashkernel=512M@64M
You can also check that the appropriate amount of memory is reserved for kdump by running the following:
# xl dmesg|grep -i kdump
(XEN) Kdump: 512MB (524288kB) at 0x4000000
or alternately:
# kexec --print-ckr-size
536870912
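The two checks report the same reservation in different units: 524288 kB × 1024 = 536870912 bytes, and the offset 0x4000000 is 64 MB. A small sketch that performs this conversion on the Xen log line is shown here; the function names are hypothetical, and the parsing assumes the line format shown in the example output.

```shell
#!/bin/sh
# Size of the kdump reservation, in bytes, from a "(XEN) Kdump:" log line.
xen_kdump_bytes() {
    kd_kb=$(printf '%s\n' "$1" | sed -n 's/.*Kdump: .*(\([0-9][0-9]*\)kB).*/\1/p')
    [ -n "$kd_kb" ] || return 1
    echo $((kd_kb * 1024))
}

# Start offset of the reservation, in bytes, from the same log line.
xen_kdump_offset() {
    kd_off=$(printf '%s\n' "$1" | sed -n 's/.* at \(0x[0-9a-fA-F]*\).*/\1/p')
    [ -n "$kd_off" ] || return 1
    echo $(($kd_off))
}

# Example use on a server: compare the Xen view with the kexec view.
#   line=$(xl dmesg | grep -i 'kdump:' | tail -n 1)
#   [ "$(xen_kdump_bytes "$line")" = "$(kexec --print-ckr-size)" ] &&
#       echo "reservation sizes agree"
```

Pass a single log line to these functions; the full xl dmesg output may contain more than one match.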
You can check that the kdump service is running by checking the service status:
# service kdump status
Kdump is operational
If there are no errors in /var/log/messages or on the console, you can assume that kdump is running correctly.
To test that kdump is able to generate a vmcore and store it correctly, you can trigger a kernel panic by issuing the following commands:
# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger
These commands cause the kernel on the Oracle VM Server to panic and crash. If kdump is working correctly, the crash kernel should take over and generate the vmcore file, which is copied to the configured location before the server reboots automatically. If kdump fails to load the crash kernel, the server may hang with the kernel panic and require a hard reset to reboot.
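Because a failed test can leave the server hung, it may be worth confirming the service status before triggering the panic. The following is a minimal sketch with a hypothetical function name; it only inspects the status text and does not trigger anything itself.

```shell
#!/bin/sh
# Succeed only when the given status text reports that kdump is ready.
# The expected wording is taken from the "service kdump status" example.
kdump_is_operational() {
    case $1 in
        *"Kdump is operational"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on a server, before running the sysrq commands by hand:
#   kdump_is_operational "$(service kdump status)" ||
#       echo "kdump is not ready; do not trigger a test panic yet"
```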
After you have triggered a kernel panic and the system has successfully rebooted, you may check that the vmcore file was properly generated:
If you have not configured kdump to use an alternate partition, you should be able to locate the vmcore file in /var/crash/127.0.0.1-date-time/vmcore, where date and time represent the date and time when the vmcore was generated.
If you configured kdump to use an alternate partition to store the vmcore file, you must mount it first. If you used the spare partition generated by a fresh installation of Oracle VM Server, this can be done in the following way:
# mount /dev/mapper/OVM_SYS_REPO_PART_VBd64a21cf-db4a5ad5 /mnt
You may then find the vmcore file in /mnt/var/crash/127.0.0.1-date-time/vmcore, where date and time represent the date and time when the vmcore was generated, for example:
# file /mnt/var/crash/127.0.0.1-2015-12-08-16\:12\:28/vmcore
/mnt/var/crash/127.0.0.1-2015-12-08-16:12:28/vmcore: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style
Remember to unmount the partition after you have obtained the vmcore file for analysis, so that it is free for use by kdump.
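When several test dumps accumulate, locating the most recent one can be scripted. The sketch below uses a hypothetical latest_vmcore function and relies on the date-time directory naming described above, so that alphabetical order matches chronological order. It assumes that all dumps share the same address prefix.

```shell
#!/bin/sh
# Print the path of the newest vmcore under a crash directory, relying on
# the date-time directory names sorting chronologically. Fails if no
# vmcore file is found.
latest_vmcore() {
    crash_root=${1:-/var/crash}
    newest=
    # The shell expands the glob in sorted order, so the last match wins.
    for candidate in "$crash_root"/*/vmcore; do
        [ -f "$candidate" ] || continue
        newest=$candidate
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example use on a server:
#   latest_vmcore /var/crash        # default location
#   latest_vmcore /mnt/var/crash    # spare partition mounted on /mnt
```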
If you find that a vmcore file is not being created or that the system hangs without automatically rebooting, you may need to adjust your configuration. The most common problem is that there is insufficient memory allocated for the crash kernel to run and complete its operations. The first step in resolving issues with kdump is to increase the reserved memory that is specified in your GRUB2 configuration.