
Troubleshooting System Administration Issues in Oracle® Solaris 11.3


Updated: October 2017
 
 

About System Crashes

System crashes can occur due to hardware malfunctions, I/O problems, and software errors. If the system crashes, it displays an error message on the console and then either preserves a copy of its physical memory in RAM or writes a copy of its physical memory to the dump device. The system then reboots automatically. When the system reboots, the savecore command is executed to retrieve the data from memory or from the dump device and write the saved crash dump files to your savecore directory. These saved files provide valuable information for diagnosing the problem.


Note -  The term "crash dump" refers to the overall result of this process, including the set of crash dump files, where they are located, and how these files are organized and formatted.

System Crash Dump Files

The savecore command runs automatically after a system crash to retrieve the crash dump information from memory or from the dump device and write that information into a set of files. Afterwards, the savecore command can be invoked on the same system or on another system to expand the compressed crash dump files.


Note -  Crash dump files are sometimes confused with core files, which are images of user applications that are written when the application terminates abnormally.

Crash dump files are saved in the /var/crash/ directory, which is a predetermined default directory. The saving of crash dump files is enabled by default.

Restructured Files

Beginning with the Oracle Solaris 11.2 release, the contents of kernel crash dump files are divided into multiple new files based on their contents. This method enables more granularity in configuration so the files can more easily be accessed and studied.

The crash dump information is written to the set of vmdump-section.n files. The section value is the name of a file section that contains a specific kind of dump information. The n value is an integer which increments every time savecore is run to copy a crash dump and a new crash dump is found on the dump device. Possible files include:

  • vmdump-proc.n – Dump file with compressed process pages

  • vmdump-zfs.n – Dump file with compressed ZFS metadata

  • vmdump-other.n – Dump file with other pages

Kernel crash dumps were previously stored in vmdump.n, unix.n, and vmcore.n.

The vmdump.n and vmcore.n files store kernel page metadata and kernel page data in compressed and uncompressed form, respectively.
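The naming scheme described above can be illustrated with a short Python sketch that sorts a directory listing into crash dumps by their n value. It is only an illustration: the function names and the regular expressions are assumptions made here, not part of any Oracle Solaris tool, and the section pattern simply accepts any alphanumeric section name.

```python
import re
from collections import defaultdict

# Restructured names (vmdump-<section>.<n>) and the earlier names
# (vmdump.n, vmcore.n, unix.n). Both patterns are illustrative assumptions.
SECTIONED = re.compile(r"^vmdump-(?P<section>[A-Za-z0-9_]+)\.(?P<n>\d+)$")
LEGACY = re.compile(r"^(?P<kind>vmdump|vmcore|unix)\.(?P<n>\d+)$")


def classify(name):
    """Return (n, label) for a crash dump file name, or None if it does not match."""
    m = SECTIONED.match(name)
    if m:
        return int(m.group("n")), "section:" + m.group("section")
    m = LEGACY.match(name)
    if m:
        return int(m.group("n")), "legacy:" + m.group("kind")
    return None


def group_by_dump(names):
    """Group recognized file names by dump number n, ignoring everything else."""
    groups = defaultdict(list)
    for name in names:
        hit = classify(name)
        if hit is not None:
            n, label = hit
            groups[n].append(label)
    return {n: sorted(labels) for n, labels in groups.items()}
```

Passing the result of os.listdir() on the savecore directory to group_by_dump() would show which sections were saved for each crash dump.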

For further information, see the dumpadm(1M) and savecore(1M) man pages.

dumpadm and savecore Commands

The dumpadm and savecore utilities configure and manage the creation of a crash dump as follows:

  1. During system startup, the dumpadm command is invoked by the svc:/system/dumpadm:default service to configure crash dump parameters. It initializes the dump device and the dump content through the /dev/dump interface.

  2. After the dump configuration is complete, savecore is invoked. It checks for crash dumps either in RAM or on the dump device, and checks the content of the minfree file in the crash dump directory. System crash dump files generated by the savecore command are saved by default.

  3. Dump data is stored in a compressed format on the dump device. Kernel crash dump images can be as large as 128GB or more. Compressing the data means faster dumping and less disk space required for the dump device.

  4. By default, the installer will create a dedicated dump service. The system will then wait for the savecore command to complete before going on to the next step. On large memory systems, the system can become available before savecore completes.
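The minfree check in step 2 can be pictured with the following Python sketch. It assumes that the minfree file holds a single integer number of kilobytes that must remain free in the savecore file system after the dump files are written; see the savecore(1M) man page for the authoritative behavior. The function names are hypothetical.

```python
def read_minfree_kb(text):
    """Parse the content of a minfree file (assumed: one integer, in KB)."""
    text = text.strip()
    return int(text) if text else 0


def enough_space_to_save(free_kb, dump_kb, minfree_kb):
    """True if saving dump_kb would still leave at least minfree_kb free."""
    return free_kb - dump_kb >= minfree_kb
```

For example, with 1000 KB free and a minfree value of 300 KB, a 600 KB dump would be saved under this rule but an 800 KB dump would not.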

The savecore -L command enables an administrator to get a crash dump of a system that is currently running the Oracle Solaris OS. This command is intended for troubleshooting a running system by taking an image of memory while the system is in a bad state, such as during a transient performance problem or service outage. Note that this image of memory is imperfect because the system continues to change while the image is being taken. If the system is up and you can still run some commands, you can execute the savecore -L command to save the image of the system to the dump device and then immediately write out the crash dump files to your savecore directory. You can use the savecore -L command only if you have configured a dedicated dump device.
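Because savecore -L requires a dedicated dump device, a script might check the dump configuration before attempting a live dump. The Python sketch below parses the "label: value" lines that dumpadm prints when run without arguments. The sample text and the "(dedicated)" marker reflect typical output and are assumptions here; verify them against your own system. The helper names are hypothetical.

```python
import subprocess

# Illustrative sample of dumpadm output (assumed format, not captured from a real system).
SAMPLE = """\
      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
Savecore directory: /var/crash
  Savecore enabled: yes
"""


def parse_dumpadm(text):
    """Turn 'label: value' lines into a dictionary; other lines are skipped."""
    settings = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            settings[label.strip()] = value.strip()
    return settings


def has_dedicated_dump_device(settings):
    """Assumption: a dedicated dump device is reported with a '(dedicated)' suffix."""
    return "(dedicated)" in settings.get("Dump device", "")


def take_live_dump():
    """Run savecore -L only if the dump device appears to be dedicated."""
    out = subprocess.run(["dumpadm"], capture_output=True, text=True, check=True).stdout
    if not has_dedicated_dump_device(parse_dumpadm(out)):
        raise RuntimeError("savecore -L requires a dedicated dump device")
    subprocess.run(["savecore", "-L"], check=True)
```

Calling parse_dumpadm(SAMPLE) exercises the parser without touching the system; take_live_dump() needs appropriate privileges on an Oracle Solaris host.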