Avoiding Data Loss on a Test System

A driver bug can sometimes render a system incapable of booting. The precautions described in this section help you avoid reinstalling the system if that happens.

Using an Alternate Boot Environment

A number of driver-related system files are difficult, if not impossible, to reconstruct. Files such as /etc/name_to_major, /etc/driver_aliases, /etc/driver_classes, and /etc/minor_perm can be corrupted if the driver crashes the system during installation. See the add_drv(1M) man page.

To be safe, use the beadm(1M) command to make a backup copy of the root file system after the test system is in the proper configuration. If you plan to modify the /etc/system file, make a backup copy of the file before making modifications.
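The two backups described above might be scripted roughly as follows. This is a minimal sketch, not a definitive procedure: the boot environment name, the backup file name /etc/system.pre-driver-test, and the run helper are all hypothetical. The helper only prints each command unless DRY_RUN=0 is set, so the sequence can be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch only: the BE name and the backup file name are assumptions.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode; execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a backup boot environment, then save a copy of /etc/system.
backup_test_system() {
    be_name=$1
    run beadm create "$be_name"
    run cp -p /etc/system /etc/system.pre-driver-test
}
```

Calling backup_test_system pre-driver-test in the default dry-run mode prints the beadm and cp commands without running them.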

Booting With an Alternate Kernel

For detailed information, see Chapter 4, "Administering Boot Environments," in Creating and Administering Oracle Solaris 11.3 Boot Environments, and "Booting From a ZFS Root File System" in Managing ZFS File Systems in Oracle Solaris 11.3.

Consider Alternative Back-Up Plans

If the system is attached to a network, the test system can be added as a client of a server. If a problem occurs, the system can be booted from the network. The local disks can then be mounted, and any fixes can be made. Alternatively, the system can be booted directly from the Oracle Solaris system CD-ROM.

Another way to recover from disaster is to have another bootable root file system. Use format(1M) to make a partition that is the exact size of the original. Then use dd(1M) to copy the bootable root file system. After making a copy, run fsck(1M) on the new file system to ensure its integrity.

Subsequently, if the system cannot boot from the original root partition, boot from the backup partition. Then use dd(1M) to copy the backup partition onto the original partition.

You might have a situation where the system cannot boot even though the root file system is undamaged. For example, the damage might be limited to the boot block or the boot program. In such a case, you can boot from the backup partition with the ask (-a) option. You can then specify the original file system as the root file system.
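The copy, check, and restore steps described above could look something like the following sketch. The slice names used in the usage note (c0t0d0s0 for the original root and c0t1d0s0 for the backup) are hypothetical, as are the helper functions and the 1024k block size. Because dd overwrites its target, the run helper only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: slice names passed in are placeholders for the real devices.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode; execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy one raw slice onto another slice of the same size.
copy_slice() {
    run dd if="/dev/rdsk/$1" of="/dev/rdsk/$2" bs=1024k
}

# Back up the root slice ($1) to the backup slice ($2), then check the copy.
backup_root() {
    copy_slice "$1" "$2"
    run fsck "/dev/rdsk/$2"
}

# Restore the original slice ($1) from the backup slice ($2).
restore_root() {
    copy_slice "$2" "$1"
}
```

For example, backup_root c0t0d0s0 c0t1d0s0 prints the dd and fsck commands for the backup, and restore_root c0t0d0s0 c0t1d0s0 prints the dd command that copies the backup onto the original slice.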

Capture System Crash Dumps

When a system panics, the system writes an image of kernel memory to the dump device. The dump device is by default the most suitable swap device. The dump is a system crash dump, similar to core dumps generated by applications. On rebooting after a panic, savecore(1M) checks the dump device for a crash dump. If a dump is found, savecore makes a copy of the kernel's symbol table, which is called unix.n. The savecore utility then dumps a core file that is called vmcore.n in the core image directory. By default, the core image directory is /var/crash/machine_name. If /var/crash has insufficient space for a core dump, the system displays the needed space but does not actually save the dump. The mdb(1) debugger can then be used on the core dump and the saved kernel.

In the Oracle Solaris operating system, crash dumps are enabled by default. The dumpadm(1M) command is used to configure system crash dumps. Use the dumpadm command to verify that crash dumps are enabled and to determine the location of core files that have been saved.
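Once the core image directory is known, the saved dumps can be matched up for debugging. The following sketch assumes the unix.n and vmcore.n naming described above; the function name list_crash_dumps is hypothetical. For each complete pair it prints an mdb command line that opens the saved kernel together with its core file.

```shell
#!/bin/sh
# Sketch only: list_crash_dumps is a hypothetical helper.
# Prints one mdb command line per complete unix.N / vmcore.N pair.
list_crash_dumps() {
    dir=$1
    for u in "$dir"/unix.*; do
        # Skip the unexpanded pattern when no unix.* file exists.
        [ -f "$u" ] || continue
        n=${u##*.}
        # Report only dumps whose core file was actually saved.
        if [ -f "$dir/vmcore.$n" ]; then
            echo "mdb $dir/unix.$n $dir/vmcore.$n"
        fi
    done
}
```

Pass the directory that dumpadm reports as the savecore directory, for example list_crash_dumps /var/crash/machine_name. A unix.n file without a matching vmcore.n file is skipped.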

Note - You can prevent the savecore utility from filling the file system. Add a file that is named minfree to the directory in which the dumps are to be saved. In this file, specify the number of kilobytes to remain free after savecore has run. If insufficient space is available, the core file is not saved.
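Creating the minfree file might look like the following sketch. The function name set_minfree and the 102400 KB figure in the usage note are illustrative assumptions; the directory must be the one in which savecore saves the dumps.

```shell
#!/bin/sh
# Sketch only: set_minfree is a hypothetical helper.
# Writes the number of kilobytes that must remain free after savecore
# has run into the minfree file in the given dump directory.
set_minfree() {
    dir=$1
    kbytes=$2
    printf '%s\n' "$kbytes" > "$dir/minfree"
}
```

For example, set_minfree /var/crash/machine_name 102400 asks savecore to leave about 100 MB free in that directory.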