This appendix describes best practices and troubleshooting, including:
Repairing a service instance that is degraded, offline, or in maintenance
Diagnosing and repairing SMF repository problems
Specifying the amount of SMF startup messaging
Specifying the SMF milestone to which to boot
Investigating system boot problems
Converting inetd services to SMF services
Most services describe configuration policy. If the configuration you want is not implemented, modify the policy description by modifying the service. Modify the values of service properties or create new service instances with different property values. Do not disable service instances and edit configuration files that are intended to be managed by an SMF service. An increasing number of fundamental Oracle Solaris features are configured by SMF service properties, not by editing configuration files.
Do not modify manifests and system profiles that are delivered by Oracle or by third-party software vendors. These manifests and profiles might be replaced when you upgrade your system, and then your changes to these files will be lost. Instead, do one of the following:
Add a new service instance with different property values as described in Adding Service Instances.
Create a profile to customize the service. Use the svcbundle command or the svccfg extract command to create a profile file. Customize property values in that file, and include comments about the reason for each customization. Copy the profile file to /etc/svc/profile/site, and restart the manifest-import service.
To apply the same custom configuration to multiple systems, copy the same profile file to /etc/svc/profile/site on each system, and restart the manifest-import service on each system. To automate delivering the profile to each system, package the profile. See Configuring Multiple Systems.
Use the svccfg command or the inetadm command to manipulate the properties directly. If you use the svccfg command to modify property values, be sure to refresh the service instance as explained in Understanding Configuration Changes. For information about modifying, adding, and deleting service configuration, see Configuring Services. To see configuration that has already been modified, see Showing Configuration Customizations. To delete custom configuration, see Example 33, Deleting Customizations and Example 35, Unmasking Configuration.
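The profile approach in the list above can be sketched as two small shell helpers. This is an illustration, not a delivered tool: the function names `extract_profile` and `install_site_profile` and the `SITE_PROFILE_DIR` variable are hypothetical, and the sketch assumes that `svccfg extract` writes the profile to standard output.

```shell
#!/bin/sh
# Sketch only: helper names and SITE_PROFILE_DIR are hypothetical.
# The default directory is the standard site profile location named above.
SITE_PROFILE_DIR=${SITE_PROFILE_DIR:-/etc/svc/profile/site}

extract_profile() {
    # $1: service FMRI, $2: working file to create.
    # Assumes svccfg extract prints the profile on standard output.
    svccfg extract "$1" > "$2"
}

install_site_profile() {
    # $1: profile file that has already been edited and commented.
    cp "$1" "$SITE_PROFILE_DIR/" || return 1
    # Restart manifest-import so that the new profile is applied.
    svcadm restart svc:/system/manifest-import:default
}
```

A typical sequence would be to run `extract_profile network/dns/client /tmp/dns-client.xml`, edit the property values in the file and add a comment for each customization, and then run `install_site_profile /tmp/dns-client.xml` as a privileged user.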
When you create a site profile, make sure the configuration defined does not conflict with configuration defined in another site profile for the same service or service instance. Configuration conflicts are not permitted within any layer. If conflicting configuration is delivered by multiple files in any single layer, and is not set at a higher layer, the manifest-import service log reports the conflict and the service with the conflicting configuration is not started. See Conflicting Configuration for more information.
Do not use non-standard locations for manifest and profile files. See Service Bundles for manifest and profile standard locations.
When you create a service for your own use, use site at the beginning of the service name: svc:/site/service_name:instance_name.
Do not modify the configuration of the master restarter service, svc:/system/svc/restarter:default, except to configure logging levels as described in Specifying the Amount of Startup Messaging.
Before you use the svccfg delcust command, use the svccfg listcust command with the same options. The delcust subcommand can potentially remove all administrative customizations on a service. Use the listcust subcommand to verify which customizations will be deleted by the delcust subcommand.
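The following sketch shows one way to enforce the listcust-before-delcust habit in a script. The function name `delete_customizations` and the `confirm` argument are hypothetical conventions introduced for this illustration. No extra options are passed to either subcommand, so both operate on the same selection.

```shell
#!/bin/sh
# Sketch only: delete_customizations and its "confirm" argument are hypothetical.

delete_customizations() {
    fmri=$1
    echo "Customizations that delcust would remove from $fmri:"
    # listcust previews what delcust would delete for the selected service.
    svccfg -s "$fmri" listcust || return 1

    # Require an explicit second argument so nothing is deleted by accident.
    if [ "$2" = "confirm" ]; then
        svccfg -s "$fmri" delcust
    else
        echo "Dry run only. Pass 'confirm' as the second argument to delete."
    fi
}
```

Running the function with only an FMRI prints the customizations and changes nothing. Running it again with `confirm` performs the deletion.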
In scripts, use the full service instance FMRI: svc:/service_name:instance_name.
This section discusses the following topics:
Committing configuration changes into the running snapshot
Fixing services that are reported to have problems
Manually transitioning an instance to the degraded or maintenance state
Fixing a corrupt service configuration repository
Configuring the amount of messaging to display or store on system startup
Transitioning or booting to a specified milestone
Using SMF to investigate booting problems
Converting inetd services to SMF services
In the service configuration repository, SMF stores property changes separately from properties in the running snapshot. When you change service configuration, those changes do not immediately appear in the running snapshot.
The refresh operation updates the running snapshot of the specified service instance with the values from the editing configuration.
By default, the svcprop command shows properties in the running snapshot, and the svccfg command shows properties in the editing configuration. If you have changed property values but not performed a configuration refresh, the svcprop and svccfg commands show different property values. After you perform a configuration refresh, the svcprop and svccfg commands show the same property values.
Rebooting does not change the running snapshot. The svcadm restart command does not refresh configuration. Use the svcadm refresh or svccfg refresh command to commit configuration changes into the running snapshot.
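The difference between the two views can be checked with a short script. The sketch below compares the value that `svcprop` reports with the value that `svccfg listprop` reports for one property. The function name `has_pending_change` is hypothetical, and the comparison assumes a simple single-word value, because the two commands can quote multi-word values differently.

```shell
#!/bin/sh
# Sketch only: has_pending_change is a hypothetical helper name.
# Assumes a single-word property value.

has_pending_change() {
    fmri=$1    # full service instance FMRI
    prop=$2    # property as group/name, for example options/logging

    # svcprop reads the running snapshot by default.
    running=$(svcprop -p "$prop" "$fmri" 2>/dev/null)

    # svccfg listprop prints "name type value"; keep what follows the type.
    editing=$(svccfg -s "$fmri" listprop "$prop" 2>/dev/null |
        awk '{ $1 = ""; $2 = ""; sub(/^ +/, ""); print }')

    if [ "$running" = "$editing" ]; then
        echo "in sync: $prop = $running"
    else
        echo "pending: running='$running' editing='$editing' (refresh needed)"
        return 1
    fi
}
```

A nonzero return status indicates that the instance has not been refreshed since the property was changed.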
Use the svcs -x command with no arguments to display explanatory information about any service instances that match any of the following descriptions:
The service is enabled but is not running.
The service is enabled but is not running at normal capacity.
The service is preventing another enabled service from running.
The service is disabled but is not able to complete the transition to the disabled state.
The following list summarizes how to approach service problems:
Diagnose the problem, starting with viewing the service log file.
Log files are located in /var/svc/log and /system/volatile. The service log file shows time stamps and method exit reasons.
The location of the log file for a particular service is given by the following command:
$ svcs -L service-name
The following command displays the end of the log file for a particular service:
$ svcs -Lx service-name
Fix the problem.
If multiple service failures are identified, start by looking at the first failure to occur, using the time stamps in the service log files.
Use the following command to show impacted dependencies of the failed service:
$ svcs -l service-name
Use the following command to show services on which service-name depends:
$ svcs -d service-name
If fixing the problem involves modifying service configuration, refresh the service.
Move affected services to a running state.
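The diagnosis step above can be sketched as a helper that locates a service log with `svcs -L` and prints its most recent lines. The function name `show_recent_log` and the default of 20 lines are assumptions made for this illustration. The sketch expects an FMRI that matches exactly one instance.

```shell
#!/bin/sh
# Sketch only: show_recent_log is a hypothetical helper name.

show_recent_log() {
    fmri=$1
    lines=${2:-20}    # number of trailing lines to show (assumed default)

    # svcs -L prints the path of the log file for the service.
    logfile=$(svcs -L "$fmri") || return 1
    [ -r "$logfile" ] || { echo "cannot read log: $logfile" >&2; return 1; }

    echo "== last $lines lines of $logfile"
    tail -n "$lines" "$logfile"
}
```

When several services have failed, running the helper for each one makes it easier to compare time stamps and find the first failure.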
The instance might be transitioning through the maintenance state because an administrative action has not yet completed. If the instance is transitioning, its state should be shown as maintenance* with an asterisk at the end.
An instance that is configured to restart after failure might be placed in maintenance because it was restarting too frequently. In this case you need to determine the cause of the consistent failure.
If an instance is in maintenance because it has conflicts, or conflicting property values, see Conflicting Configuration.
The instance might be in the maintenance state because the instance was disabled but is unable to reach the disabled state because the stop method failed.
In the following example, the "State" and "Reason" lines show that the pkg/depot service is in the maintenance state because its start method failed.
$ svcs -x
svc:/application/pkg/depot:default (IPS Depot)
 State: maintenance since September 11, 2013 01:30:42 PM PDT
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
   See: http://support.oracle.com/msg/SMF-8000-KS
   See: pkg.depot-config(1M)
   See: /var/svc/log/application-pkg-depot:default.log
Impact: This service is not running.
Log in to the Oracle support site to view the referenced Predictive Self-Healing knowledge article. In this case, the article tells you to view the log file to determine why the start method failed. The svcs output gives the name of the log file. See Viewing Service Log Files for information about how to view the log file. In this example, the log file shows the start method invocation and the fatal error message.
[ Sep 11 13:30:42 Executing start method ("/lib/svc/method/svc-pkg-depot start"). ]
pkg.depot-config: Unable to get publisher information: The path '/export/ipsrepos/Solaris11' does not contain a valid package repository.
One or more of the following steps might be needed.
If fixing the reported problem required modifying service configuration, use the svccfg refresh or svcadm refresh command for any services whose configuration changed. Verify that the configuration is updated in the running snapshot by using the svcprop command to check property values or by other tests specific to this service.
Sometimes the "Impact" line in the svcs -x output tells you that services that depend on the service that is in the maintenance state are not running. Use the svcs -l command to check the current state of dependent services. Ensure that all required dependencies are running. Use the svcs -x command to verify that all enabled services are running.
If the service that is in the maintenance state is a contract service, determine whether any processes that were started by the service have not stopped. When a contract service instance is in a maintenance state, the contract ID should be blank, as shown in the following example, and all processes associated with that contract should have stopped. Use svcs -l or svcs -o ctid to check that no contract exists for a service instance in maintenance. Use svcs -p to check whether any processes associated with this service instance are still running. Any processes shown by svcs -p for a service instance in maintenance should be killed.
$ svcs -l system-repository
fmri         svc:/application/pkg/system-repository:default
name         IPS System Repository
enabled      true
state        maintenance
next_state   none
state_time   September 17, 2013 07:18:19 AM PDT
logfile      /var/svc/log/application-pkg-system-repository:default.log
restarter    svc:/system/svc/restarter:default
contract_id
manifest     /lib/svc/manifest/application/pkg/pkg-system-repository.xml
dependency   require_all/error svc:/milestone/network:default (online)
dependency   require_all/none svc:/system/filesystem/local:default (online)
dependency   optional_all/error svc:/system/filesystem/autofs:default (online)
When the reported problem is fixed, use the svcadm clear command to tell the restarter for that service that the instance is repaired. SMF will attempt to transition the instance to its configured state. If the instance is enabled, SMF will attempt to bring the instance online. If the instance is disabled, SMF will transition the instance to the disabled state.
$ svcadm clear pkg/depot:default
If you specify the -s option, the svcadm command waits to return until the instance reaches the online state or until it determines that the instance cannot reach the online state without administrator intervention. Use the -T option with the -s option to specify an upper bound in seconds to make the transition or determine that the transition cannot be made.
Use the svcs command to verify that the service that was in maintenance is now online. Use the svcs -x command to verify that all enabled services are running.
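The clear-and-verify sequence can be sketched as follows. The function name `clear_and_verify` is hypothetical. The sketch checks the state once, immediately after `svcadm clear`, so an instance that is still in transition is reported as not yet online and might need to be checked again. The helper also accepts an instance in the degraded state, which is cleared with the same subcommand.

```shell
#!/bin/sh
# Sketch only: clear_and_verify is a hypothetical helper name.

clear_and_verify() {
    fmri=$1
    state=$(svcs -H -o state "$fmri") || return 1

    case $state in
        maintenance|degraded) ;;
        *) echo "$fmri is in state '$state'; nothing to clear"; return 0 ;;
    esac

    # Show any processes still associated with the instance. For an
    # instance in maintenance, these should be stopped before clearing.
    svcs -p "$fmri"

    # Tell the restarter that the instance is repaired.
    svcadm clear "$fmri" || return 1

    # Single check right after the clear; a transition might still be running.
    state=$(svcs -H -o state "$fmri")
    echo "$fmri is now $state"
    [ "$state" = "online" ]
}
```

For a disabled instance the expected final state is disabled rather than online, so the return status of this sketch is meaningful only for enabled instances.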
The instance might be transitioning through the offline state because its dependencies are not yet satisfied. If the instance is transitioning, its state should be shown as offline*.
If required dependencies are disabled, enable them with the following command:
$ svcadm enable -r FMRI
A dependency file might be missing or unreadable. You might want to use pkg fix or pkg revert to fix this type of problem. See the pkg(1) man page.
If the instance was offline because a required dependency was not satisfied, fixing or enabling the dependency might cause the offline instance to restart and come online with no further administrative action needed.
If you made some other fix to the service, then restart the instance.
$ svcadm restart FMRI
Use the svcs command to verify that the instance that was offline is now online. Use the svcs -x command to verify that all enabled services are running.
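To find out which dependencies are keeping an instance offline, the output of `svcs -d` can be filtered for services that are not online. The function name `unsatisfied_dependencies` is hypothetical. The sketch does not distinguish required dependencies from optional ones, so a disabled optional dependency also appears in the list.

```shell
#!/bin/sh
# Sketch only: unsatisfied_dependencies is a hypothetical helper name.

unsatisfied_dependencies() {
    # svcs -d lists the services on which the given instance depends.
    # -H suppresses the header; -o selects the state and FMRI columns.
    svcs -H -o state,fmri -d "$1" |
        awk '$1 != "online" { print $2 " (" $1 ")" }'
}
```

Each FMRI that the helper prints is a candidate for `svcadm enable -r`, or for further investigation with `svcs -x`.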
When the reported problem is fixed, use the svcadm clear command to return the instance to the online state. For instances in the degraded state, the clear subcommand requests that the restarter for that instance transition the instance to the online state.
$ svcadm clear pkg/depot:default
Use the svcs command to verify that the instance that was degraded is now online. Use the svcs -x command to verify that all enabled services are running.
You can mark a service instance as being in either the degraded state or the maintenance state. You might want to do this if the application is stuck in a loop or is deadlocked, for example. The information about the state change propagates to the dependencies of the marked instance, which can help debug other related instances.
Specify the -I option to request an immediate state change.
When you mark an instance as maintenance, you can specify the -t option to request a temporary state change. Temporary requests last only until reboot.
If you specify the -s option with the svcadm mark command, svcadm marks the instance and waits for the instance to enter the degraded or maintenance state before returning. Use the -T option with the -s option to specify an upper bound in seconds to make the transition or determine that the transition cannot be made.
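The options described above can be combined as in the following sketch. The function name `mark_instance` is hypothetical. The sketch requests an immediate change in both cases and adds the -t option only for the maintenance state, because the temporary request is described only for that state.

```shell
#!/bin/sh
# Sketch only: mark_instance is a hypothetical helper name.

mark_instance() {
    fmri=$1
    state=${2:-maintenance}    # defaults to maintenance

    case $state in
        # -I: immediate state change; -t: temporary, lasts until reboot.
        maintenance) svcadm mark -I -t maintenance "$fmri" ;;
        degraded)    svcadm mark -I degraded "$fmri" ;;
        *) echo "state must be maintenance or degraded" >&2; return 2 ;;
    esac
}
```

After debugging is complete, use the svcadm clear command to ask the restarter to return the instance to its configured state.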
On system startup, the repository daemon, svc.configd, performs an integrity check of the configuration repository stored in /etc/svc/repository.db. If the svc.configd integrity check fails, the svc.configd daemon writes a message to the console similar to the following:
svc.configd: smf(5) database integrity check of:
    /etc/svc/repository.db
failed. The database might be damaged or a media error might have
prevented it from being verified. Additional information useful to
your service provider is in:
    /system/volatile/db_errors
The system will not be able to boot until you have restored a working
database. svc.startd(1M) will provide a sulogin(1M) prompt for recovery
purposes. The command:
    /lib/svc/bin/restore_repository
can be run to restore a backup version of your repository. See
http://support.oracle.com/msg/SMF-8000-MY for more information.
At the sulogin prompt, enter Ctrl-D to exit sulogin. The svc.startd daemon recognizes the sulogin exit and restarts the svc.configd daemon, which checks the repository again. The problem might not reappear after this additional restart.
Caution - Do not directly invoke the svc.configd daemon. The svc.startd daemon starts the svc.configd daemon.
If svc.configd again reports a failed integrity check and you are again at the sulogin prompt, ensure that required file systems are not full. Using the root password, log in either remotely or at the sulogin prompt. Check that space is available on both the root and system/volatile file systems. If either of these file systems is full, clean up and start the system again. If neither of these file systems is full, follow the procedure How to Restore a Repository From Backup.
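The space check can be sketched with `df -k`. The function name `report_full_filesystems` is hypothetical, and the sketch assumes that `df -k` prints one line per file system with the capacity percentage in the fifth column and the mount point in the sixth.

```shell
#!/bin/sh
# Sketch only: report_full_filesystems is a hypothetical helper name.
# Assumes df -k prints one line per file system, with capacity in
# column 5 and the mount point in column 6.

report_full_filesystems() {
    # Default to the two file systems named above.
    [ $# -gt 0 ] || set -- / /system/volatile

    df -k "$@" | awk 'NR > 1 {
        cap = $5
        sub(/%/, "", cap)
        if (cap + 0 >= 100) { print $6 " is full"; full = 1 }
    }
    END { exit full ? 1 : 0 }'
}
```

A nonzero return status means that at least one of the file systems needs to be cleaned up before the system is started again.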
The service configuration repository can become corrupted for any of the following reasons:
Accidental overwrite of the file
The following procedure shows how to replace a corrupt repository with a backup copy of the repository.
Before You Begin
Caution - Only restore a corrupt repository. Do not use this repository restore procedure to delete unwanted configuration changes. To undo configuration changes, see Showing Configuration Customizations, Example 33, Deleting Customizations, and Example 35, Unmasking Configuration.
Using the root password, log in either remotely or at the sulogin prompt.
Running the /lib/svc/bin/restore_repository command takes you through the necessary steps to restore a non-corrupt backup. SMF automatically takes backups of the repository as described in Repository Backups.
SMF maintains persistent and non-persistent configuration data. See Service Configuration Repository for descriptions of these two repositories. The restore_repository command only restores the persistent repository. The restore_repository command also reboots the system, which destroys the non-persistent configuration data. The non-persistent data is runtime data that is not needed across system reboot.
When started, the /lib/svc/bin/restore_repository command displays a message similar to the following:
See http://support.oracle.com/msg/SMF-8000-MY for more information on the use of
this script to restore backup copies of the smf(5) repository.
If there are any problems which need human intervention, this script will
give instructions and then exit back to your shell.
After the root ( /) file system is mounted with write permissions, or if the system is a local zone, you are prompted to select the repository backup to restore:
The following backups of /etc/svc/repository.db exists, from
oldest to newest:
... list of backups ...
Backups are given names, based on type and the time the backup was taken. Backups beginning with boot are completed before the first change is made to the repository after system boot. Backups beginning with manifest_import are completed after svc:/system/manifest-import:default finishes its process. The time of the backup is given in YYYYMMDD_HHMMSS format.
Typically, the most recent backup option is selected.
Please enter either a specific backup repository from the above list to
restore it, or one of the following choices:
        CHOICE            ACTION
        ----------------  ----------------------------------------------
        boot              restore the most recent post-boot backup
        manifest_import   restore the most recent manifest_import backup
        -seed-            restore the initial starting repository (All
                          customizations will be lost, including those
                          made by the install/upgrade process.)
        -quit-            cancel script and quit
Enter response [boot]:
If you press Enter without specifying a backup to restore, the default response, enclosed in square brackets ([]), is selected. Selecting -quit- exits the restore_repository script, returning you to your shell prompt.
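When reviewing the list of backups, it can help to turn a backup name into a readable type and time. The sketch below assumes that a name has the form type-YYYYMMDD_HHMMSS, for example boot-20130917_071819. That exact layout is an assumption based on the naming description above, and the function name `describe_backup` is hypothetical.

```shell
#!/bin/sh
# Sketch only: describe_backup is a hypothetical helper name.
# Assumes backup names of the form type-YYYYMMDD_HHMMSS.

describe_backup() {
    name=$1
    type=${name%-*}      # text before the last hyphen, for example boot
    stamp=${name##*-}    # text after the last hyphen, YYYYMMDD_HHMMSS

    # Reformat the time stamp; print nothing if it does not match.
    when=$(echo "$stamp" | sed -n \
        's/^\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)_\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)$/\1-\2-\3 \4:\5:\6/p')

    [ -n "$when" ] || { echo "unrecognized backup name: $name" >&2; return 1; }
    echo "$type backup taken $when"
}
```

The helper only formats names for display. The restore_repository script still expects the backup name exactly as it appears in the list.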
Caution - Selecting -seed- restores the seed repository. This repository is designed for use during initial installation and upgrades. Only use the seed repository for recovery purposes when no other service configuration change or backup service repository will work. All configuration changes will be lost, including changes to fundamental Oracle Solaris features that were delivered by installing or updating packages. Using the seed repository for recovery purposes should be a last resort.
After you have selected the backup that you want to restore, that backup is validated and its integrity is checked. If any problems are discovered, the restore_repository command prints error messages and prompts you for another selection. Once you have selected a valid backup, the following information is printed, and you are prompted for final confirmation.
After confirmation, the following steps will be taken:
svc.startd(1M) and svc.configd(1M) will be quiesced, if running.
/etc/svc/repository.db
    -- renamed --> /etc/svc/repository.db_old_YYYYMMDD_HHMMSS
/system/volatile/db_errors
    -- copied --> /etc/svc/repository.db_old_YYYYMMDD_HHMMSS_errors
repository_to_restore
    -- copied --> /etc/svc/repository.db
and the system will be rebooted with reboot(1M).
Proceed [yes/no]?
The system reboots after the restore_repository command executes all of the listed actions.
By default, each service that starts during system boot does not display a message on the console. Use one of the following methods to change which messages appear on the console and which are recorded only in the svc.startd log file. The value of logging-level can be one of the values shown in the table below.
When booting a SPARC system, specify the -m option to the boot command at the ok prompt. See "Messages options" in the kernel(1M) man page.
ok boot -m logging-level
When booting an x86 system, edit the GRUB menu to specify the -m option. See Adding Kernel Arguments by Editing the GRUB Menu at Boot Time in Booting and Shutting Down Oracle Solaris 11.3 Systems and "Messages options" in the kernel(1M) man page.
Prior to rebooting a system, use the svccfg command to change the value of the options/logging property. If this property has never been changed on this system, then it will not exist and you will have to add it. The following example changes to verbose messaging. The change takes effect on the next restart of the svc.startd daemon.
$ svccfg -s system/svc/restarter:default listprop options/logging
$ svccfg -s system/svc/restarter:default addpg options application
$ svccfg -s system/svc/restarter:default setprop options/logging=verbose
$ svccfg -s system/svc/restarter:default listprop options/logging
options/logging astring     verbose
When you boot a system, you can specify the SMF milestone to which to boot.
By default, all services for which the value of the general/enabled property is true are started at system boot. To change the milestone to which to boot a system, use one of the following methods. The value of milestone can be the FMRI of a milestone service or a keyword as shown in Figure 3, Table 3, SMF Boot Milestones and Corresponding Run Levels.
When booting a SPARC system, specify the -m option to the boot command at the ok prompt. See the -m option in the kernel(1M) man page.
ok boot -m milestone=milestone
When booting an x86 system, edit the GRUB menu to specify the -m option. See Adding Kernel Arguments by Editing the GRUB Menu at Boot Time in Booting and Shutting Down Oracle Solaris 11.3 Systems and the -m option in the kernel(1M) man page.
Prior to rebooting a system, use the svcadm milestone command with the -d option. Note that with or without the -d option, this command restricts and restores running services immediately. With the -d option, the command also makes the specified milestone the default boot milestone. This new default is persistent across reboots.
$ svcadm milestone -d milestone
This command does not change the current run level of the system. To change the current run level of the system, use the init command.
If you specify the -s option, svcadm changes the milestone and then waits for the transition to the specified milestone to complete before returning. The svcadm command returns when all instances have transitioned to the state necessary to reach the specified milestone or when it determines that administrator intervention is required to make a transition. Use the -T option with the -s option to specify an upper bound in seconds to complete the milestone change operation or return.
The following table describes SMF boot milestones, including any corresponding Oracle Solaris run level. A system’s run level defines what services and resources are available to users. A system can be in only one run level at a time. For information about run levels, see How Run Levels Work in Booting and Shutting Down Oracle Solaris 11.3 Systems, the inittab(4) man page, and the /etc/init.d/README file. For more information about SMF boot milestones, see the milestone subcommand in the svcadm(1M) man page.
$ svcs 'milestone/*'
STATE          STIME    FMRI
online          9:08:05 svc:/milestone/unconfig:default
online          9:08:06 svc:/milestone/config:default
online          9:08:07 svc:/milestone/devices:default
online          9:08:25 svc:/milestone/network:default
online          9:08:31 svc:/milestone/single-user:default
online          9:08:51 svc:/milestone/name-services:default
online          9:09:13 svc:/milestone/self-assembly-complete:default
online          9:09:23 svc:/milestone/multi-user:default
online          9:09:24 svc:/milestone/multi-user-server:default
This section describes actions to take if your system hangs during boot or if a key service fails to start during boot.
If problems occur when starting services at system boot, sometimes the system will hang during boot. This procedure shows how to investigate services problems that occur at boot time.
The following command instructs the svc.startd daemon to temporarily disable all services and start sulogin on the console.
ok boot -m milestone=none
See Specifying the SMF Milestone to Which to Boot for a list of SMF milestones that you can use with the boot -m command.
# svcadm milestone all
When the boot process hangs, determine which services are not running by running svcs -a. Look for error messages in the log files in /var/svc/log.
# svcs -x
This command verifies that the login process on the console will run.
# svcs -l system/console-login:default
Local file systems that are not required to boot the system are mounted by the svc:/system/filesystem/local:default service. When any of those file systems cannot be mounted, the filesystem/local service enters a maintenance state. System startup continues, and any services that do not depend on filesystem/local are started. Services that have a required dependency on the filesystem/local service are not started.
This procedure explains how to change the configuration of the system so that a sulogin prompt appears immediately after the service fails instead of allowing system startup to continue.
$ svccfg -s svc:/system/console-login
svc:/system/console-login> addpg site,filesystem-local dependency
svc:/system/console-login> setprop site,filesystem-local/entities = fmri: svc:/system/filesystem/local
svc:/system/console-login> setprop site,filesystem-local/grouping = astring: require_all
svc:/system/console-login> setprop site,filesystem-local/restart_on = astring: none
svc:/system/console-login> setprop site,filesystem-local/type = astring: service
svc:/system/console-login> end
$ svcadm refresh console-login
When a failure occurs with the system/filesystem/local:default service, use the svcs -vx command to identify the failure. After the failure has been fixed, use the following command to clear the error state and allow the system boot to continue:
$ svcadm clear filesystem/local