CHAPTER 1
Introduction to System Management Services
This manual describes the System Management Services (SMS) 1.4 software that is available with the Sun Fire high-end server system.
The system controller (SC) in Sun Fire high-end systems is a multifunction, CP1500- or CP2140-based printed circuit board (PCB) that provides critical services and resources required for the operation and control of the Sun Fire system.
A Sun Fire high-end system is often referred to as the platform. System boards within the platform can be logically grouped together into separately bootable systems called dynamic system domains, or simply domains.
Up to 18 domains on the Sun Fire 15K/E25K, and up to 9 domains on the Sun Fire 12K/E20K, can exist simultaneously on a single platform. (Domains are introduced in this chapter and are described in more detail in SMS Configuration.) The System Management Services (SMS) software lets you control and monitor domains, as well as the platform itself.
The following list is an overview of the many services the SC provides for the Sun Fire system:
Manages the overall system configuration.
Acts as a boot initiator for its domains.
Serves as the syslog host for its domains; note that an SC can still be a syslog client of a LAN-wide syslog host.
Provides a synchronized hardware clock source.
Sets up and configures dynamic domains.
Monitors system environmental information, such as power supply, fan, and temperature status.
Hosts field-replaceable unit (FRU) logging data.
Provides redundancy and automated SC failover in dual SC configurations.
Provides a default name service for the domains based on virtual hostids, and MAC addresses for the domains.
Provides administrative roles for platform management.
There are two SCs within a Sun Fire platform. The SC that controls the platform is referred to as the main SC, while the other SC acts as a backup and is called the spare SC. The software running on the SC monitors the SCs to determine when an automatic failover should be performed.
We strongly recommend that the two SCs have the same configuration. This duplication includes the Solaris operating environment, SMS software, security modifications, patch installations, and all other system configurations.
The failover functionality between the SCs is controlled by the daemons running on the main and spare SCs. These daemons communicate across private communication paths built into the Sun Fire platform. Other than the communication of these daemons, there is no special trust relationship between the two SCs.
SMS software packages are installed on the SC. In addition, SMS communicates with the Sun Fire high-end system over an Ethernet connection; see Management Network Services.
SMS 1.4.1 cannot communicate with SMS 1.3 across the I2 network. If one of the SCs is running SMS 1.3 and the other is running SMS 1.4.1, the I2 network tests will fail, and the SCs will communicate through HASRAM. For information about the I2 network, see I2 Network.
SMS 1.4.1 supports Sun Fire high-end servers running the Solaris 8 update 7 or Solaris 9 4/04 operating environments.
SMS 1.4.1 is compatible with Sun Fire high-end system domains that are running the Solaris 8 2/02 and Solaris 9 4/04 operating environments. The commands provided with the SMS software can be used remotely.
Note - Graphical user interfaces for many of the commands in SMS are provided by Sun Management Center. For more information, see Sun Management Center.
SMS enables the platform administrator to perform the following tasks:
Administer domains by logically grouping domain configurable units (DCUs) together. DCUs are system boards such as CPU and I/O boards. Domains are able to run their own operating systems and handle their own workloads. See SMS Configuration.
Dynamically reconfigure a domain so that currently installed system boards can be logically attached to or detached from the operating system while the domain continues running in multiuser mode. This feature is known as dynamic reconfiguration and is described in the System Management Services (SMS) 1.4 Dynamic Reconfiguration User Guide. (A system board can be physically swapped in and out when it is not attached to a domain, while the system continues running in multiuser mode.)
Perform automatic dynamic reconfiguration of domains using a script. Refer to the System Management Services (SMS) 1.4 Dynamic Reconfiguration User Guide.
Monitor and display the temperatures, currents, and voltage levels of one or more system boards or domains.
Monitor and control power to the components within a platform.
Execute diagnostic programs such as power-on self-test (POST).
In addition, SMS does the following:
Warns you of impending problems, such as high temperatures or malfunctioning power supplies.
Notifies you when a software error or failure has occurred.
Monitors a dual SC configuration for single points of failure and performs an automatic failover from the main SC to the spare depending on the failure condition detected.
Automatically reboots a domain after a system software failure (such as a panic).
Keeps logs of interactions between the SC environment and the domains.
Provides support for the Sun Fire high-end system dual grid power option.
SMS enables the domain administrator to perform the following tasks:
Administer domains by logically grouping domain configurable units (DCUs) together. DCUs are system boards such as CPU and I/O boards. Domains are able to run their own operating systems and handle their own workloads. See SMS Configuration.
Boot domains for which the administrator has privileges.
Dynamically reconfigure a domain for which the administrator has privileges, so that currently installed system boards can be logically attached to or detached from the operating system while the domain continues running in multiuser mode. This feature is known as dynamic reconfiguration and is described in the System Management Services (SMS) 1.4 Dynamic Reconfiguration User Guide. (A system board can be physically swapped in and out when it is not attached to a domain, while the system continues running in multiuser mode.)
Use a script to perform automatic dynamic reconfiguration of domains for which the administrator has privileges. Refer to the System Management Services (SMS) 1.4 Dynamic Reconfiguration User Guide.
Monitor and display the temperatures, currents, and voltage levels of one or more system boards or domains for which the administrator has privileges.
Execute diagnostic programs such as power-on self-test (POST) on domains for which the administrator has privileges.
The following features are provided in this release of SMS:
Dynamic system domain (DSD) configuration
Configured domain services
Domain control capabilities
Automatic diagnosis and domain recovery
Capacity on demand (COD)
Domain status reporting
Hardware control capabilities
Hardware status monitoring, reporting, and handling
Hardware error monitoring, reporting, and handling
System controller (SC) failover
Configurable administrative privileges
Dynamic FRUID
SMS architecture is best described as distributed client-server. init(1M) starts (and restarts as necessary) one process: ssd(1M). ssd is responsible for monitoring all other SMS processes and restarting them as necessary. See FIGURE 3-1.
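The pattern described above, in which one process watches others and restarts them when they exit, can be illustrated with a minimal Python sketch. This is not how ssd is implemented. The function name supervise, the start callable, and the max_restarts cap are all hypothetical and exist only for illustration.

```python
def supervise(start, max_restarts=3):
    """Run a task and restart it whenever it exits with a non-zero status.

    start        -- hypothetical callable that runs the task to completion
                    and returns its exit status (0 means a clean exit)
    max_restarts -- assumed cap on restarts so the sketch always terminates

    Returns (number_of_starts, last_exit_status).
    """
    starts = 0
    while True:
        starts += 1
        status = start()
        # Stop on a clean exit, or once the restart budget is used up.
        if status == 0 or starts > max_restarts:
            return starts, status
```

To watch a real process, start could be a wrapper such as lambda: subprocess.run(cmd).returncode, where cmd is whatever command you want to supervise.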
The Sun Fire high-end systems platform, the SC, and other workstations communicate over Ethernet. You perform SMS operations by entering commands on the SC console after remotely logging in to the SC from another workstation on the local area network. You must log in as a user with the appropriate platform or domain privileges if you want to perform SMS operations (such as monitoring and controlling the platform).
Dual system controllers are supported within the Sun Fire high-end systems platform. One SC is designated as the primary or main system controller, and the other is designated as the spare system controller. If the main SC fails, the failover capability automatically switches to the spare SC as described in SC Failover.
Most domain configurable units are active components, and you need to check the system state before powering off any DCU.
Note - Circuit breakers must be on whenever a board is present, including expander boards, whether or not the board is powered on.
For details, see Power Control.
Administration tasks on the Sun Fire high-end system are secured by group privilege requirements. Upon installation, SMS adds the following 39 UNIX groups to the /etc/group file:
platadmn - Platform administrator
platoper - Platform operator
platsvc - Platform service
dmn[A...R]admn - domain [domain_id|domain_tag] administrator (18)
dmn[A...R]rcfg - domain [domain_id|domain_tag] configurator (18)
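As a quick check on the count, the default group names listed above can be enumerated with a short Python sketch. It assumes the domain letter appears in lowercase inside the group name, as in the default name dmnaadmn. The function name default_sms_groups is hypothetical.

```python
import string

# Platform-level groups named in the list above.
PLATFORM_GROUPS = ["platadmn", "platoper", "platsvc"]

# Domains A through R, written in lowercase (assumption based on "dmnaadmn").
DOMAIN_LETTERS = string.ascii_lowercase[:18]


def default_sms_groups():
    """Return the default SMS group names: 3 platform + 18 admn + 18 rcfg."""
    groups = list(PLATFORM_GROUPS)
    groups += [f"dmn{letter}admn" for letter in DOMAIN_LETTERS]
    groups += [f"dmn{letter}rcfg" for letter in DOMAIN_LETTERS]
    return groups
```

Calling len(default_sms_groups()) gives 39, matching the number of groups stated above.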
smsconfig(1M) allows an administrator to add, remove, and list members of platform and domain groups as well as set platform and domain directory privileges using the -a, -r, and -l options.
smsconfig can also configure SMS to use alternate group names, including NIS-managed groups, using the -g option. Group information entries can come from any of the sources for groups specified in the /etc/nsswitch.conf file (refer to nsswitch.conf(4)). For instance, if domain A was known by its domain tag as the "Production Domain," an administrator could create an NIS group with the same name and configure SMS to use this group as the domain A administrator group instead of the default, dmnaadmn. For more information, refer to the System Management Services (SMS) 1.4.1 Installation Guide, see Administration Privileges, and refer to the smsconfig man page.
The Sun Fire high-end system has an embedded system controller and supports an administrative model with multiple administrative privileges, and hence multiple administrators. As a result, an administrator must use a remote network connection from a workstation to access the SMS command interfaces and manage the Sun Fire high-end system.
Because administrators provide information to verify their identity (passwords) and might need to display sensitive data, it is important that the remote network connection be secure. Physical separation of the administrative networks provides some security on the Sun Fire high-end system. Multiple external physical network connections are available on each SC. SMS software supports up to two external network communities.
For more information on Sun Fire high-end system networks, see Management Network Services. For more information on securing the Sun Fire high-end system, see Security Options.
You can interact with the SC and the domains on the Sun Fire high-end system by using SMS commands.
SMS provides a command-line interface to the various functions and features it contains.
For the examples in this guide, the sc_name is sc0 and sms-user is the user name of the administrator, operator, configurator, or service personnel logged in to the system.
The privileges allotted to the user are determined by the platform or domain groups to which the user belongs. In these examples, the sms-user is assumed to have both platform and domain administrator privileges, unless otherwise noted.
For more information on the function and creation of SMS user groups, refer to the System Management Services (SMS) 1.4.1 Installation Guide and see Administration Privileges.
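The link between group membership and privileges can be sketched in Python by mapping each default group name to a scope, a domain, and a role. This is an illustration only: it covers the default group names, not alternate names configured through smsconfig, and the names describe_group and privileges are hypothetical.

```python
import re

# Roles taken from the default group descriptions.
PLATFORM_ROLES = {
    "platadmn": "administrator",
    "platoper": "operator",
    "platsvc": "service",
}
DOMAIN_ROLES = {"admn": "administrator", "rcfg": "configurator"}

# Default domain groups: dmn + domain letter (a-r) + role suffix.
DOMAIN_GROUP = re.compile(r"^dmn([a-r])(admn|rcfg)$")


def describe_group(name):
    """Return (scope, domain_id, role) for a default SMS group, else None."""
    if name in PLATFORM_ROLES:
        return ("platform", None, PLATFORM_ROLES[name])
    match = DOMAIN_GROUP.match(name)
    if match:
        return ("domain", match.group(1).upper(), DOMAIN_ROLES[match.group(2)])
    return None


def privileges(group_names):
    """Keep only the groups that carry SMS privileges."""
    described = (describe_group(name) for name in group_names)
    return [entry for entry in described if entry is not None]
```

For example, a user in the groups staff and dmnrrcfg would be treated by this sketch as the configurator for domain R and nothing else.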
Note - This procedure assumes that smsconfig -m has already been run. If smsconfig -m has not been run, you will receive an error when SMS attempts to start, and SMS will exit.
2. Log in to the SC and verify that SMS software startup has completed by running showplatform.
3. Wait until showplatform finishes displaying platform status.
At this point you can begin using SMS programs.
An SMS console window provides a command-line interface from the SC to the Solaris operating environment on the domain(s).
1. Log in to the SC, if you have not already done so.
Note - You must have domain privileges for the domain on which you wish to run console.
console creates a remote connection to the domain's virtual console driver, making the window in which the command is executed a "console window" for the specified domain (domain_id or domain_tag).
If console is invoked without any options when no other console windows are running for that domain, it comes up in an exclusive "locked write" mode session.
If console is invoked without any options when one or more non-exclusive console windows are running for that domain, it comes up in "read-only" mode.
Locked write permission is more secure. It can be taken away only if another console is opened using console -f or if ~* (tilde-asterisk) is entered from another running console window. In both cases, the new console session is an "exclusive session", and all other sessions are forcibly detached from the domain virtual console.
console can use either Input Output Static Random Access Memory (IOSRAM) or the internal management network for domain console communication. You can manually toggle the communication path by using the ~= (tilde-equal sign) command. Doing so is useful if the network becomes inoperable, in which case the console sessions appear to be hung.
Many console sessions can be attached simultaneously to a domain, but only one console will have write permissions; all others will have read-only permissions. Write permissions are in either "locked" or "unlocked" mode.
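The session rules above can be summarized in a small Python model. It is a simplification under stated assumptions, not the behavior of the console command itself: it covers only a plain invocation and a forced one (as with console -f), and it assumes that any plain invocation made while another session is attached comes up read-only. The class and method names are hypothetical.

```python
class DomainConsoles:
    """Toy model of the console sessions attached to one domain."""

    def __init__(self):
        # Maps a session label to its mode: "locked write" or "read-only".
        self.sessions = {}

    def open(self, label, force=False):
        """Attach a session and return the mode it comes up in."""
        if force:
            # Forced open: the new session is exclusive and every other
            # session is detached from the domain virtual console.
            self.sessions = {label: "locked write"}
        elif not self.sessions:
            # No other console windows for this domain: locked write.
            self.sessions[label] = "locked write"
        else:
            # Assumption: with other sessions attached, come up read-only.
            self.sessions[label] = "read-only"
        return self.sessions[label]

    def writers(self):
        """Return the labels of sessions that hold write permission."""
        return [s for s, mode in self.sessions.items() if mode != "read-only"]
```

In this model at most one session ever holds write permission, which matches the rule that only one console has write permissions while all others are read-only.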
In a domain console window, a tilde ( ~ ) that appears as the first character of a line is interpreted as an escape signal that directs console to perform some special action, as follows:
rlogin also processes tilde-escape sequences whenever a tilde is seen at the beginning of a new line. If you need to send a tilde sequence at the beginning of a line and you are connected using rlogin, use two tildes (the first escapes the second for rlogin). Alternatively, do not enter a tilde at the beginning of a line when running inside of an rlogin window.
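The doubling rule can be shown with a few lines of Python that prepare text to be sent through an rlogin session. The function name escape_for_rlogin is hypothetical, and the sketch handles only the case described above: a tilde as the first character of a line.

```python
def escape_for_rlogin(text):
    """Double a leading tilde on each line so rlogin passes one through."""
    escaped = []
    for line in text.split("\n"):
        if line.startswith("~"):
            # The first tilde escapes the second for rlogin.
            escaped.append("~" + line)
        else:
            # Tildes elsewhere in the line are not escape signals.
            escaped.append(line)
    return "\n".join(escaped)
```

With this helper, the toggle sequence ~= becomes ~~= when typed inside an rlogin window, while a line such as ls ~ is left unchanged.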
If you use a kill -9 command to terminate a console session, the window or terminal in which the console command was executed goes into raw mode and appears hung. To escape this condition, type CTRL-j, then stty sane, then CTRL-j.
In the domain console window, vi(1) runs properly and the escape sequences (tilde commands) work as intended only if the environment variable TERM has the same setting as that of the console window.
If you need to resize the window, type:
For more information on domain console, see Domain Console and refer to the console man page.
In the event that a system controller hangs and that console cannot be reached directly, SMS provides the smsconnectsc command to remotely connect to the hung SC. This command works from either the main or spare SC. For more information and examples, refer to the smsconnectsc man page.
Your other option is to connect to the hung SC using an external console connection, but you cannot run smsconnectsc and use an external console at the same time.
Sun Management Center for Sun Fire high-end systems is an extensible monitoring and management tool that provides a system administrator with the ability to manage the Sun Fire high-end system. Sun Management Center integrates standard SNMP-based management structures with new intelligent and autonomous agent and management technology based on the client/server paradigm.
Sun Management Center is used as the GUI and SNMP manager/agent infrastructure for the Sun Fire system. The features and functions of Sun Management Center are not covered in this manual. For more information, refer to the latest Sun Management Center documentation available at www.docs.sun.com.
Copyright © 2004, Sun Microsystems, Inc. All rights reserved.