CHAPTER 8
Solaris Operating System APIs
This chapter introduces the Solaris operating system APIs of concern to the Netra CT server. These cover configuration and status of the system frutree and environmental monitoring with sensor status information, both handled through the Platform Information and Control Library (PICL) framework, as well as gathering FRU-ID information and the dynamic reconfiguration interfaces. These subjects are addressed in the sections of this chapter.
PICL provides a method to publish platform-specific information for clients to access in a way that is not specific to the platform. The Solaris PICL framework provides information about the system configuration, which it maintains in the PICL tree. Within this PICL tree is a subtree named frutree that represents the hierarchy of system FRUs with respect to a root node in the tree called chassis. The frutree represents physical resources of the system.
The main components of the PICL framework are:
FIGURE 8-1 diagrams the PICL daemon (picld) and its connection to the PICL plug-ins, some of which are common to all platforms. These plug-ins are responsible for populating and maintaining the PICL tree during system boot and dynamic reconfiguration (DR) operations. They also handle sensor events.
Application clients use libpicl(3LIB) to interface with picld to display or set parameters using Managed Object Hierarchy (MOH), RMI, SNMP, or Solaris PICL client interfaces. MOH uses the common operating system library (COSL) interface to access picld, which in turn uses the libpicl interfaces.
Updates to the system frutree are done during DR operations, which are performed using cfgadm(1M), or during FRU latch and unlatch operations.
The following section identifies the interfaces exported in the PICL tree, namely the nodes present in the tree and the properties of those nodes.
To read the PICL frutree data of the system, use the prtpicl(1M) command. The structure of the PICL frutree involves a hierarchical representation of nodes. The immediate descendants of /frutree are one or more fru nodes by the name of chassis.
FRUs and their locations are represented by nodes of classes fru and location respectively, in the PICL frutree under the /frutree node. The port node is an extension of the fru class.
The three major node classes, location, fru, and port, are summarized in TABLE 8-1. Each of these classes is populated with various properties, among which are State and Condition. More detailed information is provided in the sections following this summary table.
State of the location: empty, connected, disconnected, or unknown.
Condition or operational state of the FRU: ok, failing, failed, unknown, or unusable.
In addition to those properties already defined by PICL, the following property is added:
The ChassisType read-only property represents the chassis type. The value of ChassisType is currently defined as:
uname -i
There should be a configuration file of this name with the .conf extension in the /usr/platform/`uname -i`/lib/picl/plugins/ directory. If none is provided, the frutree is not initialized.
Where the following fru class properties are writable, permission checks require that the writing process run with the user ID of root.
The State property of the fru class node represents the occupant state of the cfgadm attachment point associated with the fru node. A read operation on this property directs the plug-in to check the state of the occupant using libcfgadm, so that the latest State information is returned.
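As a minimal sketch of the naming rule above, the following Python function derives the expected configuration file path from a platform name. The function name and the sample platform string are hypothetical; on a real system the platform name would be the output of uname -i.

```python
import posixpath

# Directory layout taken from the text; {platform} stands for the `uname -i` output.
PLUGIN_DIR = "/usr/platform/{platform}/lib/picl/plugins"


def frutree_conf_path(platform: str) -> str:
    """Return the path where the frutree configuration file is expected.

    `platform` is the string printed by `uname -i`. Per the text, it is also
    the ChassisType value and therefore the base name of the .conf file.
    """
    if not platform or "/" in platform:
        raise ValueError("platform must be a non-empty name without '/'")
    directory = PLUGIN_DIR.format(platform=platform)
    return posixpath.join(directory, platform + ".conf")


if __name__ == "__main__":
    # "EXAMPLE,platform" is a placeholder, not a real platform name.
    print(frutree_conf_path("EXAMPLE,platform"))
```

A management script could use such a helper to check that the file exists before expecting the frutree to be populated.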
The various state values are shown in TABLE 8-2.
unconfigured: The FRU is not configured and unusable. See cfgadm(1M) for details.
configured: The FRU is configured and usable. See cfgadm(1M) for details.
The Condition property of the fru class node represents the condition of the occupant of the cfgadm attachment point. The various condition values are shown in TABLE 8-3. When libcfgadm interfaces are not available, a platform must provide the same semantics using platform-specific interfaces in defining this property.
Either the FRU software such as drivers or applications, or FRU hardware, can be responsible for providing the Condition information. This information might be either polled for, or retrieved asynchronously via an event mechanism such as sysevent.
The connectivity between nodes in a telco network is established by a link that provides the physical transmission medium. A port node represents a resource of a fru that provides such a link. Examples of ports are serial ports and network ports.
The port class node extends the PICL frutree definition of fru class of nodes. A port is always a child of a fru class, even if it is the only resource of the fru. There are no location or fru nodes beneath a port class node, because FRUs linked to the port class node are not managed in the domain in which port class node exists. There might be dependencies, such as when a remote device is cabled to a port node. These dependencies can influence the state of the port, but not necessarily the FRU itself.
The PICL frutree plug-in is responsible for identifying the port class nodes and creating the respective nodes in the frutree.
Port class properties consist of State and Condition, values of which are shown in the following paragraphs.
A port class node can be in one of the states shown in TABLE 8-4:
A port is down when its link state is down, that is, a carrier was not detected.
A port is up when its link state is up, that is, a carrier is detected.
The state of the port node is maintained by the frutree plug-in. The State value is initially determined by looking at the kstat information published by the device driver that owns the port. If the device driver information cannot be determined, this value remains unknown. The parent fru of the port must set its state to configured for the port to be anything other than unknown. See kstat(1M) for details.
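The rules in the preceding paragraph can be sketched as a small decision function. This is an illustrative model only, not the frutree plug-in's code: the function name is hypothetical, and link_up is an assumed tri-state input where None stands for "no driver kstat information available".

```python
from typing import Optional


def port_state(parent_fru_state: str, link_up: Optional[bool]) -> str:
    """Model of how a port's State value follows from its inputs.

    parent_fru_state: State of the parent fru node, e.g. "configured".
    link_up: True/False for the link state read from the driver's kstat
             data, or None when that information cannot be determined.
    """
    # The port can be anything other than unknown only if the parent
    # fru has reached the configured state.
    if parent_fru_state != "configured":
        return "unknown"
    # Without driver information the value remains unknown.
    if link_up is None:
        return "unknown"
    # Carrier detected means up; no carrier means down.
    return "up" if link_up else "down"
```

Modelling the missing kstat data as None keeps "no information" distinct from "link is down", which the text treats as different outcomes.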
The Condition value of a port class node carries the same meaning as the cfgadm value of the attachment point, as shown in TABLE 8-5.
Initial Condition values can be obtained by looking at the driver kstat information, if present. A device driver managing a resource of the FRU can influence the overall condition of the FRU by sending appropriate fault events. The property information is valid only when the parent fru state is configured.
The PortType property indicates the functional class of the port device, as shown in TABLE 8-6.
The following properties are common to all PICL classes:
The GeoAddr property indicates the geographical address of the node in relation to its parent node. It should be possible to point to the physical location (slot number) of the node in its parent domain. For example, the Netra CT 810 server describes a location's GeoAddr under the chassis node as its physical slot number. This could differ from the Label information printed on the chassis itself. For instance, the system controller slot on the Netra CT 810 system chassis is labelled as CPU, although its GeoAddr has a value of 1. Note that the Label property might not have the physical slot number embedded in it.
This property indicates when the State property was last updated. This can indicate when a FRU was last inserted or removed, configured or unconfigured, or when the port link went down. Status time is updated even for transitional state changes.
This property indicates when the Condition property was last updated. Using this property, system management software can, for example, calculate how long a FRU or port has been in operation before failure.
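The calculation mentioned above is a simple difference of two timestamps. The sketch below assumes that both values are available as seconds since the epoch, which the text does not state; the function and parameter names are hypothetical.

```python
from datetime import timedelta


def time_in_operation(configured_at: int, failed_at: int) -> timedelta:
    """Return how long a FRU or port ran before its condition changed.

    configured_at: time of the last State update (assumed epoch seconds).
    failed_at: time of the last Condition update (assumed epoch seconds).
    """
    if failed_at < configured_at:
        raise ValueError("condition update precedes state update")
    return timedelta(seconds=failed_at - configured_at)
```

For instance, time_in_operation(1000, 4600) yields a timedelta of one hour.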
A temperature sensor node is in the PICL frutree under the Environment property of the fru node.
The temperature sensors are represented by the PICL_CLASS_TEMPERATURE_SENSOR class in the PICL tree. A State property is declared for each temperature sensor node, representing the state information as shown in TABLE 8-7.
TABLE 8-8 lists the Solaris OS man pages that document the PICL framework and API. You can view the following man pages at the command line or on the Solaris OS documentation web site (http://docs.sun.com/documentation).
For examples of use of these functions, see Programming Watchdog Timers Using the PICL API.
The Dynamic Reconfiguration (DR) interfaces allow resources to be reconfigured without user intervention when system resources are added or removed while the system is running. Traditionally, applications assume that OS resources remain static after boot. In DR situations, challenges faced by applications include the following:
The Solaris OS has knowledge of DR operations, but certain applications might not. If an application is holding the resources involved in the DR operation, the operation will fail. To be successful, applications need to be dynamically aware of the current state of the system. The Solaris DR framework includes the Reconfiguration Coordination Manager (RCM), cfgadm(1M), and libcfgadm(3LIB). It also includes the PCI hotplug/cPCI Hotswap framework (cfgadm_pci(1M)), SCSI hotplug framework (cfgadm_scsi(1M)), and the Hotswap Controller driver (cphsc(7D)).
The following sections describe:
The Reconfiguration Coordination Manager (RCM) is a generic framework which allows DR to interact with system management software. The framework enables automated DR removal operations on platforms with proper software and hardware configuration. RCM defines generic APIs to coordinate DR operations between DR initiators and DR clients during resource removal. For details on RCM, go to http://www.sun.com/documentation.
The Netra CT server supports the following three hot-swap models according to the PICMG CompactPCI Hotswap specifications version 2.1 R1.0:
These models can be described by two terms:
In the basic hot-swap model, the hardware connection process can be performed automatically by the hardware, while the software connection process requires operator assistance.
In the full hot-swap model, both the hardware and the software connection processes are performed automatically. The Netra CT server is configured for full hot swap by default. The mode of a slot can be reconfigured to basic hot swap using the cfgadm command in cases where a third-party board does not support full hot swap.
In the high-availability model, software has the capability of controlling the power-on of the FRU hardware, beyond the hardware and software connection processes. Drivers and services can isolate a board from the system until an operator is able to intervene.
The Netra CT server uses the cfgadm(1M) utility for administering the hot-swap process. This includes connecting and disconnecting, configuring and unconfiguring the hardware and software, and setting various operation modes. Elements of the cfgadm utility are described in the next section.
On the Netra CT server, hot swapping is supported for the CPU card, CPU transition card, and I/O boards. Devices that are not hot-swap friendly can be supported only in basic hot-swap mode. See the Netra CT Server Service Manual (816-2482) for a list of hot-swappable FRUs.
Configuration changes are handled in a coherent way, because DR and the Frutree management framework are integrated in PICL. PICL frutree properties and cfgadm attachment point elements are mapped one-to-one, which creates data consistency. All DR operations are coordinated with a service processor.
Configuration administration of a dynamically reconfigurable system is carried out through cfgadm(1M), which can display status, invoke configuration state changes, and invoke hardware-specific functions. See the Netra CT System Administration Guide (816-2486) for more information on the cfgadm utility.
The libcfgadm(3LIB) man page describes the library of configuration administration interfaces.
Use cfgadm to perform a connect operation on a cPCI FRU, for example:
Use cfgadm to perform a disconnect operation on a cPCI FRU, for example:
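As a minimal sketch, the connect and disconnect operations can be wrapped from a management script as follows. It relies on the general cfgadm(1M) form cfgadm -c function ap_id; the helper names are hypothetical, and the attachment point ID must be taken from the cfgadm listing on the actual system.

```python
import subprocess
from typing import List

# State-change functions accepted by the -c option of cfgadm(1M)
# that are relevant to the hot-swap operations described here.
VALID_FUNCTIONS = ("connect", "disconnect", "configure", "unconfigure")


def cfgadm_command(function: str, ap_id: str) -> List[str]:
    """Build the argument vector for a cfgadm state-change operation."""
    if function not in VALID_FUNCTIONS:
        raise ValueError(f"unsupported cfgadm function: {function!r}")
    if not ap_id:
        raise ValueError("an attachment point ID is required")
    return ["cfgadm", "-c", function, ap_id]


def run_cfgadm(function: str, ap_id: str) -> subprocess.CompletedProcess:
    """Run cfgadm and return the completed process (Solaris only)."""
    return subprocess.run(
        cfgadm_command(function, ap_id),
        capture_output=True,
        text=True,
        check=False,  # let the caller inspect returncode and stderr
    )
```

Building the argument list separately from running it lets the command be logged or reviewed before any state change is attempted.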
Temperature sensor states can be read using the libpicl API. The properties that are supported in a PICL temperature sensor class node are listed in TABLE 8-9.
The PICL plug-in receives these sensor events and updates the State property based on the information extracted from the IPMI message. It then posts a PICL event.
The threshold levels of the PICL node class temperature-sensor are:
TABLE 8-10 lists the PICL threshold levels and their MOH equivalents.
To obtain a reading of temperature sensor states, type the prtpicl -v command:
PICL output of the temperature sensors on a Netra CT system is shown in CODE EXAMPLE 8-1.
Note - PICL clients can use the libpicl APIs to set and get various properties of this sensor.
The Netra CT system's watchdog service captures catastrophic faults in the Solaris OS running on either a host or satellite CPU board. The watchdog service reports such faults to the alarm card by means of either an IPMI message or a de-assertion of the CPU's HEALTHY# signal.
The Netra CT system management controller provides two watchdog timers, the watchdog level 2 (WD2) timer and the watchdog level 1 (WD1) timer. System management software starts the timers, and the Solaris OS periodically pats them before they expire. If the WD2 timer expires, its watchdog function can optionally force the SPARC processor to reset. The maximum range for WD2 is 255 seconds.
The WD1 timer is typically set to a shorter interval than the WD2 timer. User applications can examine the expiration status of the WD1 timer to get advance warning if the main timer, WD2, is about to expire. The system management software has to start WD1 before it can start WD2. If WD1 expires, then WD2 starts only if enabled. The maximum range for WD1 is 6553.5 seconds.
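The limits quoted above lend themselves to a simple range check before a timeout is written. The following sketch is an assumption-laden illustration: the function names are hypothetical, and the lower bound of "greater than zero" is assumed because the text states only the maximum values.

```python
# Maximum ranges quoted in the text, in seconds.
WD_MAX_SECONDS = {"WD1": 6553.5, "WD2": 255.0}


def check_timeout(timer: str, seconds: float) -> float:
    """Validate a timeout for WD1 or WD2 and return it as a float."""
    if timer not in WD_MAX_SECONDS:
        raise ValueError(f"unknown timer: {timer!r}")
    limit = WD_MAX_SECONDS[timer]
    # Assumption: a timeout must be positive; only the maximum is documented.
    if not 0 < seconds <= limit:
        raise ValueError(f"{timer} timeout must be in (0, {limit}] seconds")
    return float(seconds)


def wd1_warns_first(wd1_seconds: float, wd2_seconds: float) -> bool:
    """Report whether WD1 is shorter than WD2, the typical arrangement."""
    check_timeout("WD1", wd1_seconds)
    check_timeout("WD2", wd2_seconds)
    return wd1_seconds < wd2_seconds
```

The second helper returns a boolean rather than raising an error because the text describes the shorter WD1 interval as typical, not mandatory.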
The watchdog subsystem is managed by a PICL plug-in module. This PICL plug-in module provides a set of PICL properties to the system, which enables a Solaris PICL client to specify the attributes of the watchdog system.
To use the PICL API to set the watchdog properties, your application must adhere to the following sequence:
1. Before setting the watchdog timer, use the PMS API to disable the primary HEALTHY# signal monitoring for the CPU board on which the watchdog timer is to be changed.
To do this, switch to the alarm card CLI and use the command pmsd infoshow, specifying the slot number. The output will indicate whether the card is in MAINTENANCE mode or OPERATIONAL mode.
If the card is in OPERATIONAL mode, switch it into MAINTENANCE mode by issuing the following command:
This disables the primary HEALTHY# signal monitoring of the board in the specified slot.
2. In your application, use the PICL API to disarm, set, and arm the active watchdog timer.
Refer to the picld(1M), libpicl(3LIB), and libpicl(3PICL) man pages for a complete description of the PICL architecture and programming interface. Develop your application using the PICL programming interface to do the following:
3. Use the PMS API to enable the primary HEALTHY# signal monitoring on the CPU card in the specified slot.
From the alarm card CLI, switch the card back to operational mode by issuing the following command:
HEALTHY# monitoring will be enabled again on the card in the slot that you specified.
Refer to Chapter 7 for information on Processor Management Services (PMS).
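The disarm, set, and arm order required in step 2 can be illustrated with a small in-memory model. This sketch does not call libpicl; the class, its methods, and the sample action string are all hypothetical stand-ins, and the rule that a timer must be disarmed before it is changed is the model's reading of the sequence above.

```python
class WatchdogModel:
    """In-memory stand-in for a watchdog timer (not the real plug-in)."""

    def __init__(self) -> None:
        self.armed = False
        self.timeout = None
        self.action = None
        self.calls = []  # records the order of operations

    def disarm(self) -> None:
        self.armed = False
        self.calls.append("disarm")

    def set(self, timeout: float, action: str) -> None:
        # Assumption of this model: values change only while disarmed.
        if self.armed:
            raise RuntimeError("disarm the timer before changing it")
        if timeout <= 0 or not action:
            raise ValueError("timeout must be positive and action non-empty")
        self.timeout, self.action = timeout, action
        self.calls.append("set")

    def arm(self) -> None:
        # Arming uses the timeout and action that were already set.
        if self.timeout is None or self.action is None:
            raise RuntimeError("set timeout and action before arming")
        self.armed = True
        self.calls.append("arm")


def reprogram(wd: WatchdogModel, timeout: float, action: str) -> None:
    """Apply the required sequence: disarm, set, then arm."""
    wd.disarm()
    wd.set(timeout, action)
    wd.arm()
```

In a real client, each method would correspond to reading or writing a property of the watchdog nodes through the PICL programming interface, bracketed by the PMS steps 1 and 3.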
PICL interfaces for the watchdog plug-in module (see TABLE 8-11) include the nodes watchdog-controller and watchdog-timer.
Activates all timers under the controller with values already set for WdTimeout and WdAction.
Default value set at boot time. Indicates timer is disarmed or stopped.
WdTimeout: Indicates the timer initial countdown value. Should be set prior to arming the timer.
WdAction:
Send notifications to system alarm hardware by means of HEALTHY#.
Perform a soft or hard reset of the system (implementation specific).
To identify current settings of watchdog-controller, issue the command prtpicl -v as shown in CODE EXAMPLE 8-2.
Sun FRU-ID is the container for presenting the FRU-ID data. If the Sun FRU-ID container is not present, the FRU-ID Access plug-in looks for the IPMI FRU-ID container of cPCI FRUs. It then converts FRU-ID data from IPMI format to Sun FRU-ID format and presents the result in Sun FRU-ID ManR (manufacturer record) format.
The command prtfru(1M) displays FRU data of all FRUs in the PICL frutree. When prtfru is run on the host CPU, FRU data is displayed for the host CPU and the satellite CPUs. CODE EXAMPLE 8-3 shows an example of the output of the prtfru command.
Communication between CPUs is enabled by MCNet (mcn(7D)), which presents an Ethernet-like interface over the cPCI bus in accordance with PICMG 2.14. The interface is configured automatically during system boot, and supports all existing network tools, such as ifconfig(1M), netstat(1M), and so forth. The CPUs must be MCNet-capable in order to communicate with each other.
Copyright © 2004, Sun Microsystems, Inc. All rights reserved.