This chapter describes issues specific to Sun midrange and high-end servers. Current Sun servers are part of the Sun Fire system family. Older servers are part of the Sun Enterprise system family.
The Sun Validation Test Suite release notes are now a separate document and can be found at http://sun.com.
To see which bugs and issues are fixed and no longer apply to the Solaris 10 10/08 software, refer to Appendix A, Table of Integrated Bug Fixes in the Solaris 10 Operating System.
This section describes major domain-side DR bugs on the following Sun Fire high-end systems that run the Solaris 10 software:
Sun Fire 25K
Sun Fire 20K
Sun Fire 15K
Sun Fire 12K
For information about DR bugs on Sun Management Services, see the SMS Release Notes for the SMS version that is running on your system.
This information applies only to DR as it runs on the servers listed in this section. For information about DR on other servers, see the Release Notes or Product Notes documents or sections that describe those servers.
The following software and hardware bugs apply to Sun Fire high-end systems.
Warnings might be displayed when a DR command is executing on a system that is configured with the SunSwift PCI card, Option 1032. These warnings appear on domains that are running the Solaris 8, Solaris 9, or Solaris 10 software. The following warning is an example:
Aug 12 12:27:41 machine genunix: WARNING: vmem_destroy('pcisch2_dvma'): leaked
These warnings are benign. The Direct Virtual Memory Access (DVMA) space is properly refreshed during the DR operation. No true kernel memory leak occurs.
Workaround: To prevent the warning from being displayed, add the following line to /etc/system:
set pcisch:pci_preserve_iommu_tsb=0
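The edit can be made repeatable so that the line is never added twice. The following is a minimal sketch, assuming a POSIX shell and POSIX `grep` (on Solaris 10, the versions under /usr/xpg4/bin); the function name `add_system_setting` is hypothetical.

```shell
# Append a line to a settings file only if it is not already present.
# Usage: add_system_setting FILE 'set module:variable=value'
add_system_setting() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the setting as a literal string.
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example (as superuser, after saving a copy of the file):
#   cp /etc/system /etc/system.orig
#   add_system_setting /etc/system 'set pcisch:pci_preserve_iommu_tsb=0'
```

The /etc/system file is read at boot, so the setting takes effect after the next reboot.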
The link fails between a system with a Sun GigaSwift Ethernet MMF Option X1151A and certain CISCO switches. The failure occurs when you attempt to run a DR operation on such a system that is attached to one of the following switches:
CISCO WS-c4003 switch (f/w: WS-C4003 Software, Version NmpSW: 4.4(1))
CISCO WS-c4003 switch (f/w: WS-C4003 Software, Version NmpSW: 7.1(2))
CISCO WS-c5500 switch (f/w: WS-C5500 Software, Version McpSW: 4.2(1) and NmpSW: 4.2(1))
This problem is not seen on a CISCO 6509 switch.
Workaround: Use another switch. Alternatively, you can consult Cisco for a patch for the listed switches.
This section describes major issues that are related to DR on the following Sun Fire midrange systems:
Sun Fire E6900
Sun Fire E4900
Sun Fire 6800
Sun Fire 4810
Sun Fire 4800
Sun Fire 3800
This information applies only to DR as it runs on the servers listed in this section. For information about DR on other servers, see the Release Notes or Product Notes documents or sections that describe those servers.
Table 3–1 shows acceptable combinations of Solaris software and System Controller (SC) firmware for each Sun Fire midrange system to run DR.
To best utilize the latest firmware features and bug fixes, run the most recent SC firmware on your Sun Fire midrange system. For the latest patch information, see http://sunsolve.sun.com.
Platform | Solaris Release | Minimum SC Firmware |
---|---|---|
Sun Fire E6900/E4900 with UltraSPARC IV+ | Solaris 10 3/05 HW1 (a limited release) or Solaris 10 1/06 | 5.19.0 |
Sun Fire E6900/E4900 without UltraSPARC IV+ | Solaris 9 4/04 | 5.16.0 |
Sun Fire 6800/4810/4800/3800 | Solaris 9 4/04 | 5.16.0 |
Sun Fire 6800/4810/4800/3800 | Solaris 9 | 5.13.0 |
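An installed firmware version can be compared against a minimum from the table with a small helper. This is a sketch only, assuming a POSIX shell and three-part dotted version strings; the function name `version_at_least` is hypothetical, and obtaining the running SC firmware version is outside its scope.

```shell
# Succeed if dotted version $1 is greater than or equal to dotted version $2.
version_at_least() {
    have=$1
    need=$2
    # Sort the two versions numerically, field by field; the lowest comes first.
    lowest=$(printf '%s\n%s\n' "$have" "$need" |
        sort -t . -k 1,1n -k 2,2n -k 3,3n | head -n 1)
    [ "$lowest" = "$need" ]
}

# Example: is firmware 5.19.0 new enough for a platform that needs 5.16.0?
#   version_at_least 5.19.0 5.16.0 && echo "firmware meets the minimum"
```

A plain string comparison would rank 5.9.0 above 5.16.0, which is why each field is sorted as a number.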
You can upgrade the system firmware for your Sun Fire midrange system by connecting to an FTP or HTTP server where the firmware images are stored. For more information, refer to the README and Install.info files. These files are included in the firmware releases that are running on your domains. You can download Sun patches from http://sunsolve.sun.com.
This section lists important DR bugs.
If a process is holding open a network device, any DR operation that would involve that device fails. Daemons and processes that hold reference counts stop DR operations from completing.
Workaround: As superuser, perform the following steps:
Remove or rename the /rplboot directory.
Shut down NFS services.
# sh /etc/init.d/nfs.server stop
Shut down Boot Server services.
# sh /etc/init.d/boot.server stop
Perform the DR detach operation.
Restart NFS services.
# sh /etc/init.d/nfs.server start
Restart Boot Server services.
# sh /etc/init.d/boot.server start
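The steps above can be sketched as one wrapper that stops the services, runs the detach, and restarts the services. This assumes a POSIX shell; the names `run` and `dr_detach_with_services_stopped`, the `DRY_RUN` switch, and the new name /rplboot.disabled are all hypothetical choices. The DR detach command itself is supplied by the caller and is not defined here.

```shell
# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Usage: dr_detach_with_services_stopped COMMAND [ARG...]
# COMMAND is the DR detach operation for your platform.
dr_detach_with_services_stopped() {
    run mv /rplboot /rplboot.disabled   # the new name is an arbitrary choice
    run sh /etc/init.d/nfs.server stop
    run sh /etc/init.d/boot.server stop
    run "$@"
    status=$?
    # Restart the services even if the detach failed.
    run sh /etc/init.d/nfs.server start
    run sh /etc/init.d/boot.server start
    return $status
}
```

With the default dry-run setting, the function only lists the commands in order, which allows the sequence to be reviewed before it is run as superuser with DRY_RUN=0.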
On Sun Fire midrange systems, a CompactPCI (cPCI) I/O board cannot be unconfigured when Port 0 (P0) on that board is disabled. This problem exists in Solaris 10 and Solaris 9 software. It also exists in Solaris 8 software that has one or more of the following patches installed:
Patch ID 108528–11 through 108528–29
Patch ID 111372–02 through 111372–04
The error occurs only during DR operations that involve cPCI boards. An error message similar to the following example is displayed:
# cfgadm -c unconfigure N0.IB7
cfgadm: Hardware specific failure: unconfigure N0.IB7: Device busy:/ssm@0,0/pci@1b,700000/pci@1
N0.IB7 is a CompactPCI I/O Board with P0 disabled.
Workaround: Disable the slots instead of Port 0.
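When the failure is captured in a log, the busy device path can be pulled out of the message for further checking. The following is a minimal sketch, assuming a POSIX shell and a message in the same form as the example above; the function name `busy_device` is hypothetical.

```shell
# Read cfgadm error text on stdin and print the path that follows "Device busy:".
busy_device() {
    sed -n 's|^.*Device busy:[ ]*\(/.*\)$|\1|p'
}

# Example:
#   cfgadm -c unconfigure ATTACHMENT_POINT 2>&1 | busy_device
```

Lines that do not contain a "Device busy:" path produce no output.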
This section describes issues that involve the following features on the Sun Enterprise 10000 server:
System Service Processor requirement
Dynamic reconfiguration (DR)
InterDomain Networks (IDNs)
Solaris Operating System on Sun Enterprise 10000 domains
The Solaris 10 software can be run on individual domains within a Sun Enterprise 10000 system. However, the Sun Enterprise 10000 System Service Processor is not supported by this release.
The SSP 3.5 software is required on your System Service Processor (SSP) to support the Solaris 10 software. Install the SSP 3.5 software on your SSP first. Then you can install or upgrade to the Solaris 10 OS on a Sun Enterprise 10000 domain.
The SSP 3.5 software is also required so that the domain can be properly configured for DR model 3.0.
This section describes different issues that involve dynamic reconfiguration on Sun Enterprise 10000 domains.
You must use DR model 3.0 on Sun Enterprise 10000 domains that run the Solaris OS beginning with the Solaris 9 12/03 release. DR model 3.0 refers to the functionality that uses the following commands on the SSP to perform domain DR operations:
addboard
moveboard
deleteboard
showdevices
rcfgadm
You can run the cfgadm command on domains to obtain board status information. DR model 3.0 also interfaces with the Reconfiguration Coordination Manager (RCM) to coordinate the DR operations with other applications that are running on a domain.
For details about DR model 3.0, refer to the Sun Enterprise 10000 Dynamic Reconfiguration User Guide.
For this Solaris release, DR no longer automatically unbinds user processes from CPUs that are being detached. You must perform this operation before initiating a detach sequence. The drain operation fails if CPUs are found with bound processes.
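One way to find the processes that must be unbound is to filter the output of `pbind -q` by processor ID. The sketch below assumes a POSIX shell and POSIX `awk`, and it assumes that `pbind -q` prints lines of the form "process id 204: 2" (PID, then processor ID); verify that format on your release before relying on it. The function name `pids_bound_to_cpu` is hypothetical.

```shell
# Read `pbind -q` output on stdin and print the PIDs bound to processor $1.
# ASSUMPTION: each binding is reported as "process id PID: CPU".
pids_bound_to_cpu() {
    awk -v cpu="$1" '
        $1 == "process" && $2 == "id" && $4 == cpu {
            sub(/:$/, "", $3)   # strip the colon that follows the PID
            print $3
        }'
}

# Example: unbind every process that is bound to processor 2.
#   pbind -q | pids_bound_to_cpu 2 | while read pid; do pbind -u "$pid"; done
```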
If a process is holding open a network device, any DR operation that would involve that device fails. Daemons and processes that hold reference counts stop DR operations from completing.
Workaround: As superuser, perform the following steps:
Remove or rename the /rplboot directory.
Shut down NFS services.
# sh /etc/init.d/nfs.server stop
Shut down Boot Server services.
# sh /etc/init.d/boot.server stop
Perform the DR detach operation.
Restart NFS services.
# sh /etc/init.d/nfs.server start
Restart Boot Server services.
# sh /etc/init.d/boot.server start
For a domain to become part of an InterDomain Network, all boards with active memory in that domain must have at least one active CPU.
Before you issue the boot net command from the OpenBoot PROM prompt (OK), verify that the local-mac-address? variable is set to false. This setting is the factory default setting. If the variable is set to true, you must ensure that this value is appropriate for your local configuration.
A local-mac-address? that is set to true might prevent the domain from successfully booting over the network.
In a netcon window, you can use the following command at the OpenBoot PROM prompt to display the values of the OpenBoot PROM variables:
OK printenv
To reset the local-mac-address? variable to the default setting, use the setenv command:
OK setenv local-mac-address? false
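If the domain is still running the Solaris software, the same variable can be checked with the eeprom command before the domain is brought down to the OpenBoot PROM prompt. This is a sketch, assuming a POSIX shell; the function name `local_mac_is_default` is hypothetical.

```shell
# Read one "name=value" line, as printed by eeprom, and succeed only if
# local-mac-address? is set to false.
local_mac_is_default() {
    read -r setting
    [ "$setting" = "local-mac-address?=false" ]
}

# Example (as superuser on the running domain):
#   eeprom 'local-mac-address?' | local_mac_is_default ||
#       echo "local-mac-address? is not false; review before boot net"
```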
This section contains the latest information about dynamic reconfiguration (DR) functionality for the following midrange servers that are running the Solaris 10 software:
Sun Enterprise 6x00
Sun Enterprise 5x00
Sun Enterprise 4x00
Sun Enterprise 3x00
For more information about Sun Enterprise Server Dynamic Reconfiguration, refer to the Dynamic Reconfiguration User's Guide for Sun Enterprise 3x00/4x00/5x00/6x00 Systems. The Solaris 10 release includes support for all CPU/memory boards and most I/O boards in the systems that are mentioned in the preceding list.
Before proceeding, make sure that the system supports dynamic reconfiguration. If your system is of an older design, the following message appears on your console or in your console logs. Such a system is not suitable for dynamic reconfiguration.
Hot Plug not supported in this system
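A saved console log can be searched for this message. The sketch below assumes a POSIX shell; the function name `hot_plug_message_absent` is hypothetical, and the log path is whatever file your site uses to capture console output (/var/adm/messages in the example is an assumption).

```shell
# Exit status: 0 = message not found, 1 = message found, 2 = log unreadable.
hot_plug_message_absent() {
    log=$1
    [ -r "$log" ] || return 2
    if grep 'Hot Plug not supported in this system' "$log" >/dev/null; then
        return 1
    fi
    return 0
}

# Example:
#   hot_plug_message_absent /var/adm/messages || echo "check DR suitability"
```

An absent message is only a hint, because the log might have been rotated since the last boot.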
The following I/O boards are not currently supported:
Type 2 (graphics)
Type 3 (PCI)
Type 5 (graphics and SOC+)
This section provides general software information about DR.
To enable dynamic reconfiguration, you must set two variables in the /etc/system file. You must also set an additional variable to enable the removal of CPU/memory boards. Perform the following steps:
Log in as superuser.
Edit the /etc/system file by adding the following lines:
set pln:pln_enable_detach_suspend=1
set soc:soc_enable_detach_suspend=1
To enable the removal of a CPU/memory board, add this line to the file:
set kernel_cage_enable=1
Setting this variable enables the memory unconfiguration operation.
Reboot the system to apply the changes.
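Before rebooting, the file can be checked for the three settings named in the steps above. This is a minimal sketch, assuming a POSIX shell and POSIX `grep` (on Solaris 10, the version under /usr/xpg4/bin); the function name `missing_dr_settings` is hypothetical.

```shell
# Print each DR-related setting that is missing from the given file.
# No output means that all three settings are present.
missing_dr_settings() {
    file=$1
    for setting in \
        'set pln:pln_enable_detach_suspend=1' \
        'set soc:soc_enable_detach_suspend=1' \
        'set kernel_cage_enable=1'
    do
        # -x matches whole lines only, -F treats the setting as a literal string.
        grep -xF -- "$setting" "$file" >/dev/null 2>&1 || printf '%s\n' "$setting"
    done
}

# Example:
#   missing_dr_settings /etc/system
```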
You start the quiesce test with the following command:
# cfgadm -x quiesce-test sysctrl0:slot number
On a large system, the quiesce test might run for up to a minute. During this time no messages are displayed if cfgadm does not find incompatible drivers.
Attempting to connect a board that is on the disabled board list might produce an error message:
# cfgadm -c connect sysctrl0:slot number
cfgadm: Hardware specific failure: connect failed: board is disabled: must override with [-f][-o enable-at-boot]
To override the disabled condition, two options are available:
Using the force flag (-f)
# cfgadm -f -c connect sysctrl0:slot number
Using the enable option (-o enable-at-boot)
# cfgadm -o enable-at-boot -c connect sysctrl0:slot number
To remove all boards from the disabled board list, choose one of two options depending on the prompt from which you issue the command:
From the superuser prompt, type:
# eeprom disabled-board-list=
From the OpenBoot PROM prompt, type:
OK set-default disabled-board-list
For further information about the disabled-board-list setting, refer to the “Specific NVRAM Variables” section in the Platform Notes: Sun Enterprise 3x00, 4x00, 5x00, and 6x00 Systems manual. This manual is part of the documentation set in this release.
Information about the OpenBoot PROM disabled-memory-list setting is published in this release. See “Specific NVRAM Variables” in the Platform Notes: Sun Enterprise 3x00, 4x00, 5x00, and 6x00 Systems in the Solaris on Sun Hardware documentation.
If you need to unload detach-unsafe drivers, use the modinfo command to find the module IDs of the drivers. You can then pass the module IDs to the modunload command to unload the detach-unsafe drivers.
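The lookup can be sketched as a filter over modinfo output. The sketch assumes a POSIX shell and POSIX `awk`, and it assumes the usual modinfo column layout (Id, Loadaddr, Size, Info, Rev, Module Name), so that the module name is the sixth field. The function name `module_id` is hypothetical, and the function does not decide which drivers are detach-unsafe.

```shell
# Read modinfo output on stdin and print the ID of the module named $1.
# ASSUMPTION: columns are Id, Loadaddr, Size, Info, Rev, Module Name.
module_id() {
    awk -v name="$1" '$6 == name { print $1; exit }'
}

# Example (as superuser; DRIVER is the name of the driver to unload):
#   id=$(modinfo | module_id DRIVER)
#   [ -n "$id" ] && modunload -i "$id"
```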
Remove the board from the system as soon as possible if the following error message is displayed during a DR connect sequence:
cfgadm: Hardware specific failure: connect failed: firmware operation error
The board has failed self-test, and removing the board avoids possible reconfiguration errors that can occur during the next reboot.
The failed self-test status does not allow further operations. Therefore, if you want to retry the failed operation immediately, you must first remove and then reinsert the board.
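The two connect failures described in this section call for different responses, which a script can select from the message text. This is a sketch, assuming a POSIX shell; the function name `connect_failure_advice` and the wording of its output are hypothetical.

```shell
# Map a cfgadm connect failure message to the action that this section describes.
connect_failure_advice() {
    case $1 in
        *'board is disabled'*)
            echo 'override: retry with -f or -o enable-at-boot' ;;
        *'firmware operation error'*)
            echo 'self-test failed: remove the board, then reinsert it before retrying' ;;
        *)
            echo 'unrecognized: review the full cfgadm output' ;;
    esac
}

# Example:
#   msg=$(cfgadm -c connect ATTACHMENT_POINT 2>&1) || connect_failure_advice "$msg"
```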
The following list is subject to change at any time.
If a process is holding open a network device, any DR operation that would involve that device fails. Daemons and processes that hold reference counts stop DR operations from completing.
Workaround: As superuser, perform the following steps:
Remove or rename the /rplboot directory.
Shut down NFS services.
# sh /etc/init.d/nfs.server stop
Shut down Boot Server services.
# sh /etc/init.d/boot.server stop
Perform the DR detach operation.
Restart NFS services.
# sh /etc/init.d/nfs.server start
Restart Boot Server services.
# sh /etc/init.d/boot.server start
If a cfgadm process is running on one board, an attempt to simultaneously disconnect a second board fails. The following error message is displayed:
cfgadm: Hardware specific failure: disconnect failed: nexus error during detach:address
Workaround: Run only one cfgadm operation at a time. Allow a cfgadm operation that is running on one board to finish before you start a cfgadm disconnect operation on a second board.
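Scripts that issue cfgadm commands can enforce this rule with a simple lock. The following is a minimal sketch, assuming a POSIX shell; the function name `run_exclusive` and the lock path are hypothetical, and the lock only guards operations that are started through the wrapper.

```shell
LOCK_DIR=${LOCK_DIR:-/tmp/cfgadm-dr.lock}   # hypothetical lock location

# Run a command only if no other wrapped command currently holds the lock.
run_exclusive() {
    # mkdir is atomic: it fails if the directory already exists.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "another DR operation holds $LOCK_DIR; try again later" >&2
        return 75
    fi
    "$@"
    status=$?
    rmdir "$LOCK_DIR"
    return $status
}

# Example:
#   run_exclusive cfgadm -c disconnect ATTACHMENT_POINT
```

If a wrapped command is interrupted before it finishes, the lock directory remains and must be removed by hand once no cfgadm operation is running.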