Solaris 10 5/09 Release Notes

Chapter 3 System-Specific Issues

This chapter describes issues specific to Sun midrange and high-end servers. Current Sun servers are part of the Sun Fire system family. Older servers are part of the Sun Enterprise system family.


Note –

The Sun Validation Test Suite release notes are now a separate document and can be found at http://sun.com.


Dynamic Reconfiguration on Sun Fire High-End Systems

This section describes major domain-side DR bugs on the following Sun Fire high-end systems that run the Solaris 10 software:

For information about DR bugs on System Management Services (SMS), see the SMS Release Notes for the SMS version that is running on your system.


Note –

This information applies only to DR as it runs on the servers listed in this section. For information about DR on other servers, see the Release Notes or Product Notes documents or sections that describe those servers.


Known Software and Hardware Bugs

The following software and hardware bugs apply to Sun Fire high-end systems.

GigaSwift Ethernet MMF Link Fails With CISCO 4003 Switch After DR Attach

The link fails between a system with a Sun GigaSwift Ethernet MMF Option X1151A and certain CISCO switches. The failure occurs when you attempt to run a DR operation on such a system that is attached to one of the following switches:

This problem is not seen on a CISCO 6509 switch.

Workaround: Use another switch. Alternatively, you can consult Cisco for a patch for the listed switches.

Dynamic Reconfiguration on Sun Fire Midrange Systems

This section describes major issues that are related to DR on the following Sun Fire midrange systems:


Note –

This information applies only to DR as it runs on the servers listed in this section. For information about DR on other servers, see the Release Notes or Product Notes documents or sections that describe those servers.


Minimum System Controller Firmware

Table 3–1 shows acceptable combinations of Solaris software and System Controller (SC) firmware for each Sun Fire midrange system to run DR.


Note –

To best utilize the latest firmware features and bug fixes, run the most recent SC firmware on your Sun Fire midrange system. For the latest patch information, see http://sunsolve.sun.com.


Table 3–1 Minimum SC Firmware for Each Platform and Solaris Release

Platform                                  Solaris Release                                             Minimum SC Firmware

Sun Fire E6900/E4900 with UltraSPARC IV+  Solaris 10 3/05 HW1 (a limited release) or Solaris 10 1/06  5.19.0
E6900/E4900 without UltraSPARC IV+        Solaris 9 4/04                                              5.16.0
Sun Fire 6800/4810/4800/3800              Solaris 9 4/04                                              5.16.0
Sun Fire 6800/4810/4800/3800              Solaris 9                                                   5.13.0
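When you check an installed SC firmware version against the minimums in Table 3–1, compare the version fields numerically, because a plain string comparison would rank 5.9.0 above 5.16.0. The following is a minimal sketch, assuming a POSIX-compatible shell (for example, ksh or /usr/xpg4/bin/sh) and that you have already obtained the firmware version string. The function name version_ge is a hypothetical helper, not part of any Sun tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if version $1 is at least version $2.
# Versions are dot-separated numeric fields, such as 5.16.0.
version_ge() {
    have="$1"
    need="$2"
    while [ -n "$have" ] || [ -n "$need" ]; do
        h="${have%%.*}"
        n="${need%%.*}"
        # Treat a missing field as 0, so that 5.19 compares like 5.19.0
        h="${h:-0}"
        n="${n:-0}"
        if [ "$h" -gt "$n" ]; then return 0; fi
        if [ "$h" -lt "$n" ]; then return 1; fi
        # Drop the field that was just compared
        case "$have" in *.*) have="${have#*.}" ;; *) have="" ;; esac
        case "$need" in *.*) need="${need#*.}" ;; *) need="" ;; esac
    done
    return 0
}

# Example use (the installed version shown is illustrative only):
#   if version_ge "5.17.0" "5.16.0"; then echo "firmware is sufficient"; fi
```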

You can upgrade the system firmware for your Sun Fire midrange system by connecting to an FTP or HTTP server where the firmware images are stored. For more information, refer to the README and Install.info files. These files are included in the firmware releases that are running on your domains. You can download Sun patches from http://sunsolve.sun.com.

Known DR Software Bugs

This section lists important DR bugs.

Network Device Removal Fails When a Program Is Holding the Device Open (5054195)

If a process is holding open a network device, any DR operation that would involve that device fails. Daemons and processes that hold reference counts stop DR operations from completing.

Workaround: As superuser, perform the following steps:

  1. Remove or rename the /rplboot directory.

  2. Shut down NFS services.


    # sh /etc/init.d/nfs.server stop
    
  3. Shut down Boot Server services.


    # sh /etc/init.d/boot.server stop
    
  4. Perform the DR detach operation.

  5. Restart NFS services.


    # sh /etc/init.d/nfs.server start
    
  6. Restart Boot Server services.


    # sh /etc/init.d/boot.server start
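Steps 2 through 6 of this workaround can be wrapped in a small script so that the services are restarted even if the detach operation fails. The following is a sketch only, assuming a POSIX-compatible shell. The function name, the RUN variable, and the detach command passed as arguments are hypothetical. Step 1 (removing or renaming /rplboot) is left as a manual step.

```shell
#!/bin/sh
# RUN is a hypothetical switch: when set to "echo" (the default here), each
# command is printed instead of executed. Set RUN="" to actually run them.
RUN="${RUN-echo}"

# Usage: dr_detach_with_services_stopped <detach command and its arguments>
dr_detach_with_services_stopped() {
    # Steps 2 and 3: stop the services that hold the network device open
    $RUN sh /etc/init.d/nfs.server stop
    $RUN sh /etc/init.d/boot.server stop

    # Step 4: run the caller-supplied DR detach command and remember its status
    $RUN "$@"
    status=$?

    # Steps 5 and 6: restart the services whether or not the detach succeeded
    $RUN sh /etc/init.d/nfs.server start
    $RUN sh /etc/init.d/boot.server start

    return $status
}
```

With the default dry-run setting, calling the function prints the five commands in order, which lets you review the sequence before you run it as superuser with RUN set to an empty string.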
    

Sun Enterprise 10000 Release Notes

This section describes issues that involve the following features on the Sun Enterprise 10000 server:


Note –

The Solaris 10 software can be run on individual domains within a Sun Enterprise 10000 system. However, the Sun Enterprise 10000 System Service Processor is not supported by this release.


System Service Processor Requirement

The SSP 3.5 software is required on your System Service Processor (SSP) to support the Solaris 10 software. Install the SSP 3.5 on your SSP first. Then you can install or upgrade to the Solaris 10 OS on a Sun Enterprise 10000 domain.

The SSP 3.5 software is also required so that the domain can be properly configured for DR Model 3.0.

Dynamic Reconfiguration Issues

This section describes different issues that involve dynamic reconfiguration on Sun Enterprise 10000 domains.

DR Model 3.0

You must use DR 3.0 on Sun Enterprise 10000 domains that run the Solaris OS beginning with the Solaris 9 12/03 release. DR model 3.0 refers to the functionality that uses the following commands on the SSP to perform domain DR operations:

You can run the cfgadm command on domains to obtain board status information. DR model 3.0 also interfaces with the Reconfiguration Coordination Manager (RCM) to coordinate the DR operations with other applications that are running on a domain.

For details about DR model 3.0, refer to the Sun Enterprise 10000 Dynamic Reconfiguration User Guide.

DR and Bound User Processes

For this Solaris release, DR no longer automatically unbinds user processes from CPUs that are being detached. You must unbind these processes yourself before you initiate a detach sequence. The drain operation fails if any of the CPUs still has bound processes.
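One way to find and release bound processes is the pbind command, which can query bindings (pbind -q) and remove them (pbind -u). The following is a minimal sketch, assuming a POSIX-compatible shell and assuming that pbind -q prints one line per bound process in the form "process id <pid>: <cpu>". Check the output format on your system before relying on it. The function name bound_pids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `pbind -q` style output on stdin and print the
# PIDs that are bound to the CPU ID given as $1.
# Assumed input line format: "process id <pid>: <cpu>"
bound_pids() {
    target_cpu="$1"
    while read w1 w2 pid cpu; do
        if [ "$w1" = "process" ] && [ "$w2" = "id" ] && [ "$cpu" = "$target_cpu" ]; then
            # Strip the trailing colon from the PID field
            echo "${pid%:}"
        fi
    done
}

# Example use as superuser, for CPU 2 (not executed here):
#   pbind -q | bound_pids 2 | while read pid; do pbind -u "$pid"; done
```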

Network Device Removal Fails When a Program Is Holding the Device Open (5054195)

If a process is holding open a network device, any DR operation that would involve that device fails. Daemons and processes that hold reference counts stop DR operations from completing.

Workaround: As superuser, perform the following steps:

  1. Remove or rename the /rplboot directory.

  2. Shut down NFS services.


    # sh /etc/init.d/nfs.server stop
    
  3. Shut down Boot Server services.


    # sh /etc/init.d/boot.server stop
    
  4. Perform the DR detach operation.

  5. Restart NFS services.


    # sh /etc/init.d/nfs.server start
    
  6. Restart Boot Server services.


    # sh /etc/init.d/boot.server start
    

InterDomain Networks

For a domain to become part of an InterDomain Network, all boards with active memory in that domain must have at least one active CPU.

OpenBoot PROM Variables

Before you issue the boot net command from the OpenBoot PROM prompt (OK), verify that the local-mac-address? variable is set to false, which is the factory default. If the variable is set to true, make sure that this setting is appropriate for your local configuration.


Caution –

A local-mac-address? that is set to true might prevent the domain from successfully booting over the network.


In a netcon window, you can use the following command at the OpenBoot PROM prompt to display the values of the OpenBoot PROM variables:


OK printenv

To reset the local-mac-address? variable to the default setting, use the setenv command:


OK setenv local-mac-address? false
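If the domain is already running the Solaris OS, the same variable can usually be inspected from a shell with the eeprom command, which prints entries in the form name=value. The following is a sketch under that assumption, for a POSIX-compatible shell. The function name mac_setting is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read eeprom-style "name=value" lines on stdin and
# print the value of the local-mac-address? variable, if present.
mac_setting() {
    while read entry; do
        case "$entry" in
            'local-mac-address?='*) echo "${entry#*=}" ;;
        esac
    done
}

# Example use as superuser (not executed here):
#   eeprom 'local-mac-address?' | mac_setting
#   eeprom 'local-mac-address?=false'    # restores the default setting
```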

Dynamic Reconfiguration on Sun Enterprise Midrange Systems

This section contains the latest information about dynamic reconfiguration (DR) functionality for the following midrange servers that are running the Solaris 10 software:

For more information about Sun Enterprise Server Dynamic Reconfiguration, refer to the Dynamic Reconfiguration User's Guide for Sun Enterprise 3x00/4x00/5x00/6x00 Systems. The Solaris 10 release includes support for all CPU/memory boards and most I/O boards in the systems that are mentioned in the preceding list.

Supported Hardware

Before proceeding, make sure that the system supports dynamic reconfiguration. If your system is of an older design, the following message appears on your console or in your console logs. Such a system is not suitable for dynamic reconfiguration.


Hot Plug not supported in this system

The following I/O boards are not currently supported:

Software Notes

This section provides general software information about DR.

Enabling Dynamic Reconfiguration

To enable dynamic reconfiguration, you must set two variables in the /etc/system file. You must also set an additional variable to enable the removal of CPU/memory boards. Perform the following steps:

  1. Log in as superuser.

  2. Edit the /etc/system file by adding the following lines:


    set pln:pln_enable_detach_suspend=1
    set soc:soc_enable_detach_suspend=1
    
  3. To enable the removal of a CPU/memory board, add this line to the file:


    set kernel_cage_enable=1
    

    Setting this variable enables the memory unconfiguration operation.

  4. Reboot the system to apply the changes.
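The edits in steps 2 and 3 can be made idempotent, so that running them again does not duplicate lines in the file. The following is a minimal sketch, assuming a POSIX-compatible shell and a grep that supports the -F and -x options (on Solaris 10, /usr/xpg4/bin/grep). The function names are hypothetical. Back up /etc/system before you modify it.

```shell
#!/bin/sh
# Hypothetical helper: append line $2 to file $1 unless an identical line
# is already present.
ensure_line() {
    file="$1"
    line="$2"
    if grep -F -x -e "$line" "$file" >/dev/null 2>&1; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}

# Hypothetical wrapper: add the three DR settings from this section to the
# file named in $1.
enable_dr_settings() {
    target="$1"
    ensure_line "$target" "set pln:pln_enable_detach_suspend=1" &&
    ensure_line "$target" "set soc:soc_enable_detach_suspend=1" &&
    ensure_line "$target" "set kernel_cage_enable=1"
}

# Example use as superuser (not executed here), followed by a reboot:
#   cp /etc/system /etc/system.orig
#   enable_dr_settings /etc/system
```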

Quiesce Test

You start the quiesce test with the following command:


 # cfgadm -x quiesce-test sysctrl0:slotnumber

On a large system, the quiesce test might run for up to a minute. During this time, cfgadm displays no messages unless it finds incompatible drivers.

Disabled Board List

Attempting to connect a board that is on the disabled board list might produce an error message:


# cfgadm -c connect sysctrl0:slotnumber

cfgadm: Hardware specific failure: connect failed:
board is disabled: must override with [-f][-o enable-at-boot]

To override the disabled condition, two options are available:

To remove all boards from the disabled board list, choose one of two options depending on the prompt from which you issue the command:

For further information about the disabled-board-list setting, refer to the “Specific NVRAM Variables” section in the Platform Notes: Sun Enterprise 3x00, 4x00, 5x00, and 6x00 Systems manual. This manual is part of the documentation set in this release.

Disabled Memory List

Information about the OpenBoot PROM disabled-memory-list setting is published in this release. See “Specific NVRAM Variables” in the Platform Notes: Sun Enterprise 3x00, 4x00, 5x00, and 6x00 Systems in the Solaris on Sun Hardware documentation.

Unloading Detach-Unsafe Drivers

If you need to unload detach-unsafe drivers, use the modinfo command to find the module IDs of the drivers. You can then pass those module IDs to the modunload command to unload the drivers.
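The lookup can be scripted. The following is a minimal sketch, assuming a POSIX-compatible shell and assuming that modinfo prints the columns Id, Loadaddr, Size, Info, Rev, and Module Name, with the module name in the sixth field. Verify this layout on your system. The function name module_ids and the driver name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read modinfo output on stdin and print the ID of
# every module whose name matches $1.
# Assumed columns: Id Loadaddr Size Info Rev ModuleName (description)
module_ids() {
    name="$1"
    while read id loadaddr size info rev module rest; do
        if [ "$module" = "$name" ]; then
            echo "$id"
        fi
    done
}

# Example use as superuser, for a driver called "mydrv" (not executed here):
#   modinfo | module_ids mydrv | while read id; do modunload -i "$id"; done
```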

Self-Test Failure During a Connect Sequence

Remove the board from the system as soon as possible if the following error message is displayed during a DR connect sequence:


cfgadm: Hardware specific failure: connect failed: firmware operation error

The board has failed self-test, and removing the board avoids possible reconfiguration errors that can occur during the next reboot.

The failed self-test status does not allow further operations. Therefore, if you want to retry the failed operation immediately, you must first remove and then reinsert the board.

Known Bugs

The following list is subject to change at any time.

Network Device Removal Fails When a Program Is Holding the Device Open (5054195)

If a process is holding open a network device, any DR operation that would involve that device fails. Daemons and processes that hold reference counts stop DR operations from completing.

Workaround: As superuser, perform the following steps:

  1. Remove or rename the /rplboot directory.

  2. Shut down NFS services.


    # sh /etc/init.d/nfs.server stop
    
  3. Shut down Boot Server services.


    # sh /etc/init.d/boot.server stop
    
  4. Perform the DR detach operation.

  5. Restart NFS services.


    # sh /etc/init.d/nfs.server start
    
  6. Restart Boot Server services.


    # sh /etc/init.d/boot.server start