Sun SPARC Enterprise M8000/M9000 Servers Product Notes



New in XCP 1070

In XCP Version 1070, the following new feature is introduced:


Supported Firmware and Software Versions

TABLE 1 lists the minimum required versions of some supported software and firmware for XCP 1070 on Sun SPARC® Enterprise M8000/M9000 servers.


TABLE 1 Minimum Software and Firmware Versions

Software or Firmware                     Version

XSCF Control Package
  SPARC64 VII processors:                XCP 1070
  Capacity on Demand (COD) support:      XCP 1050

Solaris Operating System
  SPARC64 VI processors:                 Solaris 10 11/06, with required patches (see Solaris Patch Information)
  SPARC64 VII processors:                Solaris 10 5/08


TABLE 2 lists minimum supported versions of Web browsers for use with the XSCF Web.


TABLE 2 Minimum Web Browser Versions

Web Browser Application         Version

Firefox                         2.0
Microsoft Internet Explorer     6.0
Mozilla                         1.7
Netscape Navigator              7.1


Using a WAN Boot Server

If you plan to boot your Sun SPARC Enterprise M8000/M9000 server from a Solaris WAN boot server on the network, you must have the appropriate wanboot executable installed to provide the needed hardware support. See Booting From a WAN Boot Server for details.


Solaris Patch Information

Currently, patches are required only for servers running the Solaris 10 11/06 OS. The required patches are listed in Installing the Solaris Patches.

These patch identifiers represent the minimum level of the patches that must be installed. The two-digit suffix represents the minimum revision level of the patch. Check SunSolve.Sun.COM for the latest patch revision, and see Latest Solaris Patches for information on how to find the latest patches.

Installing the Solaris Patches

Install the following patches in numerical order.

Always refer to the patch README for information about patch requirements and special installation instructions. For general installation instructions, refer to Latest Solaris Patches.

1. 118833-36 - Reboot your domain before proceeding.

2. 125100-10 - See the patch README file for a list of other patch requirements.

3. 123839-07

4. 120068-03

5. 125424-01

6. 118918-24

7. 120222-21

8. 125127-01 - Reboot your domain before proceeding.

9. 125670-02

10. 125166-05
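
Because the two-digit suffix is a minimum revision level, checking a domain comes down to comparing each installed revision against the list above. The following Python sketch illustrates that comparison. It assumes you already have the installed patch IDs as plain strings; the function names and the input list are hypothetical and are not part of any Sun tool.

```python
# Minimum patch levels, in the installation order given in the list above.
REQUIRED_PATCHES = [
    "118833-36", "125100-10", "123839-07", "120068-03", "125424-01",
    "118918-24", "120222-21", "125127-01", "125670-02", "125166-05",
]


def split_patch_id(patch_id):
    """Split 'BBBBBB-RR' into (base, revision), with the revision as an int."""
    base, _, rev = patch_id.strip().partition("-")
    return base, int(rev)


def patches_to_install(installed_ids):
    """Return required patches that are missing or below the minimum revision.

    installed_ids is a hypothetical input: an iterable of patch ID strings
    gathered however you normally inventory a domain.
    """
    # Remember the highest installed revision for each patch base number.
    best = {}
    for pid in installed_ids:
        base, rev = split_patch_id(pid)
        best[base] = max(rev, best.get(base, -1))

    # Walk the required list in order so the result keeps the install order.
    todo = []
    for required in REQUIRED_PATCHES:
        base, min_rev = split_patch_id(required)
        if best.get(base, -1) < min_rev:
            todo.append(required)
    return todo
```

The result preserves the installation order, so it can be read as a to-do list. The reboot requirements after 118833-36 and 125127-01 and the patch README instructions still apply.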


Upgrading to XCP 1070

If you are upgrading to XCP 1070 from a version of XCP 1041 or lower, refer to Upgrading From XCP 1041 or Lower for important instructions.

If you are upgrading from a more recent version of XCP, refer to the Sun SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for instructions.


General Functionality Issues and Limitations

This section describes known hardware and software issues in this release.

Limitations for SPARC64 VII Processors



Caution - Upgrading a SPARC Enterprise M8000/M9000 server with SPARC64 VII processors must be completed via cold-swap. XCP software must be upgraded to 1070 prior to inserting any SPARC64 VII processors into the chassis.


General Functionality Issues and Limitations



Caution - For dynamic reconfiguration (DR) and hot-plug issues, see Solaris OS Issues and Workarounds.




Note - For power-on after power-off, wait at least 30 seconds before turning the system power back on, by using the main line switch or the circuit breakers on the distribution panel.



Hardware Installation and Service Issues

TABLE 3 lists known issues for which a defect change request ID has been assigned. The table also lists possible workarounds. To check for availability of new patches that fix these issues, go to:

http://sunsolve.sun.com


TABLE 3 Hardware Issues and Workarounds

CR ID

Description

Workaround

6433420

The domain console might display a Mailbox timeout or IOCB interrupt timeout error during boot.

Issue a reset-all command from the OBP (OK) prompt and reboot.



Software and Firmware Issues

This section describes specific software and firmware issues and workarounds. To obtain patches and to check for availability of new patches that fix these issues, go to:

http://sunsolve.sun.com

XCP Issues and Workarounds

TABLE 4 lists XCP issues and possible workarounds.


TABLE 4 XCP Issues and Workarounds

ID

Description

Workaround

6565422

The Latest communication field in showarchiving is not updated regularly.

Disabling and re-enabling archiving refreshes the Latest communication field in showarchiving output.

6575425

Most XSCF commands should display “Permission denied” when they are executed on the Standby XSCF. Instead, some commands report various errors.

Only the following commands can be executed on the Standby XSCF: snapshot, switchscf

Do not attempt to run any other command on the Standby XSCF.

6588650

On occasion, the system is unable to perform DR after an XSCF failover to or from the backup XSCF.

There is no workaround.

6624646

Sun Connection Update Manager GUI might fail to register correctly.

Use the command-line interface (CLI) if you run into any GUI registration issues.

6665174

Following a dynamic reconfiguration operation using the XSCF commands deleteboard(8) and addboard(8), you might see I/O channel degradation, resulting in error messages and entries in the corresponding ereport.

If you run into this problem, the fmdump(8) command will show a report:

ereport.chassis.SPARCEnterprise.asic.ioc.ch.leaf.fe

You can clear this error using the following commands.

  • To identify the resource, use:
fmadm faulty -ia 
  • To clear the resource, run the following command using the uuid resource identified from the previous command:
fmadm repair resource 

6675409

If COD licensed capacity is changed while a COD board is undergoing DR, some of the COD CPUs might be marked as Faulted.

This will require a service action to correct.

Do not attempt to modify the COD licensed capacity while a DR operation is in progress on a COD board.

COD licensed capacity is modified by adding or removing licenses (with the addcodlicense or deletecodlicense commands) or by changing headroom (with the setcod command). Do not use these commands (or equivalent browser operations) while a DR operation is in progress. You can change the COD licensed capacity after the DR operation is completed.

6679286

When you use the command setsnmpusm passwd to set a password, if you set a password of fewer than eight characters, a segmentation fault occurs.

Always set a password of at least eight characters.


Solaris OS Issues and Workarounds

This section contains information about Solaris OS issues. TABLE 5, TABLE 6, and TABLE 7 list issues you might encounter, depending upon which Solaris OS release you are using.

Solaris Issues for All Supported Releases

TABLE 5 lists Solaris OS issues that you might encounter in any supported release of Solaris OS.


TABLE 5 Solaris OS Issues and Workarounds for All Supported Releases

CR ID

Description

Workaround

6449315

The Solaris cfgadm(1M) command does not unconfigure a DVD drive from a domain on a Sun SPARC Enterprise M8000/M9000 server.

 

Disable the Volume Management Daemon (vold) before unconfiguring a DVD drive with the cfgadm(1M) command. To disable vold, stop the daemon by issuing the command /etc/init.d/volmgt stop. After the device has been removed or inserted, restart the daemon by issuing the command /etc/init.d/volmgt start.

6459540

The DAT72 internal tape drive might time out during tape operations.

The device might also be identified by the system as a QIC drive.

Add the following definition to /kernel/drv/st.conf:

 

tape-config-list=
"SEAGATE DAT    DAT72-000",
"SEAGATE_DAT____DAT72-000",
"SEAGATE_DAT____DAT72-000";
SEAGATE_DAT____DAT72-000=1,0x34,0,0x9639,4,0x00,0x8c,0x8c,0x8c,3;

 

There are four spaces between SEAGATE DAT and DAT72-000.

6511374

Memory translation warning messages might appear during boot if memory banks were disabled due to excessive errors.

After the system is rebooted, the fmadm repair command can be used to prevent a recurrence of the problem on the next boot.

6522017

Domains using the ZFS file system cannot use DR.

Reduce the maximum size of the ZFS ARC. For detailed assistance, contact your authorized service representative.

6531036

The error message network initialization failed appears repeatedly after a boot net installation.

There is no workaround.

6533686

When XSCF is low on system resources, DR deleteboard or moveboard operations that relocate permanent memory might fail with one or more of these errors:

SCF busy 
DR parallel copy timeout

This applies only to Quad-XSB configured System Boards hosting multiple domains.

Retry the DR operation at a later time.

6535018

In Solaris domains that include SPARC64 VII processors, workloads that make heavy use of the Solaris kernel might not scale as expected when you increase the thread count to a value greater than 256.

For Solaris domains that include SPARC64 VII processors, limit domains to a maximum of 256 threads.

6564332

Hot-plug operations on Sun Crypto Accelerator (SCA)6000 cards can cause Sun SPARC Enterprise M8000/M9000 servers to panic or hang.

 

Version 1.0 of the SCA6000 driver does not support hot-plug; do not attempt hot-plug operations with it. Version 1.1 of the SCA6000 driver and firmware supports hot-plug operations after the required bootstrap firmware upgrade has been performed.

6572827

On Sun SPARC Enterprise M8000/M9000 platforms, one of the columns in the IO Devices section of the output from prtdiag -v is "Type". This reports "PCIe", "PCIx", "PCI" or "UNKN" for each device. The algorithm used to compute this value is incorrect. It reports "PCI" for PCI-X leaf devices and "UNKN" for legacy PCI devices.

There is no workaround.

6588555

Permanent memory DR operation during XSCF failover might cause domain panic.

 

Do not start an XSCF failover while a DR operation is running. Wait for a DR operation to finish before starting the failover. If you start the failover first, wait for the failover to finish before starting the DR operation.

6589644

When XSCF switchover happens after the SB has been added using the addboard command, the console is no longer available.

There is no workaround.

 

6589833

The DR addboard command might cause a system hang if you are adding a Sun StorageTek Enterprise Class 4Gb Dual-Port Fibre Channel PCI-E HBA card (SG-XPCIE2FC-QF4) at the same time that an SAP process is attempting to access storage devices attached to this card. The chance of a system hang is increased if the following cards are used for heavy network traffic:

  • X4447A-Z, PCI-e Quad-port Gigabit Ethernet Adapter UTP
  • X1027A-Z1, PCI-e Dual 10 Gigabit Ethernet Fiber XFP Low profile Adapter

There is no workaround.

6592302

Unsuccessful DR operation leaves memory partially configured.

It might be possible to recover by adding the board back to the domain with an addboard -d command.

6614737

The DR deleteboard(8) and moveboard(8) operations might hang if any of the following conditions exist:

  • A DIMM has been degraded.
  • The domain contains system boards with different memory sizes.

Avoid performing DR operations if any of the following conditions exist:

  • Degraded memory - To determine whether the system contains degraded memory, use the XSCF command showstatus. For sample output see Identifying Degraded Memory in a System.
  • Differing memory sizes - To determine whether the domain contains system boards with different memory sizes, display the list of memory sizes using the XSCF command showdevices or the prtdiag command on the domain. For sample output, see Identifying Different Memory Sizes in a System Board.

If a DR command hangs, reboot the domain to recover.

6619224

For Solaris domains that include SPARC64 VII processors, a single domain of 256 threads or more might hang for an extended period of time under certain unusual situations. Upon recovery, the uptime command will show extremely high load averages.

For Solaris domains that include SPARC64 VII processors, do not exceed a domain size of 256 virtual processors in a single Solaris domain. This means a maximum of 32 CPUs in a single domain configuration (maximum configuration for an M8000 server).

6623226

The Solaris command lockstat(1M) might cause a system panic.

Do not use the Solaris lockstat(1M) command.

6625734

Systems with large number of processors in a single domain environment might have suboptimal performance with certain workloads.

Use processor sets to bind application processes or LWPs to groups of processors. Refer to the psrset(1M) man page for more information.

6632549

The fmd service on a domain might enter maintenance mode after DR operations.

If the fmd service fails, issue the following command on the domain to recover:

# svcadm clear fmd

6660168

If a ubc.piowbeue-cpu error occurs on a domain, the Solaris Fault Management cpumem-diagnosis module might fail, causing an interruption in FMA service.

If this happens, you will see output similar to the following sample in the console log:

SUNW-MSG-ID: FMD-8000-2K, TYPE: Defect, VER: 1, SEVERITY: Minor
EVENT-TIME: Fri Apr  4 21:41:57 PDT 2008
PLATFORM: SUNW,SPARC-Enterprise, CSN: 2020642002, HOSTNAME: <hostname>
SOURCE: fmd-self-diagnosis, REV: 1.0
EVENT-ID: 6b2e15d7-aa65-6bcc-bcb1-cb03a7dd77e3
DESC: A Solaris Fault Manager component has experienced an error that required 
the module to be disabled.  Refer to http://sun.com/msg/FMD-8000-2K for more 
information.
AUTO-RESPONSE: The module has been disabled.  Events destined for the module 
will be saved for manual diagnosis.
IMPACT: Automated diagnosis and response for subsequent events associated with 
this module will not occur.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to locate the module.  Use fmadm reset 
<module> to reset the module. 

  • Manually restart fmd using the command:
svcadm clear fmd
  • Or, restart the cpumem-diagnosis module:
fmadm restart cpumem-diagnosis

6660197

DR might cause the domain to hang if either of the following conditions exists:

  • A domain contains 256 or more CPUs.
  • More than 256 memory errors are detected.

Follow these steps:

1. Set the following parameter in the system specification file (/etc/system):

set drmach:drmach_disable_mcopy=1 

2. Reboot the domain.

6663570

DR operations involving the lowest number CPU might cause the domain to panic.

Do not use DR to remove the system board that hosts the CPU with the lowest CPU ID. Use the Solaris prtdiag command to identify the CPU with the lowest CPU ID.

6668237

After DIMMs are replaced, the corresponding DIMM faults are not cleared on the domain.

Use the command fmadm repair fmri|uuid to record the repair. Then you can use the command fmadm rotate to clear out any leftover events.
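
The 256-thread limit in CR 6535018 and CR 6619224 maps to the 32-CPU figure by simple multiplication. The following Python sketch works through that arithmetic. It assumes that each SPARC64 VII chip has 4 cores with 2 hardware threads per core; these notes do not state that figure, so verify it against your hardware documentation. The function names are hypothetical.

```python
# Assumption (not stated in these notes): each SPARC64 VII chip has
# 4 cores, and each core runs 2 hardware threads.
CORES_PER_CPU = 4
THREADS_PER_CORE = 2

# Limit taken from the workarounds for CR 6535018 and CR 6619224.
MAX_THREADS = 256


def domain_threads(cpu_chips):
    """Number of virtual processors a domain sees for a given chip count."""
    return cpu_chips * CORES_PER_CPU * THREADS_PER_CORE


def within_limit(cpu_chips):
    """True if the domain stays at or below the 256-thread limit."""
    return domain_threads(cpu_chips) <= MAX_THREADS
```

Under that assumption, 32 chips yield exactly 256 virtual processors, which matches the maximum of 32 CPUs given in the workaround for CR 6619224.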


Solaris Issues Fixed in Solaris 10 5/08

TABLE 6 lists issues that have been fixed in Solaris 10 5/08 OS. You might encounter them in supported releases earlier than Solaris 10 5/08.


TABLE 6 Solaris OS Issues and Workarounds Fixed in Solaris 10 5/08

CR ID

Description

Workaround

5076574

A PCIe error can lead to an invalid fault diagnosis on a large M9000/M8000 domain.

Create a file /etc/fm/fmd/fmd.conf containing the following lines:

setprop client.buflim 40m

setprop client.memlim 40m

6348554

Using the cfgadm -c disconnect command on the following cards might hang the command:

  • SG-XPCIE2FC-QF4 Sun StorageTek Enterprise Class 4Gb Dual-Port Fibre Channel PCI-E HBA
  • SG-XPCIE1FC-QF4 Sun StorageTek Enterprise Class 4Gb Single-Port Fibre Channel PCI-E HBA
  • SG-XPCI2FC-QF4 Sun StorageTek Enterprise Class 4Gb Dual-Port Fibre Channel PCI-X HBA
  • SG-XPCI1FC-QF4 Sun StorageTek Enterprise Class 4Gb Single-Port Fibre Channel PCI-X HBA

Do not perform the cfgadm -c disconnect operation on the affected cards.

6472153

If you create a Solaris Flash archive on a non-Sun SPARC Enterprise M8000/M9000 sun4u server and install it on a Sun SPARC Enterprise M8000/M9000 sun4u server, the console’s TTY flags will not be set correctly. This can cause the console to lose characters during stress.

Just after installing Solaris OS from a Solaris Flash archive, telnet into the Sun SPARC Enterprise M8000/M9000 server to reset the console’s TTY flags as follows:

# sttydefs -r console
# sttydefs -a console -i "9600 hupcl opost onlcr crtscts" -f "9600"

 

This procedure is required only once.

6522433

The incorrect motherboard might be identified by fmdump for cpu faults after reboot.

None at this time.

6527811

The showhardconf(8) command on the XSCF cannot display PCI card information that is installed in the External I/O Expansion Unit, if the External I/O Expansion Unit is configured using PCI hot-plug.

There is no workaround. When each PCI card in the External I/O Expansion Unit is configured using PCI hot-plug, the PCI card information is displayed correctly.

6545143

When kcage daemon is expanding the kcage area, if the user stack exists in the expanded area, its area is demapped and might cause a ptl_1 panic during the flushw handler execution.

There is no workaround.

 

6545685

If the system has detected Correctable Memory Errors (CE) at power-on self-test (POST), the domains might incorrectly degrade 4 or 8 DIMMs.

Increase the memory patrol timeout values used via the following setting in /etc/system and reboot the system:

set mc-opl:mc_max_rewrite_loop = 20000

6546188

The system panics when running hot-plug (cfgadm) and DR operations (addboard and deleteboard) on the following cards:

  • X4447A-Z, PCI-e Quad-port Gigabit Ethernet Adapter UTP
  • X1027A-Z1, PCI-e Dual 10 Gigabit Ethernet Fiber XFP Low profile Adapter

There is no workaround.

6551356

The system panics when running hot-plug (cfgadm) to configure a previously unconfigured card. The message "WARNING: PCI Expansion ROM is not accessible" will be seen on the console shortly before the system panic. The following cards are affected by this defect:

  • X4447A-Z, PCI-e Quad-port Gigabit Ethernet Adapter UTP
  • X1027A-Z1, PCI-e Dual 10 Gigabit Ethernet Fiber XFP Low profile Adapter

Note - Do not use cfgadm -c unconfigure to disconnect the I/O card. Use cfgadm -c disconnect to completely remove the card. After waiting at least 10 seconds, the card might be configured back into the domain using the cfgadm -c configure command.

6556742

The system panics when DiskSuite cannot read the metadb during DR. This bug affects the following cards:

  • SG-XPCIE2FC-QF4, 4Gb PCI-e Dual-Port Fibre Channel HBA
  • SG-XPCIE1FC-QF4, 4Gb PCI-e Single-Port Fibre Channel HBA
  • SG-XPCI2FC-QF4, 4Gb PCI-X Dual-Port Fibre Channel HBA
  • SG-XPCI1FC-QF4, 4Gb PCI-X Single-Port Fibre Channel HBA

Panic can be avoided when a duplicated copy of the metadb is accessible via another Host Bus Adaptor.

6559504

Messages of the form nxge: NOTICE: nxge_ipp_eccue_valid_check: rd_ptr = nnn wr_ptr = nnn will be observed on the console with the following cards:

  • X4447A-Z, PCI-e Quad-port Gigabit Ethernet Adapter UTP
  • X1027A-Z1, PCI-e Dual 10 Gigabit Ethernet Fiber XFP Low profile Adapter

These messages can be safely ignored.

6563785

Hot-plug operation with the following cards might fail if a card is disconnected and then immediately reconnected:

  • SG-XPCIE2SCSIU320Z Sun StorageTek PCI-E Dual-Port Ultra320 SCSI HBA
  • SGXPCI2SCSILM320-Z Sun StorageTek PCI Dual-Port Ultra320 SCSI HBA

After disconnecting a card, wait for a few seconds before re-connecting.

6564934

Performing a DR deleteboard operation on a board which includes Permanent Memory when using the following network cards results in broken connections:

  • X4447A-Z, PCI-e Quad-port Gigabit Ethernet Adapter UTP
  • X1027A-Z1, PCI-e Dual 10 Gigabit Ethernet Fiber XFP Low profile Adapter

Reconfigure the affected network interfaces after the completion of the DR operation. For basic network configuration procedures, refer to the ifconfig man page.

6568417

After a successful CPU DR deleteboard operation, the system panics when the following network interfaces are in use:

  • X4447A-Z, PCI-e Quad-port Gigabit Ethernet Adapter UTP
  • X1027A-Z1, PCI-e Dual 10 Gigabit Ethernet Fiber XFP Low profile Adapter

Add the following line to /etc/system and reboot the system:

 

set ip:ip_soft_rings_cnt=0 

6571370

Use of the following cards has been observed to cause data corruption in stress tests under laboratory conditions:

  • X4447A-Z, PCI-e Quad-port Gigabit Ethernet Adapter UTP
  • X1027A-Z1, PCI-e Dual 10 Gigabit Ethernet Fiber XFP Low profile Adapter

Add the following line in /etc/system and reboot the system:

set nxge:nxge_rx_threshold_hi=0

6584984

The busstat(1M) command with -w option might cause domains to reboot.

There is no workaround. Do not use the busstat(1M) command with the -w option on pcmu_p.

6589546

prtdiag does not show all IO devices of the following cards:

  • SG-XPCIE2FC-EM4 Sun StorageTek Enterprise Class 4Gb Dual-Port Fibre Channel PCI-E HBA
  • SG-XPCIE1FC-EM4 Sun StorageTek Enterprise Class 4Gb Single-Port Fibre Channel PCI-E HBA

Use prtdiag -v for full output.
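
Several workarounds in this table depend on a set entry in /etc/system, so it can help to confirm that the entry is present before rebooting. The following Python sketch shows one way to check the file contents. It assumes that comment lines begin with an asterisk and that each entry has the form set name=value, as in the examples in this table. The function names are hypothetical.

```python
def parse_system_settings(text):
    """Collect 'set name=value' entries from /etc/system-style text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (assumed to start with '*').
        if not line or line.startswith("*"):
            continue
        if not line.startswith("set "):
            continue
        name, sep, value = line[4:].partition("=")
        if sep:
            # Whitespace around '=' varies in the examples, so strip it.
            settings[name.strip()] = value.strip()
    return settings


def has_setting(text, name, value):
    """True if the text sets the named parameter to exactly this value."""
    return parse_system_settings(text).get(name) == value
```

For example, has_setting(open("/etc/system").read(), "ip:ip_soft_rings_cnt", "0") reports whether the workaround for CR 6568417 is in place. The check compares values as text, so 0x2001 and 8193 would not be treated as equal.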


Solaris Issues Fixed in Solaris 10 8/07

TABLE 7 lists issues that have been fixed in Solaris 10 8/07 OS. You might encounter them in Solaris 10 11/06.



Caution - If you are running a version of Solaris earlier than Solaris 10 8/07, the system might panic or trap during a normal operation. For further information, see CR ID 6534471 in TABLE 7.


 
TABLE 7 Solaris OS Issues and Workarounds Fixed in Solaris 10 8/07

CR ID

Description

Workaround

6303418

A Sun SPARC Enterprise M9000 with a single domain and 11 or more fully populated system boards might hang under heavy stress.

Do not exceed 170 CPU threads.

 

Limit the number of CPU threads to one per CPU core by using the Solaris psradm command to disable the excess CPU threads. For example, disable all odd-numbered CPU threads.

6498283

Using the DR deleteboard command while psradm operations are running on a domain might cause a system panic.

There is no workaround.

6508432

A large number of spurious PCIe correctable errors can be recorded in the FMA error log.

To mask these errors, add the following entry to /etc/system and reboot the system:

set pcie:pcie_aer_ce_mask = 0x2001

6510861

When using the PCIe Dual-Port Ultra320 SCSI controller card (SG-(X)PCIE2SCSIU320Z), a PCIe correctable error causes a Solaris panic.

Add the following entry to /etc/system to prevent the problem:

set pcie:pcie_aer_ce_mask = 0x31c1

6520990

When a domain reboots, SCF might not be able to service other domains that share the same physical board. DR operation can exceed the default timeout period and panic can occur.

Increase the DR timeout period by setting the following statement in /etc/system and reboot your system:

set drmach:fmem_timeout = 30

6527781

The cfgadm command fails while moving the DVD/DAT drive between two domains.

There is no workaround. To reconfigure DVD/Tape drive, execute reboot -r from the domain exhibiting the problem.

6530178

DR addboard command can hang. Once the problem is observed, further DR operations are blocked. Recovery requires reboot of the domain.

There is no workaround.

6534471

Systems might panic/trap during normal operation.

Make sure you have the correct /etc/system parameter and reboot the system:

set heaplp_use_stlb=0 

6539084

There is a low probability of a domain panic during reboot when the Sun Quad GbE UTP x8 PCIe (X4447A-Z) card is present in a domain.

A fix is available in patch 125670-01.

 

6539909

Do not use the following I/O cards for network access when you are using the boot net install command to install the Solaris OS:

  • X4447A-Z/X4447A-Z, PCIe Quad-port Gigabit Ethernet Adapter UTP
  • X1027A-Z/X1027A-Z, PCIe Dual 10 Gigabit Ethernet Fiber XFP

Use an alternative type of network card or onboard network device to install the Solaris OS via the network.
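
The workaround for CR 6303418 suggests disabling all odd-numbered CPU threads with psradm. The following Python sketch builds such a command line from a list of processor IDs. It assumes that the odd-numbered IDs are the threads you want offline, as in the example given for that CR, and that you supply the IDs yourself. The function names are hypothetical, and the sketch only prints a command; it does not run it.

```python
def odd_thread_ids(processor_ids):
    """Return the odd-numbered processor IDs in ascending order."""
    return sorted(pid for pid in processor_ids if pid % 2 == 1)


def psradm_offline_command(processor_ids):
    """Build a 'psradm -f' command that takes the odd-numbered IDs offline.

    Returns an empty string when there is nothing to disable.
    """
    ids = odd_thread_ids(processor_ids)
    if not ids:
        return ""
    return "psradm -f " + " ".join(str(pid) for pid in ids)


if __name__ == "__main__":
    # Hypothetical domain with processor IDs 0 through 7.
    print(psradm_offline_command(range(8)))
```

Review the generated command against the actual processor IDs in the domain before running it, because ID numbering depends on the configuration.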

 


Sun Management Center Software Issues and Workarounds

TABLE 8 lists issues and possible workarounds for Sun Management Center software.


TABLE 8 Sun Management Center Issues and Workarounds

CR ID

Description

Workaround

6654948

When viewing the PlatAdmin System Components table, you might experience a delay of about 26 minutes before an alarm is displayed. There is no actual error, just a delay.

There is no workaround.



Software Documentation Updates

This section contains late-breaking information on the software documentation that became known after the documentation set was published.


TABLE 9 Software Documentation Updates

Document

Page Number

Change

Sun SPARC Enterprise M4000/M5000/M8000/M9000 Servers Glossary

 

The glossaries included in each of the documents supporting SPARC Enterprise M4000/M5000/M8000/M9000 servers have been removed from those documents. In their place, a separate document has been created, the SPARC Enterprise M4000/M5000/M8000/M9000 Servers Glossary.

Sun SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide

Page 9-6

Section 9.2.2, “Supported Browsers.” Refer to TABLE 2 for the correct list of web browsers supported by the XSCF Web.

Sun SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide

Page 2-2

Section 2.1.1, “Setup Summary by the XSCF Shell.” Add the following note:

Note - In addition to the standard default login, Sun SPARC Enterprise M4000/M5000/M8000/M9000 servers are delivered with a temporary login called admin to enable remote initial login, through a serial port. Its privileges are fixed to useradm and cannot be changed. You cannot log in as temporary admin using the standard UNIX user name and password authentication or SSH public key authentication. It has no password, and one cannot be added for it.

 

The temporary admin account is disabled after someone logs in as the default user, or after someone logged in as temporary admin has successfully added the first user with valid password and privileges.

 

If, before the default login is used, you cannot log in as temporary admin, you can determine if someone else has done so by executing the following command:

showuser -l

Sun SPARC Enterprise M4000/M5000/M8000/M9000 Servers Administration Guide

Page 8

“Logging in to the System” section. Add the following note:

Note - In addition to the standard default login, Sun SPARC Enterprise M4000/M5000/M8000/M9000 servers are delivered with a temporary login called admin to enable remote initial login, through a serial port. Its privileges are fixed to useradm and cannot be changed. You cannot log in as temporary admin using the standard UNIX user name and password authentication or SSH public key authentication. It has no password, and one cannot be added for it.

 

The temporary admin account is disabled after someone logs in as the default user, or after someone logged in as temporary admin has successfully added the first user with valid password and privileges.

 

If, before the default login is used, you cannot log in as temporary admin, you can determine if someone else has done so by executing the following command:

showuser -l

Sun SPARC Enterprise M4000/M5000/M8000/M9000 Servers Administration Guide

Page 70

“About Auditing” section. Add the following note at the end of the “Audit File Tools” section:

Note - This chapter describes how to set up archived log files. The SP Security (SUNWspec) Package gives administrators and service providers a means to view those files. To display the XSCF audit log files archived to your server, use the viewauditapp(8) and mergeaudit(8) off-platform audit file viewers.

Sun SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF Reference Manual

adduser(8) man page

The maximum length of the user name is 31 characters. The adduser(8) man page erroneously documents a maximum user name length of 32 characters.

Sun SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF Reference Manual

sendbreak(8) man page

The sendbreak(8) command will not work when the domain mode is set to on while the mode switch on the operator panel is set to locked. Refer to the setdomainmode(8) man page for more information.

Sun SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF Reference Manual

viewaudit(8) man page

The viewaudit(8) man pages show incorrect output for Example 5 and Example 6.



Upgrading From XCP 1041 or Lower


To Prepare to Upgrade

1. Delete any routes configured on the lan#0 and lan#1 interfaces (failover interfaces).



Note - The applynetwork -n command will not run unless some network configuration has changed. Resetting the host name (sethostname) to its current value will prompt the command to run.


The following example shows two routes that must be deleted.


XSCF> applynetwork -n
The following network settings will be applied:
xscf#0 hostname  :m8000-0
xscf#1 hostname  :m8000-1
DNS domain name  :sun.com
nameserver       :100.200.300.400
 
interface        :xscf#0-lan#0
status           :up
IP address       :100.200.300.77
netmask          :255.255.254.0
route            :-n 0.0.0.0 -m 0.0.0.0 -g 100.200.300.1
 
interface        :xscf#0-lan#1
status           :down
IP address       :
netmask          :
route            :
 
interface        :xscf#0-if
status           :down
IP address       :
netmask          :
 
interface        :lan#0
status           :down
IP address       :
netmask          :
route            :-n 0.0.0.0 -m 0.0.0.0 -g 100.200.300.
route            :-n 0.0.0.0 -m 0.0.0.0 -g 100.200.300.2
 
interface        :xscf#1-lan#0
status           :down
IP address       :
netmask          :
route            :
 
interface        :xscf#1-lan#1
status           :down
IP address       :
netmask          :
route            :
 
interface        :xscf#1-if
status           :down
IP address       :
netmask          :
 
interface        :lan#1
status           :down
IP address       :
netmask          :
route            :
 
The XSCF will be reset. Continue? [y|n] :n
XSCF> setroute -c del -n 0.0.0.0 -m 0.0.0.0 -g 100.200.300.2 lan#0
XSCF> setroute -c del -n 0.0.0.0 -m 0.0.0.0 -g 100.200.300.1 lan#0
XSCF> applynetwork -y
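
The route deletions in the example above follow directly from the applynetwork -n listing: every route shown under lan#0 or lan#1 becomes one setroute -c del command. The following Python sketch derives those commands from captured output. It assumes the listing uses the "name :value" layout shown in the example. The function name is hypothetical, and the sketch only generates command text for review.

```python
FAILOVER_INTERFACES = ("lan#0", "lan#1")


def failover_route_deletions(applynetwork_output):
    """Return setroute commands that delete routes on lan#0 and lan#1."""
    commands = []
    interface = None
    for raw in applynetwork_output.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        key = key.strip()
        value = value.strip()
        if key == "interface":
            # Track which interface the following lines belong to.
            interface = value
        elif key == "route" and value and interface in FAILOVER_INTERFACES:
            commands.append("setroute -c del %s %s" % (value, interface))
    return commands
```

Routes on interfaces such as xscf#0-lan#0 are left alone, because only the failover interfaces are named in Step 1.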

2. Configure the ISN network.

XCP 1050 or later supports dual XSCF configuration. The Inter-SCF Network (ISN) provides an internal communication link between the two XSCF Units (active and standby).

If you do not explicitly set the IP addresses on the ISN network, XCP will use the following default values:


xscf#0-if: 192.168.1.1
xscf#1-if: 192.168.1.2

If the IP address of the XSCF-LAN or DSCP conflicts with the default ISN subnet, you must set the ISN IP addresses explicitly. For example:


XSCF> setnetwork xscf#0-if -m 255.255.255.0 192.168.12.11
XSCF> setnetwork xscf#1-if -m 255.255.255.0 192.168.12.12
XSCF> applynetwork
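
To decide whether the default ISN addresses need to be changed, you can test whether an XSCF-LAN or DSCP subnet overlaps the ISN subnet. The following Python sketch performs that test with the standard ipaddress module. It assumes a 255.255.255.0 netmask for the default ISN addresses, which these notes do not state. The function name is hypothetical.

```python
import ipaddress

# The default ISN addresses are 192.168.1.1 and 192.168.1.2. The /24
# prefix is an assumption; these notes do not give the default netmask.
ISN_DEFAULT = ipaddress.ip_network("192.168.1.0/24")


def conflicts_with_isn(address, netmask, isn=ISN_DEFAULT):
    """True if the subnet containing address/netmask overlaps the ISN subnet."""
    # strict=False lets a host address stand in for its enclosing network.
    other = ipaddress.ip_network("%s/%s" % (address, netmask), strict=False)
    return other.overlaps(isn)
```

If the function reports a conflict for any XSCF-LAN or DSCP address, choose ISN addresses in an unused subnet, as in the setnetwork example above.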

3. Delete any accounts named admin.

Use the showuser -lu command to list all XSCF accounts. Any accounts named admin must be deleted prior to upgrading to XCP 1070. The admin account name is reserved. Use the deleteuser command to delete the account.



Note - For more information on admin accounts, see TABLE 9, Software Documentation Updates.



To Upgrade From XCP 1041 or Lower



Note - Do not access the XSCF units via the “Takeover IP address.”




Note - LAN connections are disconnected when the XSCF resets. Use the XSCF serial connection to simplify the XCP upgrade procedure.


1. Log in to the XSCF#0 using an account with platform administrative privileges.

2. Verify that there are no faulted or deconfigured components by using the showstatus(8) command.


XSCF> showstatus 
No failures found in System Initialization.

If any failures are listed, contact your authorized service representative before proceeding.

3. Power off all domains.


XSCF> poweroff -a 

4. Confirm that all domains are stopped:


XSCF> showlogs power

5. Move the key position on the operator panel from Locked to Service.

6. Collect an XSCF snapshot to archive the system status for future reference.


XSCF> snapshot -t user@host:directory

7. Upload the XCP 1070 upgrade image by using the getflashimage(8) command.

For example:


XSCF> getflashimage http://server.domain.com/XCP1070/images/DCXCP1070.tar.gz 

The XSCF Web on XSCFU#0 can also be used to upload the XCP 1070 upgrade image. For more detailed information about using XSCF Web and the getflashimage(8) command, see the Sun SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.

8. Update the firmware by using the flashupdate(8) command.


XSCF> flashupdate -c update -m xcp -s 1070 

Specify the XCP version to be updated. In this example, it is 1070.



Caution - The flashupdate command updates one bank, resets the XSCF, and then begins updating the second bank. Before proceeding to Step 9, you must verify that the current and reserve banks are both updated. If both banks indicate XCP revision 1070, proceed to the next step.


9. Confirm completion of the update.


XSCF> showlogs event

Confirm that no abnormality occurred while updating XSCF_B#0.

10. Confirm that both the current and reserve banks of XSCFU#0 display the updated XCP versions.


XSCF> version -c xcp 
 
XSCF#0 (Active )
XCP0 (Reserve): 1070
XCP1 (Current): 1070
XSCF#1 (Standby)
XCP0 (Reserve): 0000
XCP1 (Current): 0000

If the Current and Reserve banks on XSCF#0 do not indicate XCP revision 1070, contact your authorized service representative.

11. Confirm the servicetag(8) facility is enabled.

a. Check the servicetag facility status by using the showservicetag(8) command.


XSCF> showservicetag 
Disabled

b. If it is currently disabled, you must enable it.


XSCF> setservicetag -c enable
Settings will take effect the next time the XSCF is rebooted.

c. Reboot the XSCF to enable the servicetag facility.


XSCF> rebootxscf
The XSCF will be reset. Continue? [y|n] :y 

d. Wait until XSCF firmware reaches the ready state.

This can be confirmed when the READY LED of the XSCF remains lit, or the following message appears on the serial console:


XSCF Initialize complete 

12. Turn off all the server power switches for 30 seconds.

13. After 30 seconds, turn the power switches back on.

14. Wait until XSCF firmware reaches the ready state.

This can be confirmed when the READY LEDs of XSCF_B#0 and XSCF_B#1 remain lit.

15. Log on to XSCFU#0 using a serial connection or LAN connection.

16. Confirm no abnormality has occurred by using the showlogs error -v and showstatus commands.


XSCF> showlogs error -v
XSCF> showstatus

Because XSCF#1 is not yet running XCP 1070, XSCF#0 cannot communicate with XSCF#1. It is therefore normal for showstatus to report that XSCF#1 has faulted.

If you encounter any hardware abnormality of the XSCF, contact your authorized service representative.

17. Confirm and update the imported XCP image again.


XSCF> flashupdate -c update -m xcp -s 1070

Specify the XCP version to be updated. In this example, it is 1070. XSCF#1 will be updated, and then XSCF#0 will be updated again.

When the firmware update for XSCF#0 is complete, XSCF#1 is active.

18. Log in to XSCFU#1 using a serial connection or LAN connection.

19. Confirm completion of the update by using the showlogs event command.


XSCF> showlogs event 

Confirm no abnormality is found during the update.

20. Confirm that both the current and reserve banks of XSCFU#0 and XSCFU#1 display the updated XCP versions.


XSCF> version -c xcp 
 
XSCF#1 (Active )
XCP0 (Reserve): 1070
XCP1 (Current): 1070
XSCF#0 (Standby)
XCP0 (Reserve): 1070
XCP1 (Current): 1070

If the Current and Reserve banks on XSCF#0 and XSCF#1 do not all indicate XCP revision 1070, contact your authorized service representative.

21. Confirm that switching over between XSCF units works properly.

a. Switch between the Active and Standby states:


XSCF> switchscf -t Standby 
The XSCF unit switch between the Active and Standby states. Continue? [y|n] :y 

b. When the READY LED on XSCFU_B#1 remains lit, log in to XSCFU#0 using a serial connection or LAN connection.

c. Confirm XSCF#1 is now the standby, and that XSCF#0 has become the active unit:


XSCF> showhardconf

d. Confirm no new errors have been recorded since the check in Step 16:


XSCF> showlogs error

e. Confirm that XSCF#1 has entered the active state:


XSCF> showlogs event
....
Feb 26 16:10:28 PST 2008      XSCF#1 entered active state from standby state

f. Confirm that no failures are found in system initialization:


XSCF> showstatus
No failures found in System Initialization.

22. Power on all domains.


XSCF> poweron -a

23. Log in to XSCFU#0 and confirm all domains start up properly.


XSCF> showlogs power

24. Check that there are no new errors.


XSCF> showlogs error

25. Move the key switch on the operator panel from Service to Locked.
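
The bank checks in Steps 10 and 20 can also be made on a saved copy of the version -c xcp output. The following is a minimal Python sketch, not part of XCP. The function names parse_banks and stale_banks are hypothetical, and the sample text is the Step 10 output shown in the procedure.

```python
import re

# "XSCF#0 (Active )" style unit header lines.
UNIT_RE = re.compile(r"^\s*(XSCF#\d+)\s+\(")
# "XCP0 (Reserve): 1070" style bank lines.
BANK_RE = re.compile(r"^\s*XCP\d+\s+\((Reserve|Current)\)\s*:\s*(\d+)\s*$")

STEP_10_OUTPUT = """\
XSCF#0 (Active )
XCP0 (Reserve): 1070
XCP1 (Current): 1070
XSCF#1 (Standby)
XCP0 (Reserve): 0000
XCP1 (Current): 0000
"""


def parse_banks(text):
    """Return {unit: {bank: version}} from pasted `version -c xcp` output."""
    banks = {}
    unit = None
    for line in text.splitlines():
        m = UNIT_RE.match(line)
        if m:
            unit = m.group(1)
            banks[unit] = {}
            continue
        m = BANK_RE.match(line)
        if m and unit is not None:
            banks[unit][m.group(1)] = m.group(2)
    return banks


def stale_banks(text, expected):
    """List (unit, bank, version) entries whose version differs from expected."""
    return [
        (unit, bank, version)
        for unit, unit_banks in parse_banks(text).items()
        for bank, version in unit_banks.items()
        if version != expected
    ]


if __name__ == "__main__":
    for unit, bank, version in stale_banks(STEP_10_OUTPUT, "1070"):
        print(f"{unit} {bank} bank is at {version}, expected 1070")
```

At Step 10, only the XSCF#0 entries are expected to match, because XSCF#1 is not updated until Step 17.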


Additional Software Procedures

This section contains instructions for accomplishing some of the workarounds mentioned earlier in this document.

Booting From a WAN Boot Server

The WAN boot installation method enables you to boot and install software over a wide area network (WAN) by using HTTP. To support booting the Sun SPARC Enterprise M8000/M9000 server from a WAN boot server, you must have the appropriate wanboot executable installed to provide the needed hardware support. If you have added SPARC64 VII processors to your server, for example, you must perform this procedure even if you have performed it previously, before the new processors were added.

For information about WAN boot servers, refer to the Solaris 10 Installation Guide: Network-Based Installations for the version of Solaris 10 OS that you are using. You can find Solaris 10 OS documentation here:

http://docs.sun.com/app/docs/prod/solaris.10


To Upgrade the wanboot Executable

1. Install the Solaris 10 OS on the WAN boot server.

Install the version of Solaris 10 OS that is required for your server. For information about minimum software requirements, refer to Supported Firmware and Software Versions.

2. Copy the wanboot executable from that Solaris release to the appropriate location on the install server.

For more detailed information, refer to the Solaris 10 Installation Guide: Network-Based Installations. For Solaris 10 8/07, for example, the information in the English document is here:

http://docs.sun.com/app/docs/doc/820-0177/6nbuennmi?a=view

3. Create a WAN boot miniroot from the Solaris 10 OS.

For Solaris 10 8/07, for example, the information in the English document is here:

http://docs.sun.com/app/docs/doc/820-0177/eypqx?a=view

If you do not upgrade the wanboot executable, the Sun SPARC Enterprise M8000/M9000 server will panic, with messages similar to the following:


krtld: load_exec: fail to expand cpu/$CPU
krtld: error during initial load/link phase
panic - boot: exitto64 returned from client program

Identifying Degraded Memory in a System


To Identify Degraded Memory in a System

Log in to XSCF and type the following command:


XSCF> showstatus

The following example shows that DIMM number 0A on Memory Board #5 has degraded memory.


XSCF> showstatus
    MBU_B Status:Normal;
        MEMB#5 Status:Normal;
*           MEM#0A Status:Degraded;
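
On systems with many components, a saved copy of the showstatus output can be filtered for entries that are not in the Normal state. The following is a minimal Python sketch, not part of XCP. It assumes that each component line has the `NAME Status:STATE;` form shown in the example above, and the function name is hypothetical.

```python
import re

# Matches lines such as "*           MEM#0A Status:Degraded;".
# The leading asterisk is optional.
STATUS_RE = re.compile(r"^\*?\s*(\S+)\s+Status:([^;]+);")

SAMPLE = """\
XSCF> showstatus
    MBU_B Status:Normal;
        MEMB#5 Status:Normal;
*           MEM#0A Status:Degraded;
"""


def non_normal_components(text):
    """Return (component, status) pairs whose status is not Normal."""
    found = []
    for line in text.splitlines():
        m = STATUS_RE.match(line)
        if m and m.group(2).strip() != "Normal":
            found.append((m.group(1), m.group(2).strip()))
    return found


if __name__ == "__main__":
    for component, status in non_normal_components(SAMPLE):
        print(component, status)
```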

Identifying Different Memory Sizes in a System Board

To determine whether the domain contains system boards with different memory sizes, use either of the following commands to display the list of memory sizes:


To Use the showdevices Command

1. Log in to XSCF and type the following command:


XSCF> showdevices -d domain_id 

The following example shows that 00-0 has 64 Gbytes of memory while the other system boards have 16 Gbytes.


XSCF> showdevices -d 0 
 
...
 
Memory:
-------
          board   perm    base                domain  target deleted remaining
DID XSB   mem MB  mem MB  address             mem MB  XSB    mem MB  mem MB
01  00-0   63680       0  0x0000004000000000  260288
01  03-0   16384    7384  0x0000034000000000  260288
01  03-1   16384       0  0x0000030000000000  260288
01  03-2   16384       0  0x000002c000000000  260288
01  03-3   16384       0  0x0000028000000000  260288
 
...
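
The comparison of board memory sizes can be automated on a saved copy of the Memory table. The following is a minimal Python sketch, not part of XCP. It assumes the column layout shown in the example above, and the function names are hypothetical.

```python
import re

# One data row: DID, XSB, board mem MB, perm mem MB, base address, ...
ROW_RE = re.compile(r"^\s*(\d+)\s+(\d{2}-\d)\s+(\d+)\s+(\d+)\s+0x[0-9a-fA-F]+")

SAMPLE = """\
          board   perm    base                domain  target deleted remaining
DID XSB   mem MB  mem MB  address             mem MB  XSB    mem MB  mem MB
01  00-0   63680       0  0x0000004000000000  260288
01  03-0   16384    7384  0x0000034000000000  260288
01  03-1   16384       0  0x0000030000000000  260288
"""


def board_memory(text):
    """Map each XSB to its board memory in Mbytes."""
    sizes = {}
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            sizes[m.group(2)] = int(m.group(3))
    return sizes


def has_mixed_sizes(text):
    """Return True if the listed system boards differ in board memory size."""
    return len(set(board_memory(text).values())) > 1


if __name__ == "__main__":
    print(board_memory(SAMPLE))
    print("mixed sizes:", has_mixed_sizes(SAMPLE))
```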
 


To Use the prtdiag Command to Identify Memory Size

On the domain, execute the prtdiag command.


# prtdiag

The following example displays different memory sizes.


# prtdiag
 
...
 
============================ Memory Configuration ============================
 
       Memory  Available           Memory     DIMM    # of  Mirror  Interleave
LSB    Group   Size                Status     Size    DIMMs Mode    Factor
---    ------  ------------------  -------    ------  ----- ------- ----------
 00    A        32768MB            okay       2048MB     16 no       8-way
 00    B        32768MB            okay       2048MB     16 no       8-way
 03    A         8192MB            okay       2048MB      4 no       2-way
 03    B         8192MB            okay       2048MB      4 no       2-way
 04    A         8192MB            okay       2048MB      4 no       2-way
 04    B         8192MB            okay       2048MB      4 no       2-way
 05    A         8192MB            okay       2048MB      4 no       2-way
 05    B         8192MB            okay       2048MB      4 no       2-way
 06    A         8192MB            okay       2048MB      4 no       2-way
 
...
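
To compare boards at a glance, the available memory of each LSB can be totaled from a saved copy of the Memory Configuration table. The following is a minimal Python sketch, not part of the Solaris OS. It assumes the row layout shown in the example above, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# One data row: LSB, memory group, available size, status, DIMM size, DIMM count, ...
ROW_RE = re.compile(r"^\s*(\d+)\s+([A-Z])\s+(\d+)MB\s+(\S+)\s+(\d+)MB\s+(\d+)\s+")

SAMPLE = """\
LSB    Group   Size                Status     Size    DIMMs Mode    Factor
---    ------  ------------------  -------    ------  ----- ------- ----------
 00    A        32768MB            okay       2048MB     16 no       8-way
 00    B        32768MB            okay       2048MB     16 no       8-way
 03    A         8192MB            okay       2048MB      4 no       2-way
 03    B         8192MB            okay       2048MB      4 no       2-way
"""


def memory_per_lsb(text):
    """Sum the available size, in Mbytes, of all memory groups on each LSB."""
    totals = defaultdict(int)
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            totals[m.group(1)] += int(m.group(3))
    return dict(totals)


if __name__ == "__main__":
    for lsb, total in memory_per_lsb(SAMPLE).items():
        print(f"LSB {lsb}: {total} MB")
```

LSBs whose totals differ have different memory sizes.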
 

Identifying Permanent Memory in a Target Board


To Identify Permanent Memory in a Target Board

1. Log in to XSCF and type the following command:


XSCF> showdevices -d domain_id

The following example shows a display of the showdevices -d command where 0 is the domain_id.


XSCF> showdevices -d 0
 
...
 
Memory:
-------
          board   perm    base                domain  target deleted remaining
DID XSB   mem MB  mem MB  address             mem MB  XSB    mem MB  mem MB
00  00-0    8192       0  0x0000000000000000   24576
00  00-2    8192    1674  0x000003c000000000   24576
00  00-3    8192       0  0x0000034000000000   24576
 
...

A nonzero value in column 4, perm mem MB, indicates that the board contains permanent memory.

The example shows permanent memory on 00-2, with 1674 Mbytes.

If the board includes permanent memory, when you execute the deleteboard command or the moveboard command, the following notice is displayed:


System may be temporarily suspended, proceed? [y|n]:
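
Before you run deleteboard or moveboard, the perm mem MB column can be checked on a saved copy of the showdevices output. The following is a minimal Python sketch, not part of XCP. It assumes the whitespace-separated column layout shown in the example above, and the function name is hypothetical.

```python
SAMPLE = """\
DID XSB   mem MB  mem MB  address             mem MB  XSB    mem MB  mem MB
00  00-0    8192       0  0x0000000000000000   24576
00  00-2    8192    1674  0x000003c000000000   24576
00  00-3    8192       0  0x0000034000000000   24576
"""


def boards_with_permanent_memory(text):
    """Map each XSB with nonzero perm mem MB to that value in Mbytes."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows carry a hexadecimal base address in the fifth column;
        # header rows and separators do not.
        if len(fields) >= 6 and fields[4].startswith("0x"):
            perm_mb = int(fields[3])
            if perm_mb != 0:
                result[fields[1]] = perm_mb
    return result


if __name__ == "__main__":
    for xsb, perm_mb in boards_with_permanent_memory(SAMPLE).items():
        print(f"XSB {xsb} holds {perm_mb} MB of permanent memory")
```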


[1] See Solaris Patch Information for information about patches.