C H A P T E R  2

System Features

This chapter explains the following technical aspects of the M8000/M9000 servers, including their features and structure.


2.1 Hardware Configuration

This section explains the hardware configuration, which includes the following items:

2.1.1 CPU

The M8000/M9000 servers use the SPARC64 VI/SPARC64 VII/SPARC64 VII+ CPU, a proprietary high-performance multi-core processor. On-chip L2 cache memory minimizes memory latency.

An instruction retry function has been implemented so that operation can be continued by retrying an instruction for which an error has been detected.

The M8000 server, the M9000 server, and the M9000 server with expansion cabinet provide system scalability by supporting up to 16, 32, or 64 CPU modules, respectively.

CPU modules running at different clock frequencies can be used in a single system. The latest CPUs can therefore be installed when improved processing performance is required.

The SPARC64 VII processor extends the 64-bit integer multiply-accumulate operation function and the hardware barrier function.

The SPARC64 VII+ processor expands the capacity of L2 cache memory to 12MB.



Note - To make maximum use of the 12MB L2 cache memory, you must use a certain type of CMU (CMU_C) and mount CPU modules that consist entirely of SPARC64 VII+ processors. If CPU modules of different frequencies are mixed on a CMU_C, the usable L2 cache memory is 6MB. Likewise, if you use another type of CMU (CMU_A or CMU_B) and mount CPU modules that consist entirely of SPARC64 VII+ processors, the usable L2 cache memory is 6MB.


The type of CMU which has been mounted on the server can be confirmed by using the showhardconf command. For details of the showhardconf command, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.
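
For example, the command can be executed from the XSCF shell as shown below; the output (not reproduced here) lists each mounted CMU and its CPU modules, and its exact format depends on the XCP firmware version.

  XSCF> showhardconf        (display the hardware configuration, including each mounted CMU)
  XSCF> showhardconf -u     (display the number of mounted field-replaceable units)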

2.1.1.1 Mounted Processors and CPU Operational Modes

The M8000/M9000 servers can mount the SPARC64 VI processors, the SPARC64 VII processors, the SPARC64 VII+ processors, or a mix of those different types of processors. This section applies only to M8000/M9000 servers that run SPARC64 VII or SPARC64 VII+ processors.



Note - Supported firmware and Oracle Solaris OS will vary based on the processor type. For details, see the latest version of the Product Notes (for XCP version 1100 or later) for your server.


FIGURE 2-1 shows an example of a mixed configuration of SPARC64 VI and SPARC64 VII processors.

FIGURE 2-1 CPUs on CPU/Memory Board Unit (CMU) and Domain Configuration Example


Different types of processors can be mounted on a single CMU, as shown in CMU#2 and CMU#3 in FIGURE 2-1. In addition, a single domain can be configured with different types of processors, as shown in Domain 2 in FIGURE 2-1.

An M8000/M9000 server domain runs in one of the following CPU operational modes:

SPARC64 VI compatible mode: All processors in the domain behave like, and are treated by the Oracle Solaris OS as, SPARC64 VI processors. The new capabilities of SPARC64 VII or SPARC64 VII+ processors are not available in this mode. Domains 1 and 2 in FIGURE 2-1 correspond to this mode.

SPARC64 VII enhanced mode: All boards in the domain must contain only SPARC64 VII or SPARC64 VII+ processors. In this mode, the server utilizes the new capabilities of these processors. Domain 0 in FIGURE 2-1 corresponds to this mode.

For the settings of the CPU operational mode, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide or the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.
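
As a hedged sketch only (the option syntax shown here is an assumption and should be verified in the XSCF Reference Manual for your XCP version; domain ID 0 is an arbitrary example, and the target domain generally must be powered off before the mode is changed), the CPU operational mode might be checked and set from the XSCF shell as follows:

  XSCF> showdomainmode -d 0                    (display the current CPU operational mode of domain 0)
  XSCF> setdomainmode -d 0 -m cpumode=compat   (force the SPARC64 VI compatible mode)
  XSCF> setdomainmode -d 0 -m cpumode=auto     (let the mode be selected automatically from the mounted processors)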

There are restrictions on the DR operation depending on whether the Oracle Solaris OS operates in the SPARC64 VII enhanced mode or in the SPARC64 VI compatible mode. For DR operation, see the SPARC Enterprise M4000/M5000/M8000/M9000 Servers Dynamic Reconfiguration (DR) User’s Guide.



Note - If you intend to add SPARC64 VI processors to a domain that consists only of SPARC64 VII or SPARC64 VII+ processors, we strongly suggest setting the SPARC64 VI compatible mode in advance. For more information on the setdomainmode command, refer to the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide or the man pages.


2.1.2 Memory Subsystem

The memory subsystem controls memory access and cache memory. The M8000/M9000 servers use DDR-II DIMM memory.

Each CMU has thirty-two memory slots.

Also, the M8000 server, M9000 server, and M9000 server with expansion cabinet can mount up to 128, 256, or 512 DIMMs, respectively.

The memory subsystems use up to eight-way interleaving, providing higher-speed memory access.

Memory mirror mode is supported for every pair of memory buses in a CMU. This enables continued operation using the other non-defective bus if an error occurs in one bus. Memory mirror mode can be set up by the system administrator.
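
As an illustration only (the -m option of setupfru is an assumption here; confirm the exact syntax in the XSCF Reference Manual), memory mirror mode is set per PSB from the XSCF shell, for example:

  XSCF> setupfru -m y sb 0     (enable memory mirror mode on PSB#0 -- option letter assumed)
  XSCF> showfru sb 0           (display the current settings of PSB#0)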

2.1.3 I/O Subsystem

The I/O subsystem controls data transfer between the main unit and I/O devices. The M8000/M9000 servers use PCIe as the interconnect bus for I/O devices.

Each IOU contains eight-lane (x8) PCIe slots. Also, eight-lane PCIe slots or 133-MHz 64-bit PCI-X slots can be mounted through an External I/O Expansion Unit.

The M8000 server, M9000 server, and the M9000 server with expansion cabinet can mount up to 32, 64, or 128 PCIe-compatible cards, respectively.

PCI Express slots or PCI-X slots can be added by mounting an External I/O Expansion Unit through a PCI Express slot.

2.1.4 System Bus

The CMUs, which contain the CPU and memory subsystems, and the IOUs, which contain the I/O subsystems, are connected through a crossbar switch that provides high-throughput data transfer between all components. The crossbar switch has duplicated bus routes. If an error occurs in one crossbar switch, the system can be restarted to isolate the faulty switch, enabling these high-end servers to continue operation.

FIGURE 2-2 shows data transfer in the system.

FIGURE 2-2 Main Component Connections




Note - The SC is the system controller that controls CPUs and memory and handles communication with the XB.


2.1.5 System Control

System control of the M8000/M9000 servers refers to the system control functions contained within the XSCFU, which runs the XSCF, and to every component controlled by the XSCF.

As long as input power is being supplied to the server, the XSCF constantly monitors the server even if all domains are powered off.

The following functions are provided to increase system availability:


2.2 Partitioning

A single M8000/M9000 server cabinet can be divided into multiple independent systems for operation. This dividing function is called partitioning.

This section describes features of partitioning and system configurations that can be implemented through partitioning.

2.2.1 Features

Individual systems resulting from partitioning can be built within the M8000/M9000 servers. These individual, divided systems are called domains. Domains are sometimes called partitions.

Partitioning enables arbitrary assignment of resources in the server. Partitioning also enables flexible domain configurations to be used according to the job load or processing amount.

An independent Oracle Solaris OS can run in a domain. Each domain is protected by hardware so that it is not affected by other domains. For example, a software-based problem, such as an OS panic, in one domain does not directly affect jobs in the other domains. Furthermore, the Oracle Solaris OS in each domain can be reset and shut down independently.

2.2.2 Domain Hardware Requirements

The basic hardware resource making up a domain is a physical system board (PSB), which consists of a CMU and an IOU mounted in the high-end servers, or of a CMU alone.

A PSB can be logically divided into one part (no division) or four parts. The physical unit configuration of each divided part of a PSB is called an extended system board (XSB).

A PSB that is logically divided into one part (no division) is called a Uni-XSB, and a PSB that is logically divided into four parts is called a Quad-XSB.

A domain can be configured with any combination of these XSBs. The XSCF is used to configure a domain and specify the PSB division type.
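
For illustration (the option values reflect the author's understanding of the XSCF Reference Manual and should be verified there; the PSB numbers are examples), the PSB division type is specified with the setupfru command from the XSCF shell:

  XSCF> setupfru -x 1 sb 0     (configure PSB#0 as a Uni-XSB)
  XSCF> setupfru -x 4 sb 1     (divide PSB#1 into Quad-XSBs)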



Note - Although a CMU with two CPUMs can be configured into Quad-XSB mode on an M8000/M9000 server, the server generates a "configuration error" message for those XSBs that do not have a CPUM and memory.


FIGURE 2-3 shows the partitioning division types.

FIGURE 2-3 Partition Division Types of Physical System Board (PSB)



2.2.3 Domain Configuration

Any XSBs in the server can be combined to configure a domain, regardless of whether the PSBs are divided as Uni-XSBs or as Quad-XSBs.

These XSBs can be used in any combination for a flexible domain configuration. Also, the quantity of resources for one XSB can be adjusted according to the division type of a PSB. Thus, a domain can be configured based on the quantity of resources required for job operations.

XSCF user interfaces are used to configure a domain. Each configured domain is managed by the XSCF.

The maximum number of domains that can be configured in the servers depends on the system. Up to 16 domains can be configured in M8000 servers, and up to 24 domains can be configured in M9000 servers.

To configure a domain, a logical system board (LSB) number must first be assigned to each XSB so that the XSB can function as an LSB of the domain.

This LSB number is referenced by the Oracle Solaris OS, and it must be unique within the domain. However, if one XSB is shared by multiple domains, a common LSB number need not be defined across those domains; an arbitrary LSB number can be assigned for this setting in each domain.

Domain configuration settings are made for each domain. A domain can be configured by specifying an XSB together with this LSB number.

Up to 16 XSBs can be configured in a single domain.
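
The following is a minimal sketch of such a configuration from the XSCF shell; the domain ID, LSB numbers, and XSB numbers are arbitrary examples, and the exact command syntax should be confirmed in the XSCF Reference Manual.

  XSCF> setdcl -d 0 -a 0=00-0 1=01-0        (register XSB 00-0 as LSB 0 and XSB 01-0 as LSB 1 of domain 0)
  XSCF> addboard -c assign -d 0 00-0 01-0   (assign the two XSBs to domain 0)
  XSCF> showdcl -d 0                        (display the domain component list of domain 0)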

In addition to the quantity of resources, the user specifying the domain configuration and the division type must consider the following:

In addition, resources can be added to or deleted from a configured domain in units of individual XSBs, and they can be moved between domains, by using the DR function.

FIGURE 2-4 shows the domain configuration.

FIGURE 2-4 Domain Configuration



2.3 Resource Management

This section explains the following functions that support dynamic reconfiguration of domain resources during system operation:

2.3.1 Dynamic Reconfiguration

Dynamic reconfiguration (DR) enables hardware resources on system boards to be added and removed dynamically without stopping system operation. DR thus enables optimal relocation of system resources. Also, if a failure occurs, DR can place the system in a state that enables active replacement of the faulty component.

Using the DR function enables resources to be added or redistributed as required for job expansion or new jobs, and it can be used for the following purposes:

By reserving some resources, the reserved resources can be added according to changes in the work load occurring daily, monthly, or annually. This enables flexible resource allocations on the system that needs to operate 24 hours a day, every day of the year in accordance with changes in the amount of data and the work load.

If a failure occurs in a CPU for a domain that has been configured with system resources of multiple system boards, the DR function enables the faulty CPU to be isolated dynamically without stopping the system. The replacement CPU can be configured dynamically in the original domain.
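
As a hedged sketch of such DR operations from the XSCF shell (the XSB number and domain ID are examples; see the DR User’s Guide for the required preparation and the exact procedure):

  XSCF> deleteboard -c disconnect 01-0    (dynamically disconnect XSB 01-0 from its running domain)
  XSCF> addboard -c configure -d 0 01-0   (dynamically configure XSB 01-0 into running domain 0)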

For details on Dynamic Reconfiguration, see the SPARC Enterprise M4000/M5000/M8000/M9000 Servers Dynamic Reconfiguration (DR) User’s Guide.

2.3.2 PCI Hot-plug

The PCI hot-plug function enables PCI cards to be added or removed under the Oracle Solaris OS without a system reboot.
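
On the Oracle Solaris OS side, hot-plug operations are performed with the cfgadm(1M) command; the attachment point name below is only a placeholder, and the actual Ap_Id must be taken from the cfgadm listing on your system.

  # cfgadm                          (list attachment points and their current status)
  # cfgadm -c configure pcie2       (configure the card in the slot whose Ap_Id is pcie2 -- placeholder name)
  # cfgadm -c unconfigure pcie2     (unconfigure the card before removing it)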

Examples of uses for the PCI hot-plug function are as follows:

For details on the PCI hot-plug function, see the SPARC Enterprise M8000/M9000 Servers Service Manual.

2.3.3 Capacity on Demand

The Capacity on Demand (COD) feature allows you to configure spare processing resources on your server in the form of one or more COD CPUs which can be activated at a later date when additional processing power is needed. To access each COD CPU, you must purchase a COD hardware activation permit. Under certain conditions, you can use COD resources before purchasing COD permits for them.
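
For illustration (these commands and their availability depend on the XCP version and should be checked in the COD User’s Guide), COD permits and usage can typically be examined from the XSCF shell:

  XSCF> showcodlicense     (display the COD hardware activation permits installed on the server)
  XSCF> showcodusage       (display the current usage of COD resources)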

For details on COD, see the SPARC Enterprise M4000/M5000/M8000/M9000 Servers Capacity on Demand (COD) User’s Guide.

2.3.4 Oracle Solaris Zones

The Oracle Solaris 10 OS has a function called Oracle Solaris Zones that divides the processing resources and allocates them to applications.

In a domain, resources can be divided into sections called containers, and the processing sections are allocated to each application. The processing resources are managed independently in each container. If a problem occurs in a container, the container can be isolated so that it does not affect other containers. This function provides flexible resource allocation that enables optimal resource management with consideration given to the processing load.
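
As a brief illustration using the standard Oracle Solaris 10 zone commands (the zone name and path are arbitrary examples), a container might be created and started as follows:

  # zonecfg -z webzone "create; set zonepath=/zones/webzone"   (define a zone named webzone -- example name and path)
  # zoneadm -z webzone install                                 (install the zone)
  # zoneadm -z webzone boot                                    (boot the zone)
  # zlogin webzone                                             (log in to the running zone)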


2.4 RAS

RAS is an acronym for functions related to Reliability, Availability, and Serviceability.

The RAS functions of the M8000/M9000 servers minimize system downtime by providing error checking at appropriate locations and by centrally monitoring and controlling that error checking.

Also, the M8000/M9000 servers can be configured with clustering software or centralized management software to further enhance the RAS functions.

Any scheduled system halt, such as periodic maintenance or a system configuration change, can also be performed without affecting operating resources. This can improve service uptime significantly.

2.4.1 Reliability

Reliability represents the length of time the server can operate normally without failure.

Reliability is equally important to both hardware and software.

To improve quality, adequate components must be selected with consideration given to the product service life and the required response in case of a failure. In evaluations such as stress tests that check the service life, components and products are inspected to determine whether they meet the target reliability levels.

Furthermore, software errors are not only triggered by program errors, but also by hardware errors.

M8000/M9000 servers provide the following functions to realize high reliability.

2.4.2 Availability

Availability is characterized by how infrequently the server fails and how quickly it can be recovered when a failure does occur. The amount of time the system is usable is expressed as a percentage.

Hardware and software faults in the system cannot be completely eliminated. To provide high availability, the system must include mechanisms that enable continuous system operation even if a failure occurs in hardware, such as components and devices, or in software, such as the OS or application software.

M8000/M9000 servers provide the functions listed below to obtain high availability. Higher availability can also be obtained by combining the server with clustering software or management software.

2.4.3 Serviceability

Serviceability is characterized by how easily a server fault can be diagnosed, and how quickly the server can be recovered from the fault or how easily the fault can be corrected.

To achieve high serviceability rates, it must be possible to identify the causes of component or device failure. To facilitate recovery from failure, the system must determine the cause of the failure and isolate the faulty component for replacement. The system must also notify the system administrator and/or field engineer of the event and situation in an easy-to-understand format that prevents misunderstandings.

M8000/M9000 servers provide the following solution to realize high serviceability: