CHAPTER 1

Sun Fire E25K/E20K Systems Introduction

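This chapter introduces the Sun Fire E25K/E20K systems. It covers the system boards, the system configuration, the system interconnects, dynamic system domains, and the reliability, availability, and serviceability features.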

The Sun Fire E25K/E20K systems use the latest UltraSPARC® IV Cu CPU and the Sun Fireplane interconnect architecture, and run the binary-compatible Solaris™ UNIX® operating system (FIGURE 1-1). The systems carry forward the industry-leading dynamic system domain and reliability, availability, and serviceability (RAS) capabilities, and use active-centerplane technology.


FIGURE 1-1 Sun Fire E25K/E20K Systems

Figure showing component similarities between the Sun Fire 15K system and the Sun Fire 12K systems.


The Sun Fire E25K system and Sun Fire E20K system are essentially the same. The Sun Fire E25K system has the capacity for 18 CPU/Memory boards and 18 I/O assemblies. The Sun Fire E20K system has the capacity for nine CPU/Memory boards and nine I/O assemblies. Each system contains two System Control boards (one main and one spare).


1.1 System Boards

1.1.1 CPU/Memory Boards

The CPU/Memory board holds four CPUs. Each CPU has an associated memory subsystem of eight DIMMs, so memory bandwidth and capacity are both scaled up as CPUs are added. The memory capacity of the board is 64 Gbytes using 2-Gbyte DIMMs. The maximum memory bandwidth inside a board is 9.6 Gbyte/sec. The CPU/Memory board has a 4.8 Gbyte/sec connection to the rest of the system.
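
The per-board figures above follow from simple multiplication. The Python sketch below works through them. The constant and function names are illustrative only, decimal units are assumed (1 Gbyte/sec = 1000 Mbyte/sec), and the bandwidth function assumes the 9.6 Gbyte/sec peak is shared evenly by the four CPUs, which the text implies but does not state.

```python
# Per-board figures quoted in the text.
CPUS_PER_BOARD = 4
DIMMS_PER_CPU = 8
DIMM_GBYTES = 2
BOARD_PEAK_MEMORY_MBYTES_PER_SEC = 9600  # 9.6 Gbyte/sec, decimal units assumed


def _check_cpu_count(cpus: int) -> None:
    # Hypothetical guard: this sketch only models one to four CPUs per board.
    if not 1 <= cpus <= CPUS_PER_BOARD:
        raise ValueError(f"expected 1 to {CPUS_PER_BOARD} CPUs, got {cpus}")


def board_memory_gbytes(cpus: int = CPUS_PER_BOARD) -> int:
    """Memory capacity with `cpus` CPUs, each with a full set of 2-Gbyte DIMMs."""
    _check_cpu_count(cpus)
    return cpus * DIMMS_PER_CPU * DIMM_GBYTES


def board_memory_bandwidth_mbytes_per_sec(cpus: int = CPUS_PER_BOARD) -> int:
    """ASSUMPTION: peak on-board memory bandwidth scales linearly with CPU count."""
    _check_cpu_count(cpus)
    return BOARD_PEAK_MEMORY_MBYTES_PER_SEC * cpus // CPUS_PER_BOARD


if __name__ == "__main__":
    for n in range(1, CPUS_PER_BOARD + 1):
        print(n, board_memory_gbytes(n), board_memory_bandwidth_mbytes_per_sec(n))
```

With all four CPUs present, the capacity function returns the 64 Gbytes stated above.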

1.1.2 I/O Assemblies

The Sun Fire E25K/E20K hot-swap PCI assembly architecture (hsPCI-X/hsPCI+) has two I/O controllers. Each controller provides one 33-MHz peripheral component interconnect (PCI) bus and three 33/66/90 MHz PCI buses for a total of four on each I/O assembly. Therefore, each I/O assembly has four hot-swap component PCI slots. A Sun Fire I/O assembly has a 2.4 Gbyte/sec connection to the rest of the system.

1.1.3 System Controller

The system controller is the heart of the Sun Fire E25K/E20K systems' availability and serviceability technology. It configures the system; coordinates the boot process; sets up the dynamic system domains; monitors the system environmental sensors; and handles error detection, diagnosis, and recovery. Two System Control boards are configured into the system to provide redundancy and automatic failover if one board fails.

1.1.4 Peripherals

The Sun Fire E25K/E20K cabinet does not have room for peripherals, with the exception of the system controller peripherals (DVD-ROM, DAT drive, and hard drive). However, more peripheral devices can be configured in additional peripheral expansion racks.


1.2 System Configuration

TABLE 1-1 summarizes the maximum configuration of the Sun Fire E25K/E20K systems.


TABLE 1-1 Sun Fire E25K/E20K System Maximum Configuration

Component                               E25K Configuration                    E20K Configuration

CPU/Memory boards                       18                                    9
CPUs                                    72                                    36
Number of DIMMs                         576                                   288
Memory capacity (with 2-Gbyte DIMMs)    1152 Gbytes                           576 Gbytes
Sun Fireplane interconnect              Active                                Active
Repeater boards                         NA                                    NA
Expander boards                         18                                    9
Domains                                 18                                    9
I/O boards (assemblies)                 18                                    9
PCI assembly types                      hsPCI+, hsPCI-X                       hsPCI+, hsPCI-X
PCI slots per assembly                  4                                     4
Maximum PCI slots                       72                                    36
Bulk power supplies                     6                                     6
Power requirements                      24 kW                                 24 kW
System Control boards                   2                                     2
Redundant cooling                       Yes                                   Yes
Redundant AC input                      Yes                                   Yes
Enclosure                               Sun Fire E25K/E20K Systems cabinet    Sun Fire E25K/E20K Systems cabinet
Room in enclosure for peripherals       No                                    No
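
The counts in TABLE 1-1 can be cross-checked against the per-board and per-assembly figures given in Section 1.1. The following Python sketch derives the system maxima from the board counts. The class and attribute names are illustrative and not part of any Sun software.

```python
from dataclasses import dataclass

# Per-board and per-assembly figures from Section 1.1.
CPUS_PER_BOARD = 4
DIMMS_PER_CPU = 8
DIMM_GBYTES = 2
PCI_SLOTS_PER_ASSEMBLY = 4


@dataclass(frozen=True)
class MaxConfig:
    """Hypothetical container for a fully populated system."""
    cpu_memory_boards: int
    io_assemblies: int

    @property
    def cpus(self) -> int:
        return self.cpu_memory_boards * CPUS_PER_BOARD

    @property
    def dimms(self) -> int:
        return self.cpus * DIMMS_PER_CPU

    @property
    def memory_gbytes(self) -> int:
        return self.dimms * DIMM_GBYTES

    @property
    def pci_slots(self) -> int:
        return self.io_assemblies * PCI_SLOTS_PER_ASSEMBLY


E25K = MaxConfig(cpu_memory_boards=18, io_assemblies=18)
E20K = MaxConfig(cpu_memory_boards=9, io_assemblies=9)

if __name__ == "__main__":
    for name, cfg in (("E25K", E25K), ("E20K", E20K)):
        print(name, cfg.cpus, cfg.dimms, cfg.memory_gbytes, cfg.pci_slots)
```

The derived values match the table: 72 CPUs, 576 DIMMs, 1152 Gbytes, and 72 PCI slots for the E25K, and half of each for the E20K.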



1.3 System Interconnects

TABLE 1-2 summarizes the interconnect capacities of the Sun Fire E25K/E20K systems.


TABLE 1-2 Sun Fire E25K/E20K Systems Interconnect Specifications

Interconnect                                       Specification

System clock                                       150 MHz
Coherency protocol                                 Snooping on each board set, directory across the centerplane
System address interconnect                        18 snoopy buses, 18x18 global address crossbar, 18x18 global response crossbar
CPU/Memory board internal bisection bandwidth      4.8 Gbyte/sec
CPU/Memory board off-board data port               4.8 Gbyte/sec
I/O board off-board data port                      2.4 Gbyte/sec
System data interconnect                           18 3x3 board set crossbars, 18x18 global crossbar
System bisection bandwidth                         43 Gbyte/sec
Average lmbench (back-to-back-load) latency,       326 ns
assuming random accesses




Note - Snooping is defined as follows in PCI System Architecture, Third
Edition, Appendix A: Glossary, 1995, by MindShare, Inc. (ISBN 0-201-40993-3):

Snooping - When a memory access is performed by an agent other than the
cache controller, the cache controller must snoop the transaction to
determine if the current master is accessing information that is also
resident within the cache. If a snoop hit occurs, the cache controller
must take an appropriate action to ensure the continued consistency
of its cached information.



1.3.1 Sun Fireplane Interconnect Architecture

The Sun Fire E25K/E20K systems use the Sun Fireplane interconnect architecture, which is the coherent shared-memory protocol used by the UltraSPARC IV Cu CPU generation. This is the fourth generation of shared-memory interconnect. Sun Microsystems has implemented an improved system interconnect with each new CPU generation to keep system performance scaling with CPU performance.

The Sun Fireplane interconnect architecture is an evolutionary improvement over the previous-generation Ultra Port Architecture (UPA). The system clock rate is increased by 50%, from 100 MHz to 150 MHz. The number of snoops per clock is doubled, from one-half to one. Taken together, these changes triple the snooping bandwidth, from 50 million to 150 million addresses per second.
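
The tripling of snooping bandwidth is the product of the two changes (1.5 times the clock rate, 2 times the snoops per clock). A minimal Python sketch of that arithmetic follows. The function name is illustrative, and exact fractions are used so that one-half snoop per clock is represented without rounding.

```python
from fractions import Fraction


def snoop_bandwidth(clock_hz: int, snoops_per_clock: Fraction) -> Fraction:
    """Addresses snooped per second on a single snoopy bus."""
    return clock_hz * snoops_per_clock


# Figures from the text: UPA at 100 MHz with one snoop every two clocks,
# Sun Fireplane at 150 MHz with one snoop every clock.
upa = snoop_bandwidth(100_000_000, Fraction(1, 2))
fireplane = snoop_bandwidth(150_000_000, Fraction(1))

if __name__ == "__main__":
    print(int(upa))         # addresses per second, UPA
    print(int(fireplane))   # addresses per second, Sun Fireplane
    print(fireplane / upa)  # improvement factor
```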

The Sun Fireplane interconnect architecture also adds a new layer of point-to-point directory-coherency protocol. This protocol is used in systems that require more bandwidth than a single snoopy bus can provide, enabling coherency to be maintained between multiple snoopy buses.


FIGURE 1-2 shows the Sun Fireplane interconnect architecture of the Sun Fire E25K system. The board diagrams show the actual on-board connectivity, but omit the switch and controller chips for clarity.


FIGURE 1-2 Sun Fireplane Interconnects

Diagram showing the CPU/Memory boards and I/O boards connecting to the expander boards.


The Sun Fire E25K/E20K systems use an expander board to implement a 3x3 switch between a CPU/Memory board, an I/O assembly, and the Sun Fireplane interconnect port. The Sun Fireplane interconnect in these systems has three separate 18x18 crossbars, one each for addresses, responses, and data, so that address traffic does not interfere with data traffic. The peak bandwidth of the Sun Fireplane interconnect is 43 Gbyte/sec.
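
One plausible reading of the 43 Gbyte/sec figure is as the bisection bandwidth of the 18x18 data crossbar: split the 18 ports into two halves of nine, and count the traffic that can cross between the halves. The Python sketch below works through that reading. It assumes each Sun Fireplane port runs at the same 4.8 Gbyte/sec as the CPU/Memory board off-board port and uses decimal units; neither the derivation nor the function name comes from the source.

```python
# ASSUMPTION: each of the 18 crossbar ports carries 4.8 Gbyte/sec,
# expressed here as 4800 Mbyte/sec to keep the arithmetic in integers.
PORT_MBYTES_PER_SEC = 4800
CROSSBAR_PORTS = 18


def bisection_bandwidth_mbytes_per_sec(ports: int, per_port: int) -> int:
    """Bandwidth across a cut that divides the ports into two equal halves.

    Only ports // 2 ports sit on each side of the cut, so that many
    port-rate streams can cross it at once.
    """
    return (ports // 2) * per_port


if __name__ == "__main__":
    total = bisection_bandwidth_mbytes_per_sec(CROSSBAR_PORTS, PORT_MBYTES_PER_SEC)
    print(total / 1000, "Gbyte/sec")
```

Under these assumptions the result is 43.2 Gbyte/sec, consistent with the rounded 43 Gbyte/sec quoted for the system.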

1.3.2 Address Interconnect

The dashed lines in FIGURE 1-2 are the snoopy address buses. A snoop can occur at every system clock. In the Sun Fire E25K/E20K systems, there is a separate snoopy address bus on each board set. A board set is the combination of a CPU/Memory board, an I/O assembly, and an expander board. Coherency is maintained between board sets by using the point-to-point (directory) portion of the coherency protocol.
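
The two-level scheme described above can be pictured as a choice between two coherency paths. The following Python sketch is a deliberately simplified model, not the actual protocol: it assumes board sets are numbered 0 to 17 and that the only deciding factor is whether the requester and the memory that holds the address are in the same board set. All names are hypothetical.

```python
from enum import Enum


class CoherencyPath(Enum):
    SNOOP = "snoopy address bus within the board set"
    DIRECTORY = "point-to-point directory protocol between board sets"


def coherency_path(requester_board_set: int,
                   home_board_set: int,
                   board_sets: int = 18) -> CoherencyPath:
    """Pick the coherency mechanism for one memory access (simplified model)."""
    for board_set in (requester_board_set, home_board_set):
        # ASSUMPTION: board sets are numbered from 0 for this illustration.
        if not 0 <= board_set < board_sets:
            raise ValueError(f"board set {board_set} out of range")
    if requester_board_set == home_board_set:
        # Same board set: the local snoopy bus sees the address directly.
        return CoherencyPath.SNOOP
    # Different board sets: coherency is maintained by the directory portion.
    return CoherencyPath.DIRECTORY


if __name__ == "__main__":
    print(coherency_path(3, 3).value)
    print(coherency_path(3, 4).value)
```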

1.3.3 Data Interconnect

The solid lines in FIGURE 1-2 represent the data paths. The small circles at the intersections of these lines indicate three-port switches. The CPU/Memory board has three levels of 3x3 switches between a CPU or memory unit and the off-board port. The off-board bandwidth of a CPU/Memory board is 4.8 Gbyte/sec. The bandwidth of an I/O assembly is 2.4 Gbyte/sec.


1.4 Dynamic System Domains

Each domain in the Sun Fire E25K/E20K systems includes one or more CPU/Memory boards and one or more I/O assemblies. Each domain runs its own instance of the Solaris Operating System and has its own peripherals and network connections. Domains can be reconfigured without interrupting the operation of other domains. Domains can be used for:

Here is one example of partitioning a fully populated Sun Fire E25K system into three domains to handle three types of functions:

Boards can be automatically migrated between domains as changes in load demand.

The Sun Fire E25K system can have up to 18 domains. The Sun Fire E20K system can have up to 9 domains. Domains are isolated from each other by the interconnect application-specific integrated circuits (ASICs).
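
The domain rules stated in this section (at least one CPU/Memory board and one I/O assembly per domain, and no more domains than the system supports) can be expressed as a small consistency check. The Python sketch below is illustrative only. It assumes slots are numbered from 0 and that a board belongs to at most one domain at a time; the names and the two-domain layout are hypothetical and are not taken from Sun's System Management Services software.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Domain:
    name: str
    cpu_memory_boards: FrozenSet[int]
    io_assemblies: FrozenSet[int]


def validate_partition(domains: List[Domain], slots: int = 18) -> List[str]:
    """Return a list of rule violations; an empty list means the layout is valid."""
    errors = []
    if len(domains) > slots:
        errors.append(f"{len(domains)} domains exceeds the limit of {slots}")
    seen = {"CPU/Memory board": set(), "I/O assembly": set()}
    for dom in domains:
        groups = (("CPU/Memory board", dom.cpu_memory_boards),
                  ("I/O assembly", dom.io_assemblies))
        for kind, boards in groups:
            if not boards:
                errors.append(f"{dom.name}: needs at least one {kind}")
            for slot in sorted(boards):
                if not 0 <= slot < slots:
                    errors.append(f"{dom.name}: {kind} slot {slot} out of range")
                elif slot in seen[kind]:
                    # ASSUMPTION: a board is owned by one domain at a time.
                    errors.append(f"{dom.name}: {kind} {slot} already assigned")
                seen[kind].add(slot)
    return errors


# Hypothetical two-domain layout on an 18-slot system.
example = [
    Domain("domain-a", frozenset(range(0, 12)), frozenset(range(0, 12))),
    Domain("domain-b", frozenset(range(12, 18)), frozenset(range(12, 18))),
]

if __name__ == "__main__":
    print(validate_partition(example) or "layout is valid")
```

Passing `slots=9` applies the same checks with the Sun Fire E20K limit.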


1.5 Reliability, Availability, and Serviceability

Reliability, availability, and serviceability (RAS) are critical requirements of customers who deploy business-critical applications. The Sun Fire E25K/E20K systems build upon the industry-leading RAS capabilities. The sections that follow describe some of the major features that improve RAS.

1.5.1 Integrated Circuit Reliability

1.5.2 Interconnect Reliability

1.5.3 Fault-Tolerant Redundancy

A failure in these subsystems does not cause any loss of availability.

1.5.4 Reconfiguration After Failure

1.5.5 Serviceability