Chapter 1. Introducing Altix UV System Control Topology

This manual describes controller software commands on SGI Altix UV 100 and SGI Altix UV 1000 systems.


Note: This manual does not apply to SGI Altix UV 10 systems. For information, see the SGI Altix UV 10 System User's Guide.


Altix UV 1000 Overview

The SGI Altix UV 1000 system is a blade-based, cache-coherent non-uniform memory access (ccNUMA) computer system based on the Intel Xeon 7500 series processor. The UV 1000 system scales as follows:

  • From 32 to 2048 threads in a single system image (SSI)

  • A maximum of 2048 processor cores with hyper-threading turned off

  • A maximum of 4096 processor threads (2048 processor cores) with hyper-threading turned on


    Note: Each processor core supports two threads. The operating system treats a processor core with hyper-threading enabled as two processors instead of one: only one core is physically present, but the operating system sees two logical processors and shares the workload between them. At initial release, the maximum SSI supported by the Linux operating system is 2048.


The main component is an 18U-high individual rack unit (IRU), shown in Figure 1-1, that supports 16 compute blades and is configurable to support multiple topology options.

The compute blades in the IRU are interconnected using NUMAlink 5 technology. NUMAlink 5 has a peak aggregate bi-directional bandwidth of 15 GB/s. Multiple IRUs are also interconnected with NUMAlink 5 technology.

A maximum of two IRUs can be placed into a custom 42U rack, as shown in Figure 1-2. Each rack supports a maximum of 512 processor cores; therefore, the largest SSI system requires four racks. A maximum of 128 four-rack cells can be interconnected to create a 512-rack system (256K processor cores).
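The scaling figures quoted above can be cross-checked with simple arithmetic. The sketch below derives them from the per-rack totals; the per-blade core count is inferred from those totals, not taken from a hardware specification:

```python
# Sketch: cross-checking the UV 1000 scaling figures.
# The per-blade core count is inferred (512 cores per rack spread
# over 2 IRUs of 16 blades each), not quoted from a specification.
IRUS_PER_RACK = 2
BLADES_PER_IRU = 16
CORES_PER_RACK = 512

cores_per_blade = CORES_PER_RACK // (IRUS_PER_RACK * BLADES_PER_IRU)

# Largest single system image: four racks.
ssi_cores = 4 * CORES_PER_RACK        # 2048 cores
ssi_threads = 2 * ssi_cores           # 4096 threads with hyper-threading on

# 128 four-rack cells interconnected.
total_racks = 128 * 4                 # 512 racks
total_cores = total_racks * CORES_PER_RACK  # 262144 cores (256K)
```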

Figure 1-1. Individual Rack Unit


Figure 1-2. Basic System Building Blocks for Altix UV 1000 Systems


The Altix UV system supports direct attach I/O on the compute blade. The compute blade is designed to host one of four different I/O riser cards. Various PCI Express-based I/O components are supported. Figure 1-3 shows a full SGI Altix UV system rack.

Figure 1-3. SGI Altix UV System Rack


For a detailed hardware description, see the SGI Altix UV 1000 Systems User's Guide.

The SGI hardware manuals contain detailed descriptions of Altix system architecture. For a list of these manuals, see “Related Publications”.


Note: Online and PostScript versions of SGI documentation are available at the SGI Technical Publications Library at http://docs.sgi.com.


Altix UV 100 Overview

The SGI Altix UV 100 system is a small, blade-based, cache-coherent non-uniform memory access (ccNUMA) computer system based on the Intel Xeon 7500 series processor. The SGI Altix UV 100 system scales as follows:

  • A maximum of 768 processor cores

  • From 16 to 1536 threads in a single system image (SSI)


Note: Each processor core supports two threads.

The main component is a 3U-high IRU that supports two compute blades and is configurable to support multiple topology options.

The two compute blades in the IRU are interconnected using NUMAlink 5 technology. NUMAlink 5 has a peak aggregate bi-directional bandwidth of 15 GB/s. Multiple IRUs are also interconnected with NUMAlink 5 technology.

A maximum of twelve IRUs can be placed into a standard 19-inch 42U rack. Each rack supports a maximum of 384 processor cores.

The Altix UV system supports direct attach I/O on the compute blade. The compute blade is designed to host one of four different I/O riser cards. Various PCI Express-based I/O components are supported. For a detailed hardware description, see the SGI Altix UV 100 Systems User's Guide.

System Management

System management provides a single control point for system power-up, initialization, booting, and maintenance. System management on an SGI Altix UV 1000 consists of three levels. The first level is the baseboard management controllers (BMCs) on the node boards. The second level is the chassis management controllers (CMCs) in the rear of the IRUs. The third level is the system management node (SMN). The SMN is required on SGI Altix UV 1000 series systems; it is not required for SGI Altix UV 100 series systems.


Important: The UV 1000 and UV 100 system control network is a private, closed network. It must not be reconfigured in any way different from the standard UV installation, nor directly connected to any other network. The UV system control network does not accommodate additional network traffic, routing, address naming other than its own schema, or DHCP controls other than its own configuration. The system control network is not security hardened, is not tolerant of heavy network traffic, and is vulnerable to denial-of-service attacks.

The System Management Node acts as a gateway between the UV system control network and any other networks.

SGI Management Center (SMC) software running on the system management node (SMN) provides a robust graphical interface for system configuration, operation, and monitoring. This manual describes commands that can be used on systems without an SMN or not running the SMC. For more information, see the SGI Management Center System Administrator's Guide.

Chassis Manager Controller

The chassis management controller (CMC) in the rear of the IRU, shown in Figure 1-4 and Figure 1-5, supports powering the compute blades up and down and environmental monitoring of all units within the IRU. The CMC sends operational requests to the baseboard management controller (BMC) on each compute node and provides data collected from the compute nodes within the IRU to the system management node upon request. The CMC blade on the right side of the IRU is the primary CMC; the CMC blade on the left is an optional CMC for redundancy.

Figure 1-4. Chassis Manager Controller


System Control Network

The chassis management controller (CMC) has seven RJ45 Ethernet ports, as shown in Figure 1-5.

The Ethernet ports are used as follows:

  • SMN - the system management node port is used to connect to the SMN.

  • SBK - Each 16-rack group is called a super block. A building block is four racks; a super block is four building blocks. The SBK port connects one super block to another super block.

  • CMC0 and CMC1 - these two ports interconnect multiple IRUs within a building block.

  • EXT0, EXT1, EXT2 - connect to external devices such as I/O chassis and smart PDUs.

  • CONSOLE - the console connection supports a serial channel connection directly to the CMC for system maintenance.

Figure 1-5. CMC Ethernet Ports


For information on finding the CMC IP address and hostname, see “Finding the CMC IP Address” in Chapter 2.

Determining Rack Numbers

The system controller network has strict requirements for rack numbering. The requirements minimize the amount of information that must be manually configured for each CMC when it is plugged into an IRU. Currently, only the rack number and u-position of the IRU must be set; the u-position is the physical location of the IRU in the rack. The rack and u-position values are stored in the /etc/sysconfig/module_id file. Besides uniquely identifying the physical location of the CMCs, the values are used to generate several IP addresses for the various VLANs on the CMC and are used by any software interacting with the system controller network to target operations.
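The exact layout of the module_id file is not documented here. Assuming a simple KEY=VALUE layout (an assumption for illustration only, not the actual CMC file format), software could recover the rack number and u-position like this:

```python
# Sketch only: assumes /etc/sysconfig/module_id holds simple
# KEY=VALUE lines such as "RACK=1" and "UPOS=17". The real file
# format on the CMC may differ.
def parse_module_id(text):
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return int(values["RACK"]), int(values["UPOS"])

# Example with the assumed format:
rack, upos = parse_module_id("RACK=1\nUPOS=17\n")
```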

For large Altix UV 1000 configurations, a building block consists of four racks with two IRUs in each rack, with the CMCs in those IRUs interconnected via their CMC0 and CMC1 jacks. For racks to be considered part of the same building block, their rack numbers must be consecutive and satisfy the following condition:

(rack - 1) DIV 4 = the same value for all racks in the building block

Equivalently, the values of (rack - 1) MOD 4 for the four racks in a building block are 0, 1, 2, and 3.

For example, a system with four racks numbered 1, 2, 3, and 4 has one building block. Similarly, a system with four racks numbered 9, 10, 11, and 12 has one building block.

A system with racks numbered 10, 11, 12, and 13 would have two building blocks: racks 10, 11, and 12 are in one building block, and rack 13 is in a second. The system controller network must be cabled appropriately for each configuration.

A super block (SBK) consists of four building blocks. The two primary CMCs in each building block are used to interconnect the building blocks via their SBK jacks. For racks to be considered part of the same SBK, their rack numbers must be consecutive and satisfy the following condition:

(rack - 1) DIV 16 = the same value for all racks in the SBK

Equivalently, the values of (rack - 1) MOD 16 for the sixteen racks in an SBK are 0, 1, 2, ..., 15.

In summary, a single SBK can support up to four building blocks, that is, 16 racks.
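The membership rules above reduce to integer division. The sketch below groups rack numbers into building blocks and super blocks; the examples mirror the ones worked through in the text:

```python
# Building block: four consecutive racks sharing (rack - 1) DIV 4.
# Super block (SBK): sixteen consecutive racks sharing (rack - 1) DIV 16.
def building_block(rack):
    return (rack - 1) // 4

def super_block(rack):
    return (rack - 1) // 16

# Racks 1-4 form one building block, as do racks 9-12.
assert len({building_block(r) for r in (1, 2, 3, 4)}) == 1
assert len({building_block(r) for r in (9, 10, 11, 12)}) == 1

# Racks 10, 11, and 12 share a building block; rack 13 starts another.
assert building_block(10) == building_block(12) != building_block(13)
```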

Altix UV System Controller Software

The controller software is designed to manage and monitor the individual blades in SGI Altix UV systems. Depending on your system configuration, you can monitor and operate the system from the system management node (SMN) or, on smaller systems such as the Altix UV 100, from the CMC itself. UV 1000 systems of up to 16 racks (four building blocks, also called one super block) can also be controlled and monitored from a CMC in the system.

The following list summarizes the control and monitoring functions that the CMC performs. Many of the controller functions are common to both IRUs and routers; however, some functions are specific to the type of enclosure.

  • Controls voltage margining within the IRU or router

  • Controls and monitors IRU and router fan speeds

  • Reads system identification (ID) PROMs

  • Monitors voltage levels and reports failures

  • Monitors and controls warning LEDs on the enclosure

  • Monitors the On/Off power switch

  • Monitors the reset switch and the nonmaskable interrupt (NMI) switch

  • Reports the population of the PCIe cards and the power levels of the PCIe slots in installed PCIe riser blades

  • Powers on the PCIe slots and their associated LEDs

  • Provides the ability to create multiple system partitions, each a single system image (SSI) running its own operating system

  • Provides the ability to flash the system BIOS