Chapter 3. Product Support

This chapter documents the product components that are supported on the SGI Altix 3000 series, SGI Altix 4000 series, SGI Altix ICE series, SGI Altix UV, and SGI Altix XE systems. (For a list of the products, see Table 3-1.)

Descriptions of the product components are grouped in this chapter as follows:

SGI ProPack 7 for Linux SP1 Products

Software provided by SGI for the SGI ProPack 7 for Linux SP1 release consists of kernel modules for SGI software built against the kernels in SUSE Linux Enterprise Server 11 SP1, value-add software developed by SGI specifically to run on SGI Altix and SGI Altix XE systems, and some additional third-party software (see VTune in the first entry of Table 3-1).

Table 3-1. SGI ProPack 7 SP1 for Linux Products

Each entry lists the product, the architectures on which it is supported (in parentheses), and a description.

Application performance measuring tools (ia64)

    VTune - This tool, developed and supported by Intel, uses the performance measurement facilities of the Itanium processor to take profiles based on elapsed time or other architected events within the processor. These profiles can be used to measure, tune, and improve an application's performance. For more information on VTune, go to the following web location:

    http://developer.intel.com/software/products/vtune/

Array Services (ia64 and x86_64)

    Array Services includes administrator commands, libraries, daemons, and kernel extensions that support the execution of parallel applications across a number of hosts in a cluster, or array. The Message Passing Interface (MPI) of SGI ProPack uses Array Services to launch parallel applications. For information on MPI, see the Message Passing Toolkit (MPT) User's Guide.

    The secure version of Array Services is built to make use of secure sockets layer (SSL) and secure shell (SSH).

    For more information on standard Array Services or Secure Array Services (SAS), see the Array Services chapter in the Linux Resource Administration Guide.

Cpuset System (ia64 and x86_64)

    The Cpuset System is primarily a workload manager tool that permits a system administrator to restrict the processors and memory resources that a process or set of processes may use. A system administrator can use cpusets to create a division of CPUs and memory resources within a larger system. For more information, see the “Cpusets on SGI ProPack 6 for Linux” chapter in the Linux Resource Administration Guide.

CSA (ia64 and x86_64)

    Provides jobs-based accounting of per-task resources and disk usage for specific login accounts on Linux systems. The Linux CSA application interface library allows software applications to manipulate and obtain status about Linux CSA accounting methods. For more information, see the CSA chapter in the Linux Resource Administration Guide.

IOC4 serial driver (ia64)

    Driver that supports the internal IDE CD-ROM, NVRAM, and real-time clock.

    Serial ports are supported on the IOC4 base I/O chipset, and the following device nodes are created:

        /dev/ttyIOC4/0
        /dev/ttyIOC4/1
        /dev/ttyIOC4/2
        /dev/ttyIOC4/3

Kernel partitioning support (ia64)

    Provides the software infrastructure necessary to support a partitioned system, including cross-partition communication support. For more information on system partitioning, see the SGI Altix UV Linux Configuration and Operations Guide or the Linux Configuration and Operations Guide.

MPT (ia64 and x86_64)

    Provides industry-standard message passing libraries optimized for SGI computers. For more information on MPT, see the Message Passing Toolkit (MPT) User's Guide.

NUMA tools (ia64 and x86_64)

    Provides a collection of NUMA-related tools (dlook(1), dplace(1), and so on). For more information on NUMA tools, see the Linux Application Tuning Guide.

Performance Co-Pilot collector infrastructure (ia64 and x86_64)

    Provides performance monitoring and performance management services targeted at large, complex systems. For more information on Performance Co-Pilot, see the Performance Co-Pilot for IA-64 Linux User's and Administrator's Guide.

REACT real-time for Linux (ia64 and x86_64)

    Support for real-time programs. For more information, see the REACT Real-Time for Linux Programmer's Guide.

Utilities (ia64 and x86_64)

    udev_xsci is a udev helper for creating XSCSI device names; sgtools is a set of tools for SCSI disks that use the Linux SG driver; and lsiutil is the LSI Fusion-MPT host adapter management utility.

XVM (ia64 and x86_64)

    Provides software volume manager functionality such as disk striping and mirroring. For more information on XVM, see the XVM Volume Manager Administrator's Guide.

SGI does not support the following:

  • Base Linux software not released by Novell for SLES11 SP1 or other software not released by SGI.

  • Other releases, updates, or patches not released by Novell for SLES11 SP1 or by SGI for SGI ProPack software.

  • Software patches, drivers, or other changes obtained from the Linux community or vendors other than Novell and SGI.

  • Kernels recompiled or reconfigured to run with parameter settings or other modules not specified by Novell and SGI.

  • Unsupported hardware configurations and devices.

Operating System Enhancements

Building on the Linux operating system's rapid expansion and improvement in general commercial and enterprise environments, SGI has focused on improving Linux capabilities and performance specifically for the big-compute and big-data environments of high-performance computing (HPC). SGI has leveraged its NUMAflex and HPC experience from its IRIX operating system and MIPS processor-based systems and concentrated on the Linux kernel improvements most important to HPC environments.

Cpuset Support

The cpuset facility is primarily a workload manager tool that permits a system administrator to restrict the processors and memory resources that a process or set of processes may use. A cpuset defines a list of CPUs and memory nodes. A process contained in a cpuset may only execute on the CPUs in that cpuset and may only allocate memory on the memory nodes in that cpuset. Essentially, cpusets provide CPU and memory containers, or "soft partitions," within which you can run sets of related tasks. Using cpusets on an SGI Altix system improves cache locality and memory access times and can substantially improve an application's performance and runtime repeatability.

Restraining all other jobs from using any of the CPUs or memory resources assigned to a critical job minimizes interference from other jobs on the system. For example, Message Passing Interface (MPI) jobs frequently consist of a number of threads that communicate using message passing interfaces. All threads need to be executing at the same time; if a single thread loses a CPU, all threads stop making forward progress and spin at a barrier. Cpusets can eliminate the need for a gang scheduler.

Cpusets are represented in a hierarchical virtual file system. Cpusets can be nested and they have file-like permissions.

In addition to their traditional use to control the placement of jobs on the CPUs and memory nodes of a system, cpusets also provide a convenient mechanism to control the use of Hyper-Threading Technology.

For detailed information on cpusets, see Chapter 6, “Cpusets on SGI ProPack 6 for Linux” in the Linux Resource Administration Guide.
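The placement mechanics described above can be sketched through the cpuset virtual filesystem. This is an illustrative sketch only: the mount point, cpuset name, CPU range, and memory node are arbitrary, the creation steps require root privileges on a kernel with cpuset support, and failures are deliberately tolerated.

```shell
# Illustrative: create a cpuset "batch1" limited to CPUs 0-1 and memory
# node 0, then move the current shell into it. Requires root and cpuset
# filesystem support; errors are suppressed so the sketch degrades safely.
mkdir -p /dev/cpuset 2>/dev/null || true
mount -t cpuset cpuset /dev/cpuset 2>/dev/null || true
if mkdir /dev/cpuset/batch1 2>/dev/null; then
    echo 0-1 > /dev/cpuset/batch1/cpus    # CPUs the cpuset may use
    echo 0   > /dev/cpuset/batch1/mems    # memory nodes it may allocate from
    echo $$  > /dev/cpuset/batch1/tasks   # confine this shell and its children
fi
# Any process can check its current CPU and memory confinement:
grep -E 'Cpus_allowed_list|Mems_allowed_list' /proc/self/status
```

Processes later moved into or started within the cpuset inherit the same confinement, which is what keeps competing jobs off the CPUs and memory nodes reserved for a critical job.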

Comprehensive System Accounting (CSA)

The port of Comprehensive System Accounting (CSA) software packages from IRIX to Linux is the result of an open source collaboration between SGI and Los Alamos National Laboratory (LANL) to provide jobs-based accounting of per-task resources and disk usage for specific login accounts on Linux systems.

Providing extensive system accounting capabilities is often important for very large systems, especially when the system will be shared or made available for other organizations to use. CSA uses a Job Containers feature, which provides the notion of a job on Linux. A job is an inescapable container: a collection of processes that enables CSA to track resources from any point of entry to a machine (for example, interactive login, cron job, remote login, batched workload, and so on).

The Linux CSA application interface library allows software applications to manipulate and obtain status about Linux CSA accounting methods.

CSA on Linux is an SGI open source project, also available from the following location:

http://oss.sgi.com/projects/csa

For further documentation and details on CSA support, see the chapter titled “Comprehensive System Accounting” in the Linux Resource Administration Guide.

Partitioning

SGI provides the ability to divide a single SGI Altix system into a collection of smaller system partitions. Each partition runs its own copy of the operating system kernel and has its own system console, root filesystem, IP network address, and physical memory. All partitions in the system are connected via the SGI high-performance NUMAlink interconnect, just as they are when the system is not partitioned. Thus, a partitioned system can also be viewed as a cluster of nodes connected via NUMAlink.

Benefits of partitioning include fault containment and the ability to use the NUMAlink interconnect and global shared memory features of the SGI Altix to provide high-performance clusters.

For further documentation and details on partitioning, see the SGI Altix UV Systems Linux Configuration and Operations Guide or the Linux Configuration and Operations Guide.

I/O Subsystems

Although some HPC workloads might be mostly CPU bound, others involve processing large amounts of data and require an I/O subsystem capable of moving data between memory and storage quickly, as well as having the ability to manage large storage farms effectively. The XFS filesystem, XVM volume manager, and data migration facilities were leveraged from IRIX and ported to provide a robust, high-performance, and stable storage I/O subsystem on Linux. This section covers the following topics:

Persistent IP Addressing of Ethernet Interfaces

An Ethernet interface can be given a persistent Internet Protocol (IP) address by associating its permanent MAC address (such as 08:00:69:13:f1:aa) with an IP address (for example, 192.168.20.1). An interface with a persistent IP address is given the same IP address each time the system is booted. For more information, see “Persistent Network Interface Names” in the Linux Configuration and Operations Guide.
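For illustration, on SLES-style systems this association is typically expressed in two files: a udev rule that binds the MAC address to a stable interface name, and an ifcfg file that assigns the IP address to that name. The MAC address, interface name, and IP address below are examples only, not values from a real system.

```
# /etc/udev/rules.d/70-persistent-net.rules (illustrative)
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="08:00:69:13:f1:aa", NAME="eth0"

# /etc/sysconfig/network/ifcfg-eth0 (illustrative)
BOOTPROTO='static'
IPADDR='192.168.20.1/24'
STARTMODE='auto'
```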

PCI Domain Support for SGI Altix 450 and 4700 Systems

On SGI Altix 450 and 4700 systems, a PCI domain is a functional entity that includes a root bridge, the subordinate buses under the root bridge, and the peripheral devices it controls. Separation, management, and protection of PCI domains is implemented and controlled by system software. For more information, see “PCI Domain Support for SGI Altix 450 and 4700 Systems” in the Linux Configuration and Operations Guide.

XSCSI Naming Systems on SGI ProPack Systems

The XSCSI subsystem on SGI ProPack 4 systems was an I/O infrastructure that leveraged technology from the IRIX operating system to provide more robust error handling, failover, and storage area network (SAN) infrastructure support, as well as long-term, large-system performance tuning. This subsystem is not necessary on SGI ProPack 7 systems; however, XSCSI naming is still available on SGI ProPack 7 systems. For more information, see “XSCSI Naming Systems on SGI ProPack Systems” in the Linux Configuration and Operations Guide.

XFS Filesystem

The SGI XFS filesystem provides a high-performance filesystem for Linux. XFS is an open-source, fast recovery, journaling filesystem that provides direct I/O support, space preallocation, access control lists, quotas, and other commercial file system features. Although other filesystems are available on Linux, performance tuning and improvements leveraged from IRIX make XFS particularly well suited for large data and I/O workloads commonly found in HPC environments.

For more information on the XFS filesystem, see XFS for Linux Administration.
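As a small sketch of basic XFS administration (the image path is arbitrary, and the format step runs only where the xfsprogs package supplies mkfs.xfs; this is not an SGI-specific procedure):

```shell
# Create a sparse scratch image, format it as XFS where the tools are
# installed, and read back the filesystem geometry (illustrative only).
truncate -s 512M /tmp/xfs-demo.img
if command -v mkfs.xfs >/dev/null 2>&1; then
    mkfs.xfs -q -f /tmp/xfs-demo.img                   # format with defaults
    xfs_db -r -c "sb 0" -c "p blocksize" /tmp/xfs-demo.img   # e.g. blocksize = 4096
fi
```

The same mkfs.xfs invocation against a real block device (rather than an image file) is how an XFS filesystem would normally be created.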

XVM Volume Manager

The SGI XVM Volume Manager provides a logical organization to disk storage that enables an administrator to combine underlying physical disk storage into a single logical unit, known as a logical volume. Logical volumes behave like standard disk partitions and can be used as arguments anywhere a partition can be specified.

A logical volume allows a filesystem or raw device to be larger than the size of a physical disk. Using logical volumes can also increase disk I/O performance because a volume can be striped across more than one disk. Logical volumes can also be used to mirror data on different disks.

This release adds a new XVM multi-host failover feature. For more information on this new feature and XVM Volume Manager in general, see the XVM Volume Manager Administrator's Guide.

HPC Application Tools and Support

SGI has ported HPC libraries, tools, and software packages from IRIX to Linux to provide a powerful, standards-based system using Linux and Itanium 2-based solutions for HPC environments. The following sections describe some of these tools, libraries, and software.

Message Passing Toolkit

The SGI Message Passing Toolkit (MPT) provides industry-standard message passing libraries optimized for SGI computers. On Linux, MPT contains MPI and SHMEM APIs, which transparently utilize and exploit the low-level capabilities within SGI hardware, such as memory mapping within and between partitions for fast memory-to-memory transfers and the hardware memory controller's fetch operation (fetchop) support. Fetchops and other shared memory techniques enable ultra fast communication and synchronization between MPI processes in a parallel application.

MPI jobs can be launched, monitored, and controlled across a cluster or partitioned system using the SGI Array Services software. Array Services provides the notion of an array session, which is a set of processes that can be running on different cluster nodes or system partitions. Array Services is implemented using Process Aggregates (PAGG), a kernel module that provides process containers. PAGG has been open-sourced by SGI for Linux.

For more information on the Message Passing Toolkit, see the Message Passing Toolkit (MPT) User's Guide.

MVAPICH2 and OpenMPI

SGI no longer ships any open source MPI packages via the SGI ProPack for Linux release. MVAPICH2 and OpenMPI RPMs are available as a courtesy via the cool downloads section on Supportfolio at https://support.sgi.com/browse_request/dcs.

These RPMs are versions of the open source products built for the InfiniBand (IB) interconnect and compiled with the Intel compilers.

SGI MPT is provided with the SGI ProPack 7 release and supports all SGI platforms.

Performance Co-Pilot

The SGI Performance Co-Pilot software was ported from IRIX to Linux to provide a collection of performance monitoring and performance management services targeted at large, complex systems. Integrated with the low-level performance hardware counters and with MPT, Performance Co-Pilot provides such services as CPU, I/O, and networking statistics; visualization tools; and monitoring tools.

For more information on Performance Co-Pilot, see the Performance Co-Pilot for IA-64 Linux User's and Administrator's Guide.

Extensible Firmware Interface (EFI)

The Extensible Firmware Interface (EFI), the firmware interface between the operating system and the platform hardware, is provided by SLES11, the base Linux operating system for SGI Altix systems running SGI ProPack 7. EFI also controls the server's boot configuration, maintaining the boot menu in non-volatile memory.

SLES11 uses the elilo package, which places the bootloader (elilo.efi) and its configuration file (elilo.conf) in the /boot/efi/efi/SuSE/ directory on SGI Altix systems.
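For illustration, a minimal elilo.conf might look like the following; the image and initrd names, root device, and append string are hypothetical and vary by installation:

```
# /boot/efi/efi/SuSE/elilo.conf (illustrative entry)
default = linux
timeout = 50

image = vmlinuz
    label = linux
    initrd = initrd
    root = /dev/sda3
    append = "console=ttyS0"
```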


Note: When booting from the SLES11 installation media, use the bootia64 command instead of elilo. Once the system is running SLES11, use elilo to boot from EFI.



Note: If you have installed multiple kernel images and want to boot with one that is not currently the system default (vmlinuz in /boot/efi/efi/SuSE), simply copy the vmlinuz and initrd files for the kernel you wish to use from /boot to /boot/efi/efi/SuSE.

For a summary of EFI commands, see Table 3-2.

Table 3-2. EFI Commands

EFI Command                          Description

alias [-bdv] [sname] [value]         Sets or gets alias settings
attrib [-b] [+/- rhs] [file]         Views or sets file attributes
bcfg                                 Configures boot driver and load options
cd [path]                            Updates the current directory
cls [background color]               Clears the screen
comp file1 file2                     Compares two files
cp file [file] ... [dest]            Copies files or directories
date [mm/dd/yyyy]                    Gets or sets the date
dblk device [Lba] [blocks]           Performs hex dump of block I/O devices
dh [-b] [-p prot_id] | [handle]      Dumps handle information
dmpstore                             Dumps the variable store
echo [-on | -off] | [text]           Echoes text to stdout or toggles script echo
edit [file name]                     Edits a file
endfor                               Script-only: Delimits FOR loop construct
endif                                Script-only: Delimits IF THEN construct
err [level]                          Sets or displays the error level
exit                                 Exits the EFI shell
flash filename                       Flashes PROM on C-brick
for var in set                       Script-only: Indicates loop construct
getmtc                               Gets the next monotonic count
goto label                           Script-only: Jumps to a label location in the script
guid [-b] [sname]                    Dumps known GUIDs
help [-b] [internal command]         Displays help
if [not] condition then              Script-only: Indicates IF THEN construct
load driver_name                     Loads a driver
ls [-b] [dir] [dir] ...              Obtains a directory listing
map [-bdvr] [sname[:]] [handle]      Maps a shortname to a device path
mem [address] [size] [;MMIO]         Dumps memory or memory-mapped I/O
memmap [-b]                          Dumps the memory map
mkdir dir [dir] ...                  Makes a directory
mm address [width] [;type]           Modifies memory: Mem, MMIO, IO, PCI
mode [col row]                       Sets or gets the current text mode
mount BlkDevice [sname[:]]           Mounts a filesystem on a block device
mv sfile dfile                       Moves files
pause                                Script-only: Prompts to quit or continue
pci [bus dev] [func]                 Displays PCI device information
reset [cold/warm] [reset string]     Performs a cold or warm reset
rm file/dir [file/dir]               Removes files or directories
set [-bdv] [sname] [value]           Sets or gets an environment variable
setsize newsize fname                Sets a file's size
stall microseconds                   Delays for the specified number of microseconds
time [hh:mm:ss]                      Gets or sets the time
touch [filename]                     Updates a file's timestamp
type [-a] [-u] [-b] file             Displays the contents of a file
ver                                  Displays version information
vol fs [volume label]                Sets or displays the volume label


SGIconsole

SGIconsole is a combination of hardware and software that provides console management and allows monitoring of multiple SGI servers running the IRIX operating system and SGI ProPack for Linux. These servers include SGI partitioned systems and large single-system-image servers, such as SGI Altix 350 and 450 systems and the SGI Altix 3000 and 4000 families of servers and superclusters.

SGIconsole consists of a 1U rackmountable SGI server based on the Intel Pentium processor, a serial multiplexer or Ethernet hub, and a software suite that includes the Console Manager package and Performance Co-Pilot, which provides access to common remote management tools for hardware and software.

Console Manager is a graphical user interface for the SGIconsole management and monitoring tool used to control multiple SGI servers. SGIconsole also has a command line interface. For more information on SGIconsole, see the SGIconsole Start Here.

NUMA Data Placement Tools

This section describes the commands currently provided in the collection of NUMA-related data placement tools that can help you tune applications on your system.


Note: Performance tuning information for single-processor and multiprocessor programs resides in the Linux Application Tuning Guide.


dlook Command

The dlook(1) command displays the memory map and CPU use for a specified process. The following information is printed for each page in the virtual address space of the process:

  • The object that owns the page (file, SYSV shared memory, device driver, and so on)

  • Type of page (RAM, FETCHOP, IOSPACE, and so on)

  • If RAM memory, the following information is supplied:

    • Memory attributes (SHARED, DIRTY, and so on)

    • Node on which the page is located

    • Physical address of page (optional)

Optionally, the amount of elapsed CPU time that the process has executed on each physical CPU in the system is also printed.
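dlook itself is SGI-supplied, but on stock Linux kernels comparable per-mapping page placement information is exposed in /proc/<pid>/numa_maps. The sketch below reads it for the current process; the file exists only on kernels built with NUMA support, so that case is handled explicitly.

```shell
# Show placement information for the first few mappings of this process.
# /proc/self/numa_maps is absent on kernels without NUMA support.
if [ -r /proc/self/numa_maps ]; then
    head -n 5 /proc/self/numa_maps
else
    echo "numa_maps not available on this kernel"
fi
```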

dplace Command

The dplace(1) command binds a related set of processes to specific CPUs or nodes to prevent process migration. In some cases, this tool improves performance because a higher percentage of memory accesses are made to the local node.

taskset Command

The taskset(1) command is used to set or retrieve the CPU affinity of a running process given its PID or to launch a new command with a given CPU affinity. CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. Note that the Linux scheduler also supports natural CPU affinity; the scheduler attempts to keep processes on the same CPU as long as practical for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications.
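Typical taskset usage can be sketched as follows; the CPU list and the command being launched are arbitrary, and taskset ships with the util-linux package rather than with SGI ProPack.

```shell
# Launch a command restricted to CPU 0 (the command here is a stand-in
# for a real workload)
taskset -c 0 echo "ran on CPU 0"

# Display the affinity mask of the current shell by PID
taskset -p $$
```

The -c form takes a human-readable CPU list (for example 0-3,8); without -c, taskset works with hexadecimal affinity masks instead.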

For more information on NUMA tools, see Chapter 5, “Data Placement Tools” in the Linux Application Tuning Guide.