This chapter documents the product components that are supported on the SGI Altix 3000 series, SGI Altix 4000 series, SGI Altix ICE series, SGI Altix UV, and SGI Altix XE systems. (For a list of the products, see Table 3-1.)
Descriptions of the product components are grouped in this chapter as follows:
Software provided by SGI for the SGI ProPack 7 for Linux SP1 release consists of kernel modules for SGI software built against the kernels in SUSE Linux Enterprise Server 11 SP1, value-add software developed by SGI specifically to run on SGI Altix or SGI Altix XE systems, and some additional third-party software (see VTune in the first row of Table 3-1, below).
Table 3-1. SGI ProPack 7 SP1 for Linux Products
Product | Architecture Supported | Description |
---|---|---|
VTune | ia64 | This tool, developed and supported by Intel, uses the performance measurement facilities of the Itanium processor to take profiles based on elapsed time or other architected events within the processor. These profiles can be used to measure, tune, and improve an application's performance. For more information on VTune, see the Intel web site. |
Array Services | ia64 and x86_64 | Includes administrator commands, libraries, daemons, and kernel extensions that support the execution of parallel applications across a number of hosts in a cluster, or array. The Message Passing Interface (MPI) of SGI ProPack uses Array Services to launch parallel applications. For information on MPI, see the Message Passing Toolkit (MPT) User's Guide. The secure version of Array Services is built to make use of secure sockets layer (SSL) and secure shell (SSH). For more information on standard Array Services or Secure Array Services (SAS), see the Array Services chapter in the Linux Resource Administration Guide. |
Cpuset System | ia64 and x86_64 | The Cpuset System is primarily a workload manager tool permitting a system administrator to restrict the number of processors and memory resources that a process or set of processes may use. A system administrator can use cpusets to create a division of CPUs and memory resources within a larger system. For more information, see the “Cpusets on SGI ProPack 6 for Linux” chapter in the Linux Resource Administration Guide. |
CSA | ia64 and x86_64 | Provides jobs-based accounting of per-task resources and disk usage for specific login accounts on Linux systems. Linux CSA application interface library allows software applications to manipulate and obtain status about Linux CSA accounting methods. For more information, see the CSA chapter in the Linux Resource Administration Guide. |
IOC4 serial driver | ia64 | Driver that supports the Internal IDE CD-ROM, NVRAM, and Real-Time Clock. Serial ports are supported on the IOC4 base I/O chipset and the following device nodes are created: /dev/ttyIOC4/0 |
Kernel partitioning support | ia64 | Provides the software infrastructure necessary to support a partitioned system, including cross-partition communication support. For more information on system partitioning, see the SGI Altix UV Linux Configuration and Operations Guide or the Linux Configuration and Operations Guide. |
MPT | ia64 and x86_64 | Provides industry-standard message passing libraries optimized for SGI computers. For more information on MPT, see the Message Passing Toolkit (MPT) User's Guide. |
NUMA tools | ia64 and x86_64 | Provides a collection of NUMA related tools (dlook(1), dplace(1), and so on). For more information on NUMA tools, see the Linux Application Tuning Guide. |
Performance Co-Pilot collector infrastructure | ia64 and x86_64 | Provides performance monitoring and performance management services targeted at large, complex systems. For more information on Performance Co-Pilot, see Performance Co-Pilot for IA-64 Linux User's and Administrator's Guide. |
REACT real-time for Linux | ia64 and x86_64 | Support for real-time programs. For more information, see the REACT Real-Time for Linux Programmer's Guide. |
Utilities | ia64 and x86_64 | udev_xsci, a udev helper for creating XSCSI device names; sgtools, a set of tools for SCSI disks using the Linux SG driver; and lsiutil, the LSI Fusion-MPT host adapter management utility. |
XVM | ia64 and x86_64 | Provides software volume manager functionality such as disk striping and mirroring. For more information on XVM, see the XVM Volume Manager Administrator's Guide. |
SGI does not support the following:
- Base Linux software not released by Novell for SLES11 SP1 or other software not released by SGI.
- Other releases, updates, or patches not released by Novell for SLES11 SP1 or by SGI for SGI ProPack software.
- Software patches, drivers, or other changes obtained from the Linux community or vendors other than Novell and SGI.
- Kernels recompiled or reconfigured to run with parameter settings or modules not specified by Novell and SGI.
- Unsupported hardware configurations and devices.
Building on the Linux operating system's rapid expansion and improvements for general commercial and enterprise environments, SGI has focused on improving Linux capabilities and performance specifically for high performance computing's (HPC's) big compute and big data environments. Thus, SGI has leveraged its experience with NUMAflex and HPC from its IRIX operating systems and MIPS processor-based systems and concentrated on the Linux kernel improvements specifically important to HPC environments.
The cpuset facility is primarily a workload manager tool permitting a system administrator to restrict the number of processors and memory resources that a process or set of processes may use. A cpuset defines a list of CPUs and memory nodes. A process contained in a cpuset may only execute on the CPUs in that cpuset and may only allocate memory on the memory nodes in that cpuset. Essentially, cpusets provide CPU and memory containers, or "soft partitions," within which you can run sets of related tasks. Using cpusets on an SGI Altix system improves cache locality and memory access times and can substantially improve an application's performance and runtime repeatability. Restraining all other jobs from using any of the CPUs or memory resources assigned to a critical job minimizes interference from other jobs on the system. For example, Message Passing Interface (MPI) jobs frequently consist of a number of threads that communicate using message passing interfaces. All threads need to be executing at the same time; if a single thread loses a CPU, all threads stop making forward progress and spin at a barrier. Cpusets can eliminate the need for a gang scheduler.
Cpusets are represented in a hierarchical virtual file system. Cpusets can be nested and they have file-like permissions.
In addition to their traditional use to control the placement of jobs on the CPUs and memory nodes of a system, cpusets also provide a convenient mechanism to control the use of Hyper-Threading Technology.
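The cpus and mems files in that virtual filesystem hold CPU and memory-node lists in the kernel's comma-separated range notation (for example, 0-3,8). The sketch below parses that notation; parse_cpu_list is a hypothetical helper name, not part of any SGI tool:

```python
def parse_cpu_list(spec):
    """Expand a kernel cpu-list string such as "0-3,8" into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # A range like "0-3" is inclusive on both ends.
            lo, hi = (int(n) for n in part.split("-"))
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return cpus
```

The same notation appears in other kernel interfaces (for example, Cpus_allowed_list in /proc/&lt;pid&gt;/status), so one parser serves several inspection tasks.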
For detailed information on cpusets, see Chapter 6, “Cpusets on SGI ProPack 6 for Linux” in the Linux Resource Administration Guide.
The port of Comprehensive System Accounting (CSA) software packages from IRIX to Linux is the result of an open source collaboration between SGI and Los Alamos National Laboratory (LANL) to provide jobs-based accounting of per-task resources and disk usage for specific login accounts on Linux systems.
Providing extensive system accounting capabilities is often important for very large systems, especially when the system will be shared or made available for other organizations to use. CSA uses a Job Containers feature, which provides on Linux the notion of a job. A job is an inescapable container and a collection of processes that enables CSA to track resources for any point of entry to a machine (for example, interactive login, cron job, remote login, batched workload, and so on).
The Linux CSA application interface library allows software applications to manipulate and obtain status about Linux CSA accounting methods.
CSA on Linux is an SGI open source project, also available from the following location:
http://oss.sgi.com/projects/csa
For further documentation and details on CSA support, see the chapter titled “Comprehensive System Accounting” in the Linux Resource Administration Guide.
SGI provides the ability to divide a single SGI Altix system into a collection of smaller system partitions. Each partition runs its own copy of the operating system kernel and has its own system console, root filesystem, IP network address, and physical memory. All partitions in the system are connected via the SGI high-performance NUMAlink interconnect, just as they are when the system is not partitioned. Thus, a partitioned system can also be viewed as a cluster of nodes connected via NUMAlink.
Benefits of partitioning include fault containment and the ability to use the NUMAlink interconnect and global shared memory features of the SGI Altix to provide high-performance clusters.
For further documentation and details on partitioning, see the SGI Altix UV Systems Linux Configuration and Operations Guide or the Linux Configuration and Operations Guide.
Although some HPC workloads might be mostly CPU bound, others involve processing large amounts of data and require an I/O subsystem capable of moving data between memory and storage quickly, as well as having the ability to manage large storage farms effectively. The XFS filesystem, XVM volume manager, and data migration facilities were leveraged from IRIX and ported to provide a robust, high-performance, and stable storage I/O subsystem on Linux. This section covers the following topics:
An Ethernet interface can be given a persistent internet address by associating its permanent MAC address, such as 08:00:69:13:f1:aa, with an internet protocol (IP) address, for example 192.168.20.1. An interface with a persistent IP address will be given the same IP address each time the system is booted. For more information, see “Persistent Network Interface Names” in the Linux Configuration and Operations Guide.
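The association is simply a table keyed by the permanent MAC address. A minimal sketch, using the addresses from the example above; normalize_mac and the table name are hypothetical, not part of the SGI tooling:

```python
def normalize_mac(mac):
    """Canonicalize a MAC address: lowercase, two hex digits per octet."""
    return ":".join(octet.zfill(2) for octet in mac.strip().lower().split(":"))

# Hypothetical persistent-address table keyed by permanent MAC address.
persistent_ips = {"08:00:69:13:f1:aa": "192.168.20.1"}

def ip_for_interface(mac):
    """Return the persistent IP for an interface's MAC, or None if unassigned."""
    return persistent_ips.get(normalize_mac(mac))
```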
On an SGI Altix 450 and 4700 system, a PCI domain is a functional entity that includes a root bridge, subordinate buses under the root bridge, and the peripheral devices it controls. Separation, management, and protection of PCI domains is implemented and controlled by system software. For more information, see “PCI Domain Support for SGI Altix 450 and 4700 Systems” in the Linux Configuration and Operations Guide.
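System software identifies a device by an extended PCI address whose leading field is the domain, as in 0001:02:1f.3 (domain:bus:device.function, all hexadecimal). A small parser sketch for that notation; parse_pci_address is a hypothetical helper:

```python
def parse_pci_address(addr):
    """Split an extended PCI address "domain:bus:device.function" into integers.

    All four fields are hexadecimal, as in lspci-style addressing.
    """
    domain, bus, devfn = addr.split(":")
    device, function = devfn.split(".")
    return int(domain, 16), int(bus, 16), int(device, 16), int(function, 16)
```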
The XSCSI subsystem on SGI ProPack 4 systems was an I/O infrastructure that leveraged technology from the IRIX operating system to provide more robust error handling, failover, and storage area network (SAN) infrastructure support, as well as long-term, large system performance tuning. This subsystem is not necessary for SGI ProPack 7 systems. However, XSCSI naming is available on SGI ProPack7 systems. For more information, see “XSCSI Naming Systems on SGI ProPack Systems” in the Linux Configuration and Operations Guide.
The SGI XFS filesystem provides a high-performance filesystem for Linux. XFS is an open-source, fast recovery, journaling filesystem that provides direct I/O support, space preallocation, access control lists, quotas, and other commercial file system features. Although other filesystems are available on Linux, performance tuning and improvements leveraged from IRIX make XFS particularly well suited for large data and I/O workloads commonly found in HPC environments.
For more information on the XFS filesystem, see XFS for Linux Administration.
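Applications reach space preallocation through the generic POSIX posix_fallocate interface, which Python exposes as os.posix_fallocate on Linux. A minimal sketch; it works on any filesystem that honors the call, not only XFS:

```python
import os
import tempfile

# Preallocate 1 MiB for a scratch file up front; on filesystems such as XFS
# this reserves space before it is written, reducing fragmentation.
fd, path = tempfile.mkstemp()
try:
    os.posix_fallocate(fd, 0, 1024 * 1024)
    size = os.fstat(fd).st_size   # the file now spans the allocated range
finally:
    os.close(fd)
    os.unlink(path)
```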
The SGI XVM Volume Manager provides a logical organization to disk storage that enables an administrator to combine underlying physical disk storage into a single logical unit, known as a logical volume. Logical volumes behave like standard disk partitions and can be used as arguments anywhere a partition can be specified.
A logical volume allows a filesystem or raw device to be larger than the size of a physical disk. Using logical volumes can also increase disk I/O performance because a volume can be striped across more than one disk. Logical volumes can also be used to mirror data on different disks.
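The performance effect of striping follows from round-robin placement of fixed-size stripe units across the member disks. The sketch below illustrates that mapping as a concept only, not XVM's actual layout code; stripe_target is a hypothetical helper:

```python
def stripe_target(offset, stripe_unit, ndisks):
    """Map a logical byte offset to (disk index, offset within that disk)
    for a simple round-robin stripe across `ndisks` disks."""
    chunk, within = divmod(offset, stripe_unit)   # which stripe unit, and where in it
    disk = chunk % ndisks                         # units rotate across the disks
    disk_offset = (chunk // ndisks) * stripe_unit + within
    return disk, disk_offset
```

Because consecutive stripe units land on different disks, a large sequential transfer keeps all member disks busy at once, which is where the I/O speedup comes from.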
This release adds a new XVM multi-host failover feature. For more information on this new feature and XVM Volume Manager in general, see the XVM Volume Manager Administrator's Guide.
SGI has ported HPC libraries, tools, and software packages from IRIX to Linux to provide a powerful, standards-based system using Linux and Itanium 2-based solutions for HPC environments. The following sections describe some of these tools, libraries, and software.
The SGI Message Passing Toolkit (MPT) provides industry-standard message passing libraries optimized for SGI computers. On Linux, MPT contains MPI and SHMEM APIs, which transparently utilize and exploit the low-level capabilities within SGI hardware, such as memory mapping within and between partitions for fast memory-to-memory transfers and the hardware memory controller's fetch operation (fetchop) support. Fetchops and other shared memory techniques enable ultra fast communication and synchronization between MPI processes in a parallel application.
MPI jobs can be launched, monitored, and controlled across a cluster or partitioned system using the SGI Array Services software. Array Services provides the notion of an array session, which is a set of processes that can be running on different cluster nodes or system partitions. Array Services is implemented using Process Aggregates (PAGG), a kernel module that provides process containers. SGI has open-sourced PAGG for Linux.
For more information on the Message Passing Toolkit, see the Message Passing Toolkit (MPT) User's Guide.
SGI no longer ships any open source MPI packages with the SGI ProPack for Linux release. MVAPICH2 and OpenMPI RPMs are available as a courtesy via the cool downloads section on Supportfolio at https://support.sgi.com/browse_request/dcs.
These RPMs are InfiniBand (IB) interconnect versions of the open source products, compiled with the Intel compiler.
SGI MPT is provided with the SGI ProPack 7 release and supports all SGI platforms.
The SGI Performance Co-Pilot software was ported from IRIX to Linux to provide a collection of performance monitoring and performance management services targeted at large, complex systems. Integrated with the low-level performance hardware counters and with MPT, Performance Co-Pilot provides such services as CPU, I/O, and networking statistics; visualization tools; and monitoring tools.
For more information on Performance Co-Pilot, see the Performance Co-Pilot for IA-64 Linux User's and Administrator's Guide.
The Extensible Firmware Interface (EFI), the firmware layer that mediates between the platform hardware and the operating system, is provided by SLES11, the base Linux operating system for SGI Altix systems running SGI ProPack 7. EFI also controls the server's boot configuration, maintaining the boot menu in durable, non-volatile memory.
SLES11 uses the elilo package, which places the bootloader (elilo.efi) and its configuration file (elilo.conf) in the /boot/efi/efi/SuSE/ directory on SGI Altix systems.
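A minimal elilo.conf might look like the following; the image names, root device, and append arguments are illustrative only and will differ on a real system:

```
prompt
timeout=50
default=linux

image=vmlinuz
    label=linux
    initrd=initrd
    root=/dev/sda2
    append="console=ttyS0"
```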
Note: When booting from SLES11, use the bootia64 command instead of elilo. Once the system is running SLES11, use elilo to boot from EFI.
Note: If you have installed multiple kernel images and want to boot with one that is not currently the system default (vmlinuz in /boot/efi/efi/SuSE), simply copy the vmlinuz and initrd files for the kernel you wish to use from /boot to /boot/efi/efi/SuSE.
For a summary of EFI commands, see Table 3-2.
Table 3-2. EFI Commands
EFI Command | Description |
---|---|
alias [-bdv] [sname] [value] | Sets or gets alias settings |
attrib [-b] [+/- rhs] [file] | Views or sets file attributes |
bcfg | Configures boot driver and load options |
cd [path] | Updates the current directory |
cls [background color] | Clears screen |
comp file1 file2 | Compares two files |
cp file [file] ... [dest] | Copies files or directories |
date [mm/dd/yyyy] | Gets or sets date |
dblk device [Lba] [blocks] | Performs hex dump of block I/O devices |
dh [-b] [-p prot_id] \| [handle] | Dumps handle information
dmpstore | Dumps variable store |
echo [-on \| -off] [text] | Echoes text to stdout or toggles script echo
edit [file name] | Edits a file |
endfor | Script-only: Delimits loop construct
endif | Script-only: Delimits IF THEN construct |
err [level] | Sets or displays error level |
exit | Exits |
flash filename | Flashes PROM on C-brick |
for var in set | Script-only: Indicates loop construct |
getmtc | Gets next monotonic count |
goto label | Script-only: Jumps to label location in script |
guid [-b] [sname] | Dumps known guid IDs |
help [-b] [internal command] | Displays this help |
if [not] condition then | Script-only: Indicates IF THEN construct |
load driver_name | Loads a driver |
ls [-b] [dir] [dir] ... | Obtains directory listing |
map [-bdvr] [sname[:]] [handle] | Maps shortname to device path |
mem [address] [size] [;MMIO] | Dumps memory or memory mapped I/O |
memmap [-b] | Dumps memory map |
mkdir dir [dir] ... | Makes directory |
mm address [width] [;type] | Modifies memory: Mem, MMIO, IO, PCI |
mode [col row] | Sets or gets current text mode |
mount BlkDevice [sname[:]] | Mounts a filesystem on a block device
mv sfile dfile | Moves files |
pause | Script-only: Prompts to quit or continue |
pci [bus dev] [func] | Displays PCI device(s) info |
reset [cold/warm] [reset string] | Performs a cold or warm reset
rm file/dir [file/dir] | Removes file or directories |
set [-bdv] [sname] [value] | Sets or gets environment variable |
setsize newsize fname | Sets a file's size
stall microseconds | Delays for x microseconds |
time [hh:mm:ss] | Gets or sets time |
touch [filename] | Updates the time and date on a file
type [-a] [-u] [-b] file | Displays the contents of a file
ver | Displays version information |
vol fs [volume label] | Sets or displays volume label |
SGIconsole is a combination of hardware and software that provides console management and allows monitoring of multiple SGI servers running the IRIX operating system and SGI ProPack for Linux. These servers include SGI partitioned systems and large, single-system-image servers, including SGI Altix 350 and 450 systems and the SGI Altix 3000 and 4000 family of servers and superclusters.
SGIconsole consists of a 1U rackmountable SGI server based on the Intel Pentium processor, a serial multiplexer or Ethernet hub, and a software suite that includes the Console Manager package and Performance Co-Pilot, which provides access to common remote management tools for hardware and software.
Console Manager is a graphical user interface for the SGIconsole management and monitoring tool used to control multiple SGI servers. SGIconsole also has a command line interface. For more information on SGIconsole, see the SGIconsole Start Here.
This section describes the commands that are currently provided with the collection of NUMA related data placement tools that can help you with tuning applications on your system.
Note: Performance tuning information for single processor and multiprocessor programs resides in the Linux Application Tuning Guide.
The dlook(1) command displays the memory map and CPU use for a specified process. The following information is printed for each page in the virtual address space of the process:
- The object that owns the page (file, SYSV shared memory, device driver, and so on)
- The type of page (RAM, FETCHOP, IOSPACE, and so on)
- If the page is RAM memory, the following information is supplied:
  - Memory attributes (SHARED, DIRTY, and so on)
  - The node on which the page is located
  - The physical address of the page (optional)
Optionally, the amount of elapsed CPU time that the process has executed on each physical CPU in the system is also printed.
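Comparable per-node placement information is exposed on modern Linux kernels in /proc/&lt;pid&gt;/numa_maps, where fields such as N0=2 give the page count on each memory node. A sketch that parses one such line; the function name and the sample line are illustrative:

```python
def parse_numa_maps_line(line):
    """Parse one /proc/<pid>/numa_maps line into (address, policy, pages-per-node)."""
    fields = line.split()
    address = int(fields[0], 16)       # mapping start address, hexadecimal
    policy = fields[1]                 # memory policy, e.g. "default" or "bind:0"
    pages = {}
    for field in fields[2:]:
        # Per-node counts look like "N0=2"; skip fields like "anon=3".
        if field[0] == "N" and "=" in field:
            node, count = field[1:].split("=")
            if node.isdigit():
                pages[int(node)] = int(count)
    return address, policy, pages
```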
The dplace(1) command binds a related set of processes to specific CPUs or nodes to prevent process migrations. In some cases, this tool improves performance because a higher percentage of memory accesses are made to the local node.
The taskset(1) command is used to set or retrieve the CPU affinity of a running process given its PID or to launch a new command with a given CPU affinity. CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. Note that the Linux scheduler also supports natural CPU affinity; the scheduler attempts to keep processes on the same CPU as long as practical for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications.
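Under the covers, taskset(1) uses the sched_setaffinity(2) system call, which Python exposes as os.sched_setaffinity on Linux. A Linux-only sketch that pins the calling process to a single CPU and then restores its original mask:

```python
import os

# Query the calling process's CPU affinity (pid 0 means "the calling process").
original = os.sched_getaffinity(0)

# Pin to a single CPU, as "taskset -p <mask> <pid>" would.
one_cpu = {min(original)}
os.sched_setaffinity(0, one_cpu)
pinned = os.sched_getaffinity(0)

# Restore the original affinity mask.
os.sched_setaffinity(0, original)
```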
For more information on NUMA tools, see Chapter 5, “Data Placement Tools” in the Linux Application Tuning Guide.