Chapter 2. Administering MPT

This chapter is provided for system administrators who install, configure, and administer software on SGI Altix systems. It covers the following topics:

  • Finding the MPT Release Notes

  • MPT Installation

  • System Configuration

Finding the MPT Release Notes

Find the latest MPT release notes on your system, as follows:

% rpm -qi sgi-mpt | grep README.relnotes
 /opt/sgi/mpt/mpt-2.00/doc/README.relnotes       

Next, change directory to the location found, and list the contents of the directory, as follows:

% cd /opt/sgi/mpt/mpt-2.00/doc
% ls
MPT_UG  README.relnotes  sgi-mpt.2.00.template 

The release notes are in a file called README.relnotes.

MPT Installation

This section describes requirements and procedures for MPT installation. After you have installed the MPT and prerequisite software per the instructions in this section, be sure to perform the steps described in “System Configuration”.


Note: The MPI installation and configuration information found in this chapter is also available in the README.relnotes file in the /opt/sgi/mpt/mpt-2.00/doc directory.


Disk Space Requirements

Disk space requirements for the MPT product are substantially less than 20 Mbytes.
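
Because the footprint is small, a quick pre-install check of free space is usually sufficient. The following sketch reports whether the target filesystem has room (the fallback to / and the use of 20 Mbytes as the threshold are illustrative assumptions, not product requirements):

```shell
# Check that the filesystem holding the default MPT prefix
# (/opt/sgi/mpt) has the roughly 20 MB the MPT RPM needs.
dir=/opt
[ -d "$dir" ] || dir=/                      # fall back to the root filesystem
need_kb=20480                               # 20 MB, a safe upper bound
avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
if [ "$avail_kb" -ge "$need_kb" ]; then
    echo "OK: ${avail_kb} KB free on $dir"
else
    echo "WARNING: only ${avail_kb} KB free on $dir"
fi
```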

Prerequisites

This section describes software prerequisites for MPT.

SGI ProPack Components

A default install of SGI ProPack is recommended. This provides a number of software components required by MPT. The SGI ProPack RPMs required by MPT include the following:

  • sgi-arraysvcs or sgi-sarraysvcs

  • sgi-procset

  • libbitmask

  • libcpuset

  • cpuset-utils

  • sgi-release

  • xpmem
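
A loop such as the following can confirm that these packages are present before installing MPT. The array services package is checked separately in practice because its name (sgi-arraysvcs or sgi-sarraysvcs) varies by release; this sketch covers the fixed names only:

```shell
# Report the install status of the ProPack packages MPT requires.
pkgs="sgi-procset libbitmask libcpuset cpuset-utils sgi-release xpmem"
status=""
for pkg in $pkgs; do
    if rpm -q "$pkg" >/dev/null 2>&1; then
        status="$status $pkg:installed"
    else
        status="$status $pkg:missing"
    fi
done
echo "$status"
```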

InfiniBand Software Stack

If you are using the InfiniBand interconnect, you must ensure that one of the supported InfiniBand software stacks is installed. These include the OpenFabrics Enterprise Distribution (OFED) software provided with SGI ProPack 5 SP2 (or later) on SGI Altix XE and SGI Altix ICE systems and SGI ProPack 5 SP3 (or later) on SGI Altix 4000 series systems.
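
One way to probe for an installed OFED stack is to look for its ofed_info utility, as in the sketch below. (A missing ofed_info only means this probe is inconclusive; the stack may have been installed without its utilities in PATH.)

```shell
# Probe for an installed OFED InfiniBand stack.
if command -v ofed_info >/dev/null 2>&1; then
    stack="OFED found: $(ofed_info | head -1)"
else
    stack="no OFED utilities found in PATH"
fi
echo "$stack"
```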

Installing the MPT RPM

MPT is supplied as an RPM file. The name of the file contains the following information:

  • Product (sgi-mpt)

  • MPT Version (2.00)

  • SGI ProPack Version (700)

  • Architecture (SGI IA-64 or x86_64 based systems)

For example, the name of the MPT RPM for the MPT 2.00 release is sgi-mpt-2.00-sgi700.ia64.rpm. To install this RPM, log in as root and issue the following command:

 # rpm -Uvh sgi-mpt-2.00-sgi700.ia64.rpm

Installing MPT Software in an Alternate Location

RPM provides a means for creating, installing, and managing relocatable packages. That is, the MPT RPM can be installed in either a default or an alternate location.

The default location for installing the MPT RPM is /opt/sgi/mpt/. To install the MPT RPM in an alternate location, use the --relocate option, as shown in the following example. The --relocate option specifies the alternate base directory for the installation of the MPT software (in this case, /tmp).

# rpm -i --relocate /opt/sgi/mpt/mpt-2.00=/tmp --excludepath /usr sgi-mpt-2.00-sgi700.ia64.rpm


Note: If the MPT software is installed in an alternate location, MPT users must set the PATH and LD_LIBRARY_PATH environment variables to specify the search locations for the mpirun command and the run-time libraries. Assuming the alternate location of /tmp, set them as follows:

For csh:
setenv PATH /tmp/bin:${PATH}
setenv LD_LIBRARY_PATH /tmp/lib

For sh or bash:
export PATH=/tmp/bin:${PATH}
export LD_LIBRARY_PATH=/tmp/lib



If the site is using environment modules to manage the user environment, the alternate location should be placed in the mpt modulefile. This approach is the most convenient way to establish the environment variable settings that give MPT program developers and users access to the MPT software in its alternate location. Sample modulefiles are located in /opt/sgi/mpt/mpt-2.00/doc and /usr/share/modules/modulefiles/mpt/2.00.
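
A minimal mpt modulefile for the /tmp example above might look like the following sketch. (The sample modulefiles shipped in the doc directory are more complete; the paths here are the illustrative alternate location, not fixed values.)

```tcl
#%Module1.0
## mpt modulefile sketch for an alternate install under /tmp
set              mpt_base         /tmp
prepend-path     PATH             $mpt_base/bin
prepend-path     LD_LIBRARY_PATH  $mpt_base/lib
```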

For more information, see "Using Dynamic Shared Libraries to Run MPI Jobs," later in this chapter.

Using a cpio File for Installation

The cpio file installation method described here is useful when the MPT software is installed in an NFS filesystem shared by a number of hosts. In this case, it is neither important nor desirable for the RPM database on just one of the machines to track the versions of MPT that are installed. Another advantage of this approach is that you do not need root permission to install the MPT software.

To install MPT using a cpio file, first convert the MPT RPM to a cpio file by executing the rpm2cpio command, as follows:

% rpm2cpio sgi-mpt-2.00-1.ia64.rpm > /tmp/sgi-mpt.cpio

Once you have created the .cpio file, you are free to install the software beneath any directory in which you have write permission. The following example demonstrates the process.

% cd /tmp
% cpio -idmv < sgi-mpt.cpio
opt/sgi/mpt/mpt-2.00/bin/mpirun
opt/sgi/mpt/mpt-2.00/include/mpi++.h
opt/sgi/mpt/mpt-2.00/include/mpi.h
            ...
opt/sgi/mpt/mpt-2.00/lib/libmpi++.so
opt/sgi/mpt/mpt-2.00/lib/libmpi.so
opt/sgi/mpt/mpt-2.00/lib/libxmpi.so
            ...
% ls -R /tmp/opt/sgi/mpt/mpt-2.00
bin  doc  include  lib  man

/tmp/opt/sgi/mpt/mpt-2.00/bin:
mpirun

/tmp/opt/sgi/mpt/mpt-2.00/include:
MPI.mod  mpi.h    mpi_ext.h   mpif.h             mpio.h    mpp
mpi++.h  mpi.mod  mpi_extf.h  mpif_parameters.h  mpiof.h

/tmp/opt/sgi/mpt/mpt-2.00/lib:
libmpi++.so*  libmpi.so*  libsma.so*  libxmpi.so*
           ...

If the MPT software is installed in an alternate location, set up an environment module to set environment variables which will be used by compilers, linkers, and runtime loaders to reference the MPT software.

Using Dynamic Shared Libraries to Run MPI Jobs

After you have installed the MPT RPM in the default location, build an MPI-based application that uses the .so files with the following commands.

For C programs:

% gcc simple1_mpi.c -lmpi
% mpirun -np 2 a.out

For Fortran programs:

% f77 -I/usr/include simple1_mpi.f -lmpi
% mpirun -np 2 a.out

The default locations for the include and .so files and the mpirun command are referenced automatically.

Assuming that the MPT package has been installed in an alternate location (under the /tmp directory), as described earlier in “Installing MPT Software in an Alternate Location”, the commands to compile, link, and check the program are as follows:

% gcc -I /tmp/usr/include simple1_mpi.c -L/tmp/usr/lib -lmpi
% ldd a.out
libmpi.so => /usr/lib/libmpi.so (0x40019000)
libc.so.6 => /lib/libc.so.6 (0x402ac000)
libdl.so.2 => /lib/libdl.so.2 (0x4039a000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

As shown above, compiling with alternate-location libraries does not mean that your program will run with them. Note that libmpi.so is resolved to /usr/lib/libmpi.so, which is the default-location library. If you are going to use an alternate location for the .so files, it is important to set the LD_LIBRARY_PATH environment variable. If the site is using environment modules, this can be done in the mpt modulefile. Otherwise, the user must set LD_LIBRARY_PATH, as in the following example:

% setenv LD_LIBRARY_PATH /tmp/usr/lib
% ldd a.out
libmpi.so => /tmp/usr/lib/libmpi.so (0x40014000)
libc.so.6 => /lib/libc.so.6 (0x402ac000)
libdl.so.2 => /lib/libdl.so.2 (0x4039a000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

This example shows the library being resolved to the correct file.

Running MPI Jobs on a Cluster with MPT Alternate Installation

For MPI jobs to run correctly in a cluster environment in which MPT has been installed in an alternate location, you must copy all of the pertinent pieces of MPT to an NFS-mounted filesystem. This is the only way in which all of the nodes in the cluster can access the software, short of installing the same MPT RPM on each node. The following method is one way to accomplish this (assuming /data/nfs is an NFS-mounted directory and MPT has been installed in the alternate location /tmp/usr):

node1 # tar cf /tmp/mpt.2.00.tar /tmp/usr
node1 # cp /tmp/mpt.2.00.tar /data/nfs
node1 # cd /data/nfs
node1 # tar xf mpt.2.00.tar
node1 # setenv LD_LIBRARY_PATH /data/nfs/tmp/usr/lib
node1 # /data/nfs/tmp/usr/bin/mpirun -v -a <arrayname> host_A,host_B -np 1 a.out

Replace the <arrayname> in the above example with an array services array name that contains both host_A and host_B.

System Configuration

This section describes additional system configuration issues that a system administrator may need to address before running the SGI MPT software.

Starting Prerequisite Services

MPT requires that the procset and array services be started and that the XPMEM kernel module be loaded. A reboot performed after the system configuration tasks in this section accomplishes these steps automatically. If the system has not been rebooted, execute the following commands as root:

modprobe xpmem
/etc/init.d/procset restart
/etc/init.d/array restart

If you will be running MPT on a clustered system, these steps (or a reboot) must be performed for all hosts in the cluster.
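
A loop over the cluster hosts can spot-check one of these prerequisites, as in the sketch below. The host names are placeholders, and password-less ssh access is assumed; a host that cannot be reached is simply reported as unconfirmed.

```shell
# Spot-check that the XPMEM kernel module is loaded on each cluster host.
hosts="host_A host_B"                # placeholder names; substitute real hosts
checked=0
for host in $hosts; do
    # BatchMode avoids hanging on a password prompt for unreachable hosts
    ssh -o BatchMode=yes "$host" 'lsmod | grep -q xpmem' 2>/dev/null \
        && echo "$host: xpmem loaded" \
        || echo "$host: xpmem not confirmed"
    checked=$((checked + 1))
done
echo "$checked hosts checked"
```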

Configuring Array Services

Array Services must be configured and running on all hosts in a cluster to perform the launch of MPI jobs. You can set up a simple Array Services configuration by executing the following two commands as root on all hosts of the cluster. List all host names on the arrayconfig command line:

/usr/sbin/arrayconfig -m host1 host2 ...
/etc/init.d/array restart

For a more elaborate configuration, consult the arrayconfig(1) and arrayd.conf(4) man pages and the "Installing and Configuring Array Services" section of the Linux Resource Administration Guide.
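
Array Services also provides an ascheck utility for verifying the configuration; a quick post-restart check might look like the following sketch (the fallback message covers hosts where Array Services is not yet installed):

```shell
# Validate the Array Services configuration, if ascheck is available.
if command -v ascheck >/dev/null 2>&1; then
    ascheck && result="array configuration OK" \
            || result="array configuration errors reported"
else
    result="ascheck not found in PATH"
fi
echo "$result"
```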

Adjusting File Descriptor Limits

On large hosts with hundreds of processors, MPI jobs require a large number of file descriptors. On these systems you might need to increase the system-wide limit on the number of open files. The default value for the file-limit resource is 1024. To change the default value for all users to 8192 file descriptors:

  1. Add the following line to /etc/pam.d/login:

    session    required     /lib/security/pam_limits.so

  2. Add the following lines to /etc/security/limits.conf:

    *     soft    nofile      8192
    *     hard    nofile      8192

The default 1024 file descriptors will allow for approximately 199 MPI processes per host. Increasing the value to 8192 allows for more than 512 MPI processes per host.

If other login methods are used (ssh, rlogin, and so on), and the increased file descriptor limits are desired, the corresponding files in /etc/pam.d should be modified as well.
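
Whatever the login method, you can confirm the descriptor limits that a new shell actually receives:

```shell
# Report the soft and hard open-file limits for the current shell;
# after the changes above, a fresh login should show 8192 for both.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "open files: soft=$soft hard=$hard"
```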

Adjusting Locked Memory Limits

The OFED-based InfiniBand software stacks require the resource limit for locked memory to be set to a high value.

Increase the user hard limit by adding the following line to /etc/security/limits.conf:

 *     hard   memlock  unlimited

If you are running on a system with an SGI ProPack software release prior to SGI ProPack 5 Service Pack 1, you will also need to patch the Array Services startup script /etc/init.d/array to ensure that arrayd is running with a high "memlock" hard limit. This is done by the following sequence, executed as root:

sed -i.bak 's/ulimit -n/ulimit -l unlimited ; ulimit -n/' \
    /etc/init.d/array
/etc/init.d/array restart
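
After the restart, a new login shell can confirm the locked-memory limit. Note that this checks the current shell's limit (which reflects limits.conf), not the limit of the running arrayd daemon:

```shell
# Report the hard locked-memory limit; "unlimited" is what the
# OFED-based InfiniBand stacks need.
memlock=$(ulimit -Hl)
echo "hard memlock limit: $memlock"
```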