Chapter 2. UPC Job Environment

The SGI UPC run-time environment depends on the SGI Message Passing Toolkit (MPT) MPI and SHMEM libraries and the job launch, parallel job control, memory mapping, and synchronization functionality they provide. UPC jobs are launched like MPT MPI or SHMEM jobs, using the mpirun(1) or mpiexec_mpt(1) commands. UPC thread numbers correspond to SHMEM PE numbers and MPI rank numbers for MPI_COMM_WORLD.
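
For example, a UPC executable can be started on 64 UPC threads with either MPT launch command. This is a sketch only; a.out is a placeholder executable name, and the exact option syntax is described on the mpirun(1) and mpiexec_mpt(1) man pages:

    mpirun -np 64 ./a.out          # launch 64 UPC threads (MPI processes)
    mpiexec_mpt -np 64 ./a.out     # equivalent launch, typically used under a batch scheduler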

By default, UPC (MPI) jobs have UPC threads (MPI processes) pinned to successive logical CPUs within the system or cpuset in which the program is running. This is often optimal, but at times there is benefit in specifying a different mapping of UPC threads to logical CPUs. See the MPI job placement information in the mpi(1) man page under Using a CPU List and MPI_DSM_CPULIST, and see the omplace(1) man page for more information about placement of parallel MPI/UPC jobs.
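
As a hedged illustration of a non-default placement, the following bash commands pin an 8-thread UPC job to logical CPUs 8 through 15 instead of the default CPUs 0 through 7. The CPU list, thread count, and executable name a.out are placeholders; the full placement syntax is documented on the mpi(1) and omplace(1) man pages:

    export MPI_DSM_CPULIST=8-15    # map UPC threads (MPI ranks) to logical CPUs 8 through 15
    mpirun -np 8 ./a.out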

UPC Quick Start on SGI Altix UV Systems

This section describes environment variable settings that may be appropriate for some common UPC program execution situations.

SGI UPC provides two options for performing references to non-local portions of shared arrays:

  • Processor-driven shared memory

  • Global reference unit (GRU)-driven shared memory

    The GRU is a remote direct memory access (RDMA) facility provided by the UV hub application-specific integrated circuit (ASIC).

By default, UPC uses processor-driven references for nearby sockets and GRU-driven references for more distant references. The threshold between "nearby" and "distant" can be tuned with the MPI_SHARED_NEIGHBORHOOD variable, described in more detail in “UPC Runtime Library Environment Variables”.

Consider the following environment variable settings; a combined example follows this list:

  • Set MPI_GRU_CBS=0

    This makes all GRU resources available to UPC.

  • Some Altix UV systems have Intel processors with two hyper-threads per core, while others have a single hyper-thread per core. When two hyper-threads per core are available, most HPC codes benefit from leaving one hyper-thread per core idle, thereby giving more cache and functional-unit resources to the active hyper-thread to which a UPC thread is assigned. This is easy to do because the upper half of the logical CPUs (by number) are the hyper-threads paired with the lower half of the logical CPUs. Set GRU_RESOURCE_FACTOR=2 when leaving half of the hyper-threads idle.

  • You can experiment with setting MPI_SHARED_NEIGHBORHOOD=HOST. Some shared array access patterns are faster with processor-driven references.

  • Set GRU_TLB_PRELOAD=100 to get the best GRU-based bandwidth for large block copies.
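
Putting these settings together, a quick-start launch on an Altix UV system might look like the following bash sketch; the thread count and the executable name a.out are placeholders:

    export MPI_GRU_CBS=0            # make all GRU resources available to UPC
    export GRU_RESOURCE_FACTOR=2    # only when half of the hyper-threads are left idle
    export GRU_TLB_PRELOAD=100      # best GRU-based bandwidth for large block copies
    # export MPI_SHARED_NEIGHBORHOOD=HOST   # optional experiment: favor processor-driven references
    mpirun -np 64 ./a.out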

UPC Runtime Library Environment Variables

The UPC runtime library has a number of environment variables that affect or tune run-time behavior. They are as follows:

  • UPC_ALLOC_MAX

    This variable sets the per-thread maximum amount of memory, in bytes, that can be allocated dynamically by upc_alloc() and the other shared array allocation functions. Note that the SMA_SYMMETRIC_SIZE variable must be set to the value of UPC_ALLOC_MAX plus the amount of space consumed by statically allocated shared arrays in the UPC program; a sizing example follows this list. See the intro_shmem(1) man page for more information about SMA_SYMMETRIC_SIZE.

    The default is the amount of physical memory per logical CPU on the system.

  • UPC_HEAP_CHECK

    When set to 1, this variable causes libupc to check the integrity of the shared memory heap from which shared arrays are allocated.

    The default value is 0.
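
As a sizing example, the following hypothetical bash settings allow each thread to allocate up to 2 GB of shared memory dynamically and assume roughly 512 MB of statically allocated shared arrays, so SMA_SYMMETRIC_SIZE is set to the sum of the two. The sizes, thread count, and executable name a.out are illustrative only:

    export UPC_ALLOC_MAX=2147483648        # 2 GB of dynamic shared allocations per thread
    export SMA_SYMMETRIC_SIZE=2684354560   # UPC_ALLOC_MAX plus an assumed 512 MB of static shared arrays
    export UPC_HEAP_CHECK=1                # optional: check shared heap integrity while debugging
    mpirun -np 64 ./a.out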

A number of MPI and SHMEM environment variables described on the MPI(1), SHMEM(1), and gru_resource(3) man pages can be used to tune the execution of UPC programs on SGI Altix UV systems. Consult these man pages for a complete list of tunable environment variables. Some of the most helpful variables for UPC programs are as follows:

  • MPI_SHARED_NEIGHBORHOOD

    This environment variable has an effect only on Altix UV systems. Setting it to HOST requests that UPC shared arrays use processor-driven shared memory transfers instead of GRU transfers. The size of the memory blocks being accessed in a remote part of a shared array, among other factors, determines whether processor-driven or GRU-driven transfers perform better.

    The default setting for the MPI_SHARED_NEIGHBORHOOD variable is BLADE, which means that UPC threads use processor-driven shared memory for references to shared array blocks that have affinity to threads running on sockets attached to the same UV hub.

  • MPI_GRU_CBS and MPI_GRU_DMA_CACHESIZE

    These environment variables have an effect only on Altix UV systems. They reserve Altix UV GRU resources for MPI and thereby make those resources unavailable to UPC. Setting MPI_GRU_CBS to 0 makes all GRU resources available to UPC.

  • GRU_RESOURCE_FACTOR

    This environment variable has an effect only on Altix UV systems. It specifies an integer multiplier that increases the amount of per-thread GRU resources available to a UPC program. If UPC programs are placed so that some portion of the logical CPUs (hyper-threads) on each UV hub are left idle, you can specify a corresponding multiplier. For example, if half of the logical CPUs are idle, a setting of GRU_RESOURCE_FACTOR=2 is recommended; see the example that follows this list. See the gru_resource(3) man page for more details.
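
For example, on a hypothetical UV node with 32 cores (64 hyper-threads), a run that leaves the second hyper-thread of each core idle might combine placement and GRU settings as follows. The CPU list, thread count, and executable name a.out are illustrative only:

    export MPI_DSM_CPULIST=0-31     # one UPC thread per core; logical CPUs 32 through 63 remain idle
    export GRU_RESOURCE_FACTOR=2    # scale per-thread GRU resources to match the idle hyper-threads
    export MPI_GRU_CBS=0            # make all GRU resources available to UPC
    mpirun -np 32 ./a.out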