Chapter 1. Introduction

The UPC Language Specifications document defines Unified Parallel C (UPC) as a parallel extension to the C programming language standard that follows the partitioned global address space programming model. It is available at the following location: http://upc.gwu.edu/docs/upc_specs_1.2.pdf.

UPC: Distributed Shared Memory Programming provides information about the UPC programming language. For details about this manual and other resources related to UPC, see the preface “About This Manual”.

The UPC common global address space (SMP and NUMA) provides an application with a single shared, partitioned address space, where variables may be directly read and written by any processor, but each variable is physically associated with a single processor.
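As an illustration of how each shared element is associated with exactly one thread, the following plain C sketch computes the owning thread of element i of a shared array. It assumes the block-cyclic layout rule from the UPC specification: element i of an array declared with block size B has affinity to thread (i / B) mod THREADS. The function name owner_thread and its parameters are hypothetical and are not part of UPC or the SGI runtime; in a UPC program, upc_threadof() reports the affinity of a shared object directly.

```c
#include <stddef.h>

/* Hypothetical helper (not part of UPC or the SGI runtime).
 * Returns the index of the thread that element i has affinity to,
 * assuming a block-cyclic layout with the given block size.
 * block_size and threads must both be nonzero. */
size_t owner_thread(size_t i, size_t block_size, size_t threads)
{
    /* i / block_size is the index of the block that holds element i;
     * blocks are dealt out to the threads in round-robin order. */
    return (i / block_size) % threads;
}
```

For example, with 4 threads and a block size of 2, elements 0 and 1 belong to thread 0, elements 2 and 3 belong to thread 1, and element 8 wraps around to thread 0 again.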

This manual documents the SGI implementation of the UPC standard.

UPC Implementation

The SGI implementation of UPC conforms to Version 1.2 of the UPC standard. Parallel I/O, which is not yet a part of the language, is not supported.

The SGI Unified Parallel C (UPC) compiler man page describes the sgiupc(1) command. sgiupc is the front-end to the SGI UPC compiler suite. It handles all stages of the UPC compilation process: UPC language preprocessing, UPC-to-C translation, back-end C compilation, and linking with UPC runtime libraries.

To see the sgiupc(1) man page, make sure the sgi-upc-devel module is loaded, as follows:

% module load sgi-upc-devel

Compiling and Executing a Sample UPC Program

A sample UPC program (hello.c) follows:

#include <upc.h>
#include <stdio.h>

int
main(void)
{
  /* MYTHREAD is the index of this thread; THREADS is the total thread count. */
  printf("Executing on thread %d of %d threads\n", MYTHREAD, THREADS);
  return 0;
}

To compile this program and generate the executable hello, use the following command:

% sgiupc hello.c -o hello

The mpirun(1) command is used for execution. To execute the program using four threads, run the following command:

% mpirun -np 4 hello

You can expect output similar to the following:

Executing on thread 1 of 4 threads
Executing on thread 3 of 4 threads
Executing on thread 0 of 4 threads
Executing on thread 2 of 4 threads


Note: The lines might not appear in the order shown in the output example above, because the threads execute independently.


For more information on sgiupc(1) and mpirun(1), see the corresponding man pages.

Mixing of UPC Programs with Other Languages

The rules for mixing UPC programs with programs written in other languages are similar to those for mixing C programs compiled with the native compiler used to compile the UPC program (as specified by UPC_NATIVE_CC). There is one caveat for shared pointers, described in “Shared Pointer Representation and Access”, and the following linking requirement:

If the main program is compiled using sgiupc, the libraries needed for running UPC programs are linked in automatically. If the main program is not a UPC program compiled with sgiupc, the runtime libraries needed by sgiupc must be linked in explicitly. To determine which libraries are needed, specify the -v option on an sgiupc command that compiles and links an application consisting of a single UPC program.

Shared Pointer Representation and Access

To handle large thread counts as well as large blocking sizes, the SGI UPC compiler uses a struct type to represent a shared pointer. Because SGI reserves the right to change this representation at a later time, use the UPC-provided functions to access the individual components if a shared pointer is to be passed to a non-UPC function.
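A minimal sketch of the non-UPC side of such an interface follows. It assumes that the UPC caller extracts the components with the standard UPC functions upc_threadof(), upc_phaseof(), and upc_addrfield() and passes them as plain integers. The struct sptr_parts type and the describe_sptr() function are hypothetical names chosen for illustration; they are not part of the SGI runtime and do not reflect the internal shared pointer representation.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical plain C view of the components of a shared pointer.
 * This is NOT the SGI internal representation; a UPC caller is
 * assumed to fill it in using the standard UPC query functions. */
struct sptr_parts {
    size_t thread;     /* assumed to come from upc_threadof(p)  */
    size_t phase;      /* assumed to come from upc_phaseof(p)   */
    size_t addrfield;  /* assumed to come from upc_addrfield(p),
                          whose meaning is implementation-defined */
};

/* Hypothetical non-UPC function: formats the components into buf.
 * Returns the value returned by snprintf(), that is, the number of
 * characters the full string needs, excluding the terminating null. */
int describe_sptr(const struct sptr_parts *parts, char *buf, size_t len)
{
    return snprintf(buf, len, "thread=%zu phase=%zu addr=0x%zx",
                    parts->thread, parts->phase, parts->addrfield);
}
```

Because the non-UPC function sees only plain integers, it does not depend on the layout of the compiler's shared pointer struct and should keep working if that layout changes.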

Vectorization of Loops to Reduce Remote Communication Overhead

Consider the following loop:

upc_forall (i = 0; i < N; i++; i)
  a[i] = b[i] + c[i];

If the array references are all remote, this loop performs 2*N remote loads and N remote stores.

If the loop does not have any aliasing issues, the number of remote loads can be reduced to 2 and the number of remote stores to 1, with each of these operations transferring N elements at a time. This reduces the communication overhead of accessing remote data.
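The difference in operation counts can be sketched in plain C. The following code is a simulation only, not the code that the SGI compiler generates: ordinary arrays stand in for remote data, memcpy() stands in for a bulk transfer (in a UPC program, a library call such as upc_memget() or upc_memput() would play that role), and the comm_stats counters and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counters for simulated remote operations. */
struct comm_stats {
    size_t loads;
    size_t stores;
};

/* Element-wise version: every array reference counts as one
 * simulated remote operation, giving 2*n loads and n stores. */
void add_elementwise(double *a, const double *b, const double *c,
                     size_t n, struct comm_stats *st)
{
    for (size_t i = 0; i < n; i++) {
        double bi = b[i];
        double ci = c[i];
        st->loads += 2;
        a[i] = bi + ci;
        st->stores += 1;
    }
}

/* Bulk version: one transfer per array, each moving n elements.
 * Valid only if a does not alias b or c.
 * Returns 0 on success, -1 if the local buffers cannot be allocated. */
int add_bulk(double *a, const double *b, const double *c,
             size_t n, struct comm_stats *st)
{
    if (n == 0)
        return 0;

    double *tmp = malloc(3 * n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    double *lb = tmp, *lc = tmp + n, *la = tmp + 2 * n;

    memcpy(lb, b, n * sizeof *lb);   /* one bulk load of b */
    st->loads += 1;
    memcpy(lc, c, n * sizeof *lc);   /* one bulk load of c */
    st->loads += 1;

    for (size_t i = 0; i < n; i++)   /* purely local computation */
        la[i] = lb[i] + lc[i];

    memcpy(a, la, n * sizeof *la);   /* one bulk store to a */
    st->stores += 1;

    free(tmp);
    return 0;
}
```

Both functions produce the same result in a. The bulk version reads all of b and c before writing any element of a, which is why it is correct only when a does not overlap b or c.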

If a, b, and c are restrict-qualified shared pointers, the compiler can determine that there are no aliasing issues and can vectorize this loop so that remote block data accesses are used.

In all other cases, you can specify the following pragma before the loop:

#pragma sgi_upc vector=on
upc_forall (i = 0; i < N; i++; i)
  a[i] = b[i] + c[i];

Note that the body of the upc_forall loop can contain several statements.