GPULib API

Tech-X Corporation


Overview

The gpulib IDL package is part of Tech-X Corporation's GPULib library. This library provides array operations that are accelerated using NVIDIA's CUDA technologies if appropriate hardware is installed.

The IDL bindings for the library are designed to let developers write software that runs on systems both with and without a CUDA-enabled GPU.

Developers have to be aware that the CPU and the GPU use separate memories. Data has to be explicitly transferred to and from GPU memory, which can be a time-consuming operation. The IDL bindings for GPULib can hide these explicit data transfer calls for convenience, but hidden transfers can severely reduce overall performance.

For installation instructions, see the README file in the directory above.

Initialization

GPULib is initialized by a call to GPUINIT. It detects whether a GPU or a GPU emulator is available. If neither is available, all GPU operations are emulated in IDL. The detected result can be overridden by specifying keyword parameters.

After running GPUINIT, the system variable !GPU is available. This struct contains the mode (1 = using a physical GPU, 0 = using pure IDL emulation, -1 = using the device emulator). To run on one of the GPUs in your system, make sure that

IDL> print, !GPU.mode

prints 1.
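A script that should behave differently depending on the detected mode can branch on !GPU.mode. The following is a minimal sketch, assuming it is saved and run as a main-level program and that GPUINIT completes without error; the messages are illustrative only.

```idl
; Initialize GPULib, then report which mode was detected.
gpuinit

case !GPU.mode of
   1: print, 'Running on a physical GPU'
   0: print, 'Running in pure IDL emulation'
  -1: print, 'Running on the device emulator'
  else: print, 'Unrecognized mode: ', !GPU.mode
endcase

end
```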

If you cannot get GPUINIT to recognize your GPU, make sure that your IDL_DLM_PATH points to the location of the gpulib.dlm file.

Data transfer

Users can explicitly transfer data between GPU and host memory by using GPUPUTARR and GPUGETARR. GPU variables can either be allocated explicitly or allocated on the fly during transfer. For example, if x_gpu is undefined and the user issues

gpuPutArr, findgen(100), x_gpu

a correctly sized x_gpu variable is allocated and is available for later use. It is the user's responsibility to clean up the allocated space via

gpuFree, x_gpu
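A complete round trip might look like the sketch below. The gpuPutArr and gpuFree calls follow the forms shown above. The argument order of gpuGetArr (GPU variable first, host variable second) is an assumption made here by analogy with gpuPutArr, and x_back is a hypothetical variable name.

```idl
gpuinit

; Host data to be copied to the GPU.
x = findgen(100)

; x_gpu is undefined here, so gpuPutArr allocates it during the transfer.
gpuPutArr, x, x_gpu

; Copy the data back to the host.
; Assumption: the GPU variable comes first, the host output second.
gpuGetArr, x_gpu, x_back

; Compare the original and the returned data (1 means identical).
print, array_equal(x, x_back)

; Release the GPU memory explicitly.
gpuFree, x_gpu
```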

Vector Operations

To get a significant speedup from GPULib, perform multiple operations on long vectors and transfer data between the GPU and the CPU only rarely.

Vector operations on the GPU are of the form

gpuAdd, x_gpu, y_gpu, result_gpu

which adds the vectors x_gpu and y_gpu element by element and stores the sum in result_gpu. This form assumes that all variables are already on the GPU.

If result_gpu is an undefined variable at the time of the call, it will be allocated on the GPU and the handle will be returned in result_gpu.

If x_gpu or y_gpu is not a GPU variable, it is first transferred to a temporary variable on the GPU.

For example:

gpuAdd, findgen(10), findgen(10)+5, result_gpu

will compute the sum of findgen(10) and findgen(10)+5 on the GPU and store the result in the result_gpu variable.
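To illustrate the advice on keeping data on the GPU, the sketch below uploads two vectors once, chains two additions on the GPU, and downloads only the final result. The variable names are hypothetical, and the gpuGetArr argument order (GPU variable first, host variable second) is an assumption.

```idl
gpuinit

; Upload the inputs once.
gpuPutArr, findgen(1000), x_gpu
gpuPutArr, findgen(1000) + 5, y_gpu

; sum_gpu is undefined, so this call allocates it on the GPU.
gpuAdd, x_gpu, y_gpu, sum_gpu

; A second operation on data that never left the GPU.
gpuAdd, sum_gpu, x_gpu, total_gpu

; Download only the final result (assumed argument order).
gpuGetArr, total_gpu, total

; Release all GPU variables.
gpuFree, x_gpu
gpuFree, y_gpu
gpuFree, sum_gpu
gpuFree, total_gpu
```

Passing host arrays directly to gpuAdd, as in the example above, would work too, but each such call implies a transfer to a temporary GPU variable.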

Array subscripting

GPULib provides mechanisms to subscript vectors and arrays on the GPU via the GPUSUBARR procedure. Depending on the dimensionality of the object, it takes either one or two indices, each of which can be either a scalar or a 2-element array representing the lower and upper bound of a segment. An index or upper bound of -1 corresponds to IDL's '*'.

For example, the following operation:

gpuSubArr, a, [3, 5], -1, b, -1, [3, 5]

is equivalent to:

b[*, 3:5] = a[3:5, *]

(assuming that the first dimension of b has the same number of elements as the second dimension of a).

Functional Form

In addition to the procedural form described here, GPULib also provides a functional form of most operations. So instead of writing:

gpuAdd, x_gpu, y_gpu, z_gpu

one can also write, possibly more intuitively:

z_gpu = gpuAdd(x_gpu, y_gpu)

The main disadvantage of the functional form is that, as written above, GPULib needs to allocate temporary storage on the GPU, because the location of the result is not known when GPUADD is invoked. This can be a time-consuming step.

In order to avoid this extra allocation you can provide the left-hand side as a keyword argument:

z_gpu = gpuAdd(x_gpu, y_gpu, LHS=z_gpu)
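The LHS keyword matters most when the same operation runs repeatedly. As a rough sketch, assuming x_gpu and y_gpu are created with gpuPutArr as described under Data transfer (the loop count is arbitrary):

```idl
gpuinit

gpuPutArr, findgen(1000), x_gpu
gpuPutArr, findgen(1000), y_gpu

; First call: z_gpu does not exist yet, so GPULib allocates the result.
z_gpu = gpuAdd(x_gpu, y_gpu)

; Later calls: pass z_gpu as LHS so the existing storage is used
; for the result instead of a newly allocated temporary.
for i = 0, 9 do z_gpu = gpuAdd(x_gpu, y_gpu, LHS=z_gpu)

gpuFree, x_gpu
gpuFree, y_gpu
gpuFree, z_gpu
```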

New operator overloaded API for IDL 8.0

IDL 8.0 introduced operator overloading to IDL objects. While the old GPULib API still works, IDL 8.0 users can make use of the new API with operator overloading.

The gpulib_new_api.pro example in the demos/new_api directory shows some simple examples of how to use operator overloading with GPULib.

First, we initialize GPULib with GPUINIT. This is required before calling any GPULib routines. Arguments to use a particular graphics card (if multiple are present) or emulation mode (hardware, software, or pure IDL) are available.

IDL> gpuinit

Next, we create two simple arrays on the GPU.

IDL> x = gpuFindgen(10)
IDL> y = gpuFindgen(10)

Instead of calling GPUADD to add these arrays, the + operator can now be used. Most common arithmetic operators are available now: +, -, *, /, ^, #, ##, <, >, gt, ge, lt, le, ne, eq, and mod:

IDL> z = x + y

The HELP routine gives help on GPULib variables now as well:

IDL> help, z
Z GPUFLOAT = Array[10]

The PRINT routine will print a GPULib variable, handy for debugging:

IDL> print, z
0.00000 2.00000 4.00000 6.00000 8.00000 10.0000 12.0000 14.0000 16.0000 18.0000

The SIZE routine will give the same output as for a regular IDL array:

IDL> print, size(z)
1 10 11 10

A FOREACH loop can be used to create a view into each row of a 2-dimensional GPU array or each element of a 1-dimensional GPU array:

IDL> arr = gpuFindgen(5, 5)
IDL> foreach row, arr, r do print, r, row
0 0.00000 1.00000 2.00000 3.00000 4.00000
1 5.00000 6.00000 7.00000 8.00000 9.00000
2 10.0000 11.0000 12.0000 13.0000 14.0000
3 15.0000 16.0000 17.0000 18.0000 19.0000
4 20.0000 21.0000 22.0000 23.0000 24.0000

Use brackets like regular IDL arrays:

IDL> print, arr[*, 1]
5.00000 6.00000 7.00000 8.00000 9.00000

Brackets can be used to assign to GPU arrays also (where the right hand side of the assignment is either a regular IDL array or another GPU variable).

IDL> arr[*, 1] = findgen(5)
IDL> print, arr
0.00000 1.00000 2.00000 3.00000 4.00000
0.00000 1.00000 2.00000 3.00000 4.00000
10.0000 11.0000 12.0000 13.0000 14.0000
15.0000 16.0000 17.0000 18.0000 19.0000
20.0000 21.0000 22.0000 23.0000 24.0000

Variables can be released with GPUFREE, but automatic garbage collection in IDL 8.0 will also take care of them:

IDL> gpuFree, [x, y, z, arr]

More information

For more information about the GPULib IDL interface, see: docs/gpuinit.html or contact support@txcorp.com.

Directories

./
demos/async/
demos/bench/
demos/bwtest/
demos/decon_hubble/
demos/fdtd/
demos/new_api/
demos/sam/
demos/swirl/
demos/transform3d/
lib/
unit/

Project statistics

Directories: 12
.pro files: 145
.sav files: 2
Routines: 538
Lines: 25,785
Required IDL version: 6.2