Next: Message Passing Paradigm Up: Parallel Programming Previous: Parallel Programming

The simplest parallel programming  paradigm is data  parallelism.

If you have, say, two one-dimensional arrays A and B, of equal size, say, N double precision floating point numbers each, the traditional way to add such arrays in the C-language  would be

for (i = 0; i < n; i++) c[i] = a[i] + b[i];

You'd do it similarly in Fortran-77 or Pascal or any other sequential programming language. But in Fortran-90 you can simply state:
C = A + B

which corresponds closely to how you would write it in mathematics:

C = A + B

The operation of adding two arrays is explicitly parallel, i.e., you can perform each addition c[i] = a[i] + b[i] independently. If you run a Fortran 90  compiler on a Cray X1 or NEC SX6, the compiler will automatically convert program lines such as C = A + B into parallel operations that will execute c[i] = a[i] + b[i] simultaneously (or almost simultaneously - depending on the architecture of the machine) for all values of i.

A lot of scientific and engineering computing can be expressed efficiently in this data-parallel paradigm. All field problems like electrodynamics, fluid dynamics and chromodynamics fit in this category. Weather and climate modeling codes do too. Gene matching and searching codes can be expressed in terms of data parallelism as well. Basically anything that operates on very large arrays of data submits to data parallelization easily.

Data parallel languages provide various constructs for conditional differentiation of some operations on various parts of arrays, e.g., masking, and for shifting the content of the arrays in a way that is similar to shifting the content of the register, i.e., you can shift arrays left and right, you can rotate them, or shift with padding.

Data parallel programs are very easy to understand and to debug because their logic is essentially sequential. But it is also this sequential logic that results in some inefficiencies. Suppose you want to set values in the B array to zero for all such is for which c[i] == 1. The Fortran-90 statement to do this is

where (c .eq. 1)
b = 0
end where

If c[i] == 1 for just a handful of is, this operation will be performed on only a handful of processors that are responsible for the b[i] involved, whereas all the other processors will idle for the duration of this operation. If the operation is very involved, e.g., there may be some very complex and long computation going on within the confines of the where statement, the idling processors may end up doing nothing for a long time.

Nevertheless, there is a lot to be said in favor of data parallelism. It is much more universal than many computer scientists are prepared to admit, and because it is so easy to use, it should be provided to supercomputer users, especially the ones writing their own codes. Data parallelism is very highly scalable too. We will probably see the return of data parallel computing in context of petaflops systems. The PIM  (Processor  in Memory) architecture is very similar to the Connection Machine  (see below).

Data parallel programs run best on vectors and massively parallel machines, like the the Connection Machine. They can run, although not as efficiently, on SMPs and clusters too, but then, not much will run on clusters efficiently anyway. There is a special variant of data parallel Fortran for clusters, called High  Performance  Fortran  (HPF). The best HPF compilers can be bought from  The Portland Group Compiler Technology.

For more information on HPF see, e.g., the High Performance Fortran page at the Rice University. Also see a brief tutorial about HPF, which was included in the P573 course.

Next: Message Passing Paradigm Up: Parallel Programming Previous: Parallel Programming
Zdzislaw Meglicki
2004-04-29