Next: Exercises Up: Basic MPI Previous: Exercises

## Dividing the Pie

We are now going to analyze the program you should have run already, cpi. The source of this program is distributed with MPICH2 and lives in the directory /N/hpc/mpich2/src/mpich2-0.94b1/examples. The file name is cpi.c. For your convenience, and because it is an open-source program, I also quote it below:
```
gustav@bh1 $ pwd
/N/hpc/mpich2/src/mpich2-0.94b1/examples
gustav@bh1 $ cat cpi.c
```
```c
#include "mpi.h"
#include <stdio.h>
#include <math.h>

double f(double);

double f(double a)
{
    return (4.0 / (1.0 + a*a));
}

int main(int argc, char *argv[])
{
    int done = 0, n, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;
    double startwtime = 0.0, endwtime;
    int namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Get_processor_name(processor_name, &namelen);

    fprintf(stdout, "Process %d of %d is on %s\n",
            myid, numprocs, processor_name);
    fflush(stdout);

    n = 10000;                  /* default # of rectangles */
    if (myid == 0)
        startwtime = MPI_Wtime();

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    h   = 1.0 / (double) n;
    sum = 0.0;
    /* A slightly better approach starts from large i and works back */
    for (i = myid + 1; i <= n; i += numprocs)
    {
        x = h * ((double)i - 0.5);
        sum += f(x);
    }
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0) {
        endwtime = MPI_Wtime();
        printf("pi is approximately %.16f, Error is %.16f\n",
               pi, fabs(pi - PI25DT));
        printf("wall clock time = %f\n", endwtime - startwtime);
        fflush(stdout);
    }

    MPI_Finalize();
    return 0;
}
```
```
gustav@bh1 $
```

This program calculates π by using the formula

$$\pi = \int_0^1 \frac{4}{1 + x^2}\, dx \qquad (5.1)$$

The calculation is numerical, i.e., the segment [0,1] is divided into 10,000 equal fragments. Within each fragment we evaluate f(x) = 4/(1 + x²) at the middle of the fragment (this is the height of the rectangle) and multiply it by the width of the fragment to obtain the area of the rectangle. Then we add all the areas together, and this is our integral, which is also the approximate value of π.

The program begins with the usual incantations to MPI_Init, MPI_Comm_size, MPI_Comm_rank and MPI_Get_processor_name. Then each process writes on standard output its rank number (which is called myid here), the size of the communicator (which is called numprocs here) and the name of the processor it runs on. We have not talked about it so far, but this writing on standard output is far from trivial. Every process writes on its own standard output. In some early versions of MPI, e.g., in LAM MPI, these writes were simply lost. But the MPICH2 engine collects the standard outputs of all its processes and combines them on the so-called console node. This is always the node whose rank is zero, the node on which we run mpdboot. The output you see in the PBS output file is the output from the MPICH2 console node.

Now we hardwire the number of divisions of the segment [0,1] into n, and the console node begins the timing of the program by calling the function MPI_Wtime. This function measures wall-clock time down to a microsecond, sic! It is very accurate indeed. Now the value of n is broadcast to all processes. This is completely unnecessary in this particular program, because all nodes know what n is from the very beginning, but this program has been modified from its interactive predecessor, which used to read n from standard input. It was the console node that used to do the reading.
Now every process calculates the width of the rectangles:

```c
h = 1.0 / (double) n;
```

Every process initializes its own local variable sum to zero and then evaluates f at the midpoint of the rectangle that corresponds to its own rank number. Then it jumps to the rectangle that corresponds to its rank number plus the number of processes in the pool, and so on. The values of f(x) are added up for all of its rectangles, and only at the end is the sum multiplied by the width of the rectangle. This, of course, saves on unnecessary multiplications.

Now we encounter a new operation, MPI_Reduce. This operation takes as its input the content of mypi on each process in the pool (the first argument). We tell it that there is one item (the third argument) of type MPI_DOUBLE (the fourth argument) in there. We then tell it that it should perform a summation over all instances of mypi (the fifth argument, MPI_SUM) and that it should write the result of the operation on pi (the second argument) on the root process (the sixth argument), which in this case is the console process, i.e., the process of rank zero. The whole operation is performed within the MPI_COMM_WORLD communicator. At the end of this operation only the root (console) process has the right answer in its location that corresponds to pi. But this is fine. In the next clause we instruct the console process to measure the wall-clock time again, then print the value of π on standard output together with an estimate of the accuracy of the computation. Then the console process also writes how long it took to carry out the computation.

Here is the output of the program:

```
gustav@bh1 $ pwd
/N/B/gustav/PBS
gustav@bh1 $ cat mpi_out
Local MPD console on bc89
Process 0 of 8 is on bc89
Process 1 of 8 is on bc40
Process 2 of 8 is on bc42
Process 3 of 8 is on bc41
Process 4 of 8 is on bc44
Process 5 of 8 is on bc43
Process 6 of 8 is on bc88
Process 7 of 8 is on bc46
pi is approximately 3.1415926544231247, Error is 0.0000000008333316
wall clock time = 0.009541
gustav@bh1 $
```


MPI_SUM is not the only operation you can use with MPI_Reduce. MPI provides a number of predefined reduction operations that can be used in this context, and you can define your own operations too. The predefined ones are:

- MPI_MAX: maximum
- MPI_MIN: minimum
- MPI_SUM: sum
- MPI_PROD: product
- MPI_LAND: logical AND
- MPI_BAND: bitwise AND
- MPI_LOR: logical OR
- MPI_BOR: bitwise OR
- MPI_LXOR: logical exclusive OR
- MPI_BXOR: bitwise exclusive OR
- MPI_MAXLOC: maximum value and its location
- MPI_MINLOC: minimum value and its location

Zdzislaw Meglicki
2004-04-29