The first benchmark illustrate a ``typical toy problem'', grid computation. The computation consists of solving the Poisson equation on a two dimensional grid using finite difference operators and a simple conjugate gradient method without preconditioning. Though this problem is very simple, it does illustrate important properties of the runtime system associated with an important class of computations.
In the program we have used an
early prototype of our DistributedGrid collection class. In addition
we have also used a common blocking technique that is often used
to increase the computational granularity. This involves placing a square sub-grid
on each processor. Given a problem of size and
processors with P<N,
there are two ways to do this. The most natural
is to define the grid as being of size N by N and mapping it to
a P by P processor array by a BLOCK BLOCK distribution through the template.
Unfortunately our compiler is not yet powerful enough to do the optimization
required to eliminate the element-to-element access functions on the
local sub-grids that will yield vectorizable code. An alternative is
to make the grid size P by P and set the grid elements
to be sub-grids of size M by M;
.
The collection element then takes the form
class GridBlock{
double f[M][M], u[M][M], p[M][M],
q[M][M], r[M][M], Ax[M][M];
double up_edge[M], lo_edge[M], le_edge[M], ri_edge[M];
....
};
where f is the initial data, u is the solution, and the remaining
variables are temporaries used in the conjugate gradient algorithm or
are, as in the case of the edge vectors, buffers used to store
boundary values from neighboring grid blocks.
The heart of the algorithm is given by the subroutine below.
(We have overloaded operators -, +, and * for arrays here to simplify
the text presented here. Our next version of the compiler will do this automatically.)
void conjugategradient(int num_iters, DistributedGrid<GridBlock> &G){
int i;
double gamma, alfa, tau, gamnew, beta;
G.Ap(); // q = A*p
G.r = G.f-G.q;
gamma = G.dotprod(R,R);
G.p = G.r;
for(i = 0; i < num_iters && gamma > 1.0e-12 ; i++){
G.Ap(); // q = A*p
tau = G.dotprod(P,Q); // tau = <p,q>
alfa = gamma/tau;
G.r = G.r - alfa*G.q;
gamnew = G.dotprod(R,R); // gamnew = <r,r>
beta = gamnew/gamma;
G.u = G.u + alfa*G.p;
G.p = G.r + beta*G.p;
gamma = gamnew;
}
}
Communication occurs only in the function Ap which applies the finite difference operator and in the dotprod function. In Ap the communication is all based on nearest neighbors and in the dotprod function the communication is all based on tree reductions.