One of the reasons for including Thread Environment Classes in pC++ is to provide a mechanism to encapsulate code that is designed for execution in a message-passing SPMD environment. This includes many of the libraries that have been designed at the national laboratories, such as Lapack++ and AMR++.

To understand how this works, consider the example of a matrix class Matrix defined as follows:

   TEClass Matrix{
      double **data;
    public:
      int rows, cols;
      Matrix(int n, int m, int p);
      void matMul(Matrix &A, Matrix &B);
      double &operator()(int, int);
   };

In an SPMD style execution, the matrix object would be created
on each processor participating in the computation and the
constructor, given global dimensions *n* by *m*, would automatically
partition the data over the *p* processors.
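The row partitioning performed by such a constructor can be sketched as follows. This is an illustrative sketch, not the pC++ library code: it is written as plain sequential C++, with the thread count and thread identity passed in explicitly (in pC++ they would come from the runtime, e.g. NumProc()), and all names here are assumptions.

```cpp
#include <cassert>

// Sketch of a row-partitioned matrix block: each of the p worker threads
// holds a contiguous block of n/p global rows (destructor omitted).
struct LocalMatrix {
    int rows, cols;      // global dimensions
    int localRows;       // number of rows owned by this thread
    int firstRow;        // global index of the first locally held row
    double **data;       // the local block of rows

    LocalMatrix(int n, int m, int p, int myProc)
        : rows(n), cols(m)
    {
        localRows = n / p;              // assume p divides n evenly
        firstRow  = myProc * localRows; // contiguous block distribution
        data = new double*[localRows];
        for (int i = 0; i < localRows; i++)
            data[i] = new double[m]();  // zero-initialized local rows
    }

    // Does global row i live in this thread's address space?
    bool ownsRow(int i) const {
        return i >= firstRow && i < firstRow + localRows;
    }
};
```

For example, with *n* = 8 rows over *p* = 4 threads, thread 1 holds global rows 2 and 3.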
The interesting part of these libraries is the way processor
communication is managed. In a typical application every processor
must participate in each matrix operation done in parallel. All
communication is hidden within the class operators and the resulting
``user code'' looks exactly like sequential code.
(The version 1.0 pC++ compiler for distributed memory machines works
exactly in this manner.) Take for example the way the library designer
would implement the operator matMul(). Let us assume
that the library is designed so that rows are partitioned over the processors.
That is, rows *(0, n/p - 1)* are on processor *0*, rows *(n/p, 2n/p - 1)* on
processor *1*, etc. The SPMD code for matMul() would look
something like the following. Each processor has part of three matrices,
*A*, *B* and *this*. The code below first broadcasts a column
of *B* to each thread which then assembles the pieces of the column
and computes the appropriate dot product of that column with its share
of the rows of *A*.

This version of the program is not optimal (a blocked version should be used), but it is easy to understand and it is typical of the style of SPMD libraries.

   void Matrix::matMul(Matrix &A, Matrix &B){
      int i, j, k, s, n, m, r, p, from;
      p = NumProc();   // NumProc() gives the number of processor threads
      k = A.rows;
      m = cols; n = rows/p; r = k/p;
      double *rowbuf = new double[k];
      double *buffer = new double[r];
      for(i = 0; i < m; i++){
         // broadcast this thread's block of column i of B to each processor
         for(j = 0; j < p; j++){
            for(s = 0; s < r; s++) buffer[s] = B.data[s][i];
            pCxx_send(j, r, buffer);
         }
         // assemble the column blocks into the full column of B
         for(j = 0; j < p; j++){
            pCxx_receive(&from, buffer);
            for(s = 0; s < r; s++) rowbuf[from*r+s] = buffer[s];
         }
         // dot product of each local row of A with the assembled column
         for(j = 0; j < n; j++)
            for(s = 0; s < k; s++) data[j][i] += A.data[j][s]*rowbuf[s];
      }
      delete [] rowbuf;
      delete [] buffer;
   }
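The communication pattern above can be re-enacted sequentially to check the algorithm's shape. The sketch below is an illustrative simulation, not pC++ code: the per-thread "send" and "receive" of column blocks collapse into direct copies, since all simulated threads share one address space.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<std::vector<double>> Mat;

// Sequential simulation of the SPMD matMul pattern: for each column i of B,
// every simulated thread contributes its r = k/p block of that column (the
// "broadcast"), the blocks are assembled into a full column, and each thread
// then updates its own rows of the product with a dot product.
Mat spmdMatMul(const Mat &A, const Mat &B, int p)
{
    int n = (int)A.size();        // rows of A (and of the result)
    int k = (int)B.size();        // rows of B == cols of A
    int m = (int)B[0].size();     // cols of B
    int r = k / p;                // rows of B held per simulated thread
    Mat C(n, std::vector<double>(m, 0.0));
    std::vector<double> colbuf(k);

    for (int i = 0; i < m; i++) {
        // "broadcast/assemble": thread j owns B rows [j*r, (j+1)*r)
        for (int j = 0; j < p; j++)
            for (int s = 0; s < r; s++)
                colbuf[j*r + s] = B[j*r + s][i];   // full column i of B
        // each thread computes its share of the rows of C
        for (int row = 0; row < n; row++)
            for (int s = 0; s < k; s++)
                C[row][i] += A[row][s] * colbuf[s];
    }
    return C;
}
```

Running it on a small example reproduces an ordinary matrix product, confirming that the column-assembly scheme computes the right values.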

This function can now be called from a pC++ main program as follows:

   Processor_Main(){
      Processors P;
      Matrix C(n, m), A(n, k), B(k, m);
      ....
      C.matMul(A, B);
   }

A more interesting problem is that of the element reference operator. The job of the operator()(int, int) is to make sure that any read or update from the main thread is propagated to the correct position in the distributed array. For example, if the main thread invokes

   x = M(i,j);

then the thread that contains the element must return the correct value to the main thread. On the other hand, if we call

   M(i,j) = x;

then it is the job of the (...) operator to make sure the element on the correct processor is updated. This problem is complicated because we cannot be sure which thread may be invoking this operator. If it is a thread for which the requested data reference lies in the same address space, there is no problem. However, if the address spaces differ, such as when the main thread invokes this operation on each worker, we have a problem. To see this difficulty, consider the following possible implementation.

   double dummy_buffer;

   double &Matrix::operator()(int i, int j){
      double *z;
      int not_local = 1;
      int p = NumProc();
      if((MyProc()*(rows/p) <= i) && (i < (MyProc()+1)*(rows/p))){
         // I have the desired row!
         z = &(data[i % (rows/p)][j]);
         not_local = 0;
      }
      else z = &dummy_buffer;
      pCxx_BroadcastBytes(not_local, sizeof(double), z);
      return *z;
   }

If each of the worker threads associated with the TEClass executes this operation, then the reference evaluation will be correct when called by the main thread only if the main thread shares its address space with one of the worker threads. (This is the case in the current version 1.0 of pC++.) However, in future versions this may not hold. There are two solutions to this problem. One solution is to introduce the CC++ global data type qualifier, so that special pointers and references can be created that can be passed between address spaces. We are strongly considering this for version 2.0. The other solution is to introduce more explicit member functions for read and write operations.

   TEClass Matrix{
      double **data;
      double &operator()(int, int);   // now private
    public:
      int rows, cols;
      Matrix(int n, int m, int p);
      void matMul(Matrix &A, Matrix &B);
      double read(int i, int j){ return (*this)(i,j); }
      void write(int i, int j, double value){ (*this)(i,j) = value; }
   };

This solution restricts the (...) operator so that it can be used only within the TEClass thread environment, and it allows only data values (rather than references or pointers) to be passed between address spaces.
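The collective semantics of such a read can be simulated sequentially. In the sketch below (illustrative only; the function name and data layout are assumptions, not pC++ API), every simulated thread evaluates the element reference, and only the thread owning global row *i* supplies the value, standing in for the broadcast step.

```cpp
#include <cassert>
#include <vector>

// blocks[proc][localRow][col] holds each simulated thread's block of rows.
// Every thread "executes" the reference; the owner of global row i fills in
// the value (mimicking BroadcastBytes), the others would use a dummy buffer.
double simulatedRead(const std::vector<std::vector<std::vector<double>>> &blocks,
                     int i, int j, int rowsPerProc)
{
    double value = 0.0;
    int p = (int)blocks.size();
    for (int proc = 0; proc < p; proc++) {           // each thread runs this
        bool local = (proc * rowsPerProc <= i) &&
                     (i < (proc + 1) * rowsPerProc); // block ownership test
        if (local)                                    // owner supplies value
            value = blocks[proc][i % rowsPerProc][j];
    }
    return value;                                     // all threads agree
}
```

With two threads each holding two rows, a read of global element (3, 1) correctly returns the value stored in thread 1's second local row.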

Mon Nov 21 09:49:54 EST 1994