
Working with the Local Collection. Ver. 1.0+

Each Processors object thread owns one TEClass representative object from the collection definition. The non-MethodOfElement member functions are executed by the thread in pure MIMD style. Each such function has complete control of the "local collection" of elements that are mapped to that thread of computation. Often the easiest way to carry out some type of processing on the collection is to have each thread carry out the task sequentially on each element of its local collection.

The fourth way to program the summation of the elements in a collection is to have each processor object thread compute the total of the elements in its own local collection and then copy the result to a second collection, defined as a distributed array with only one element per thread.


   Processors P;
   Distribution d(pcxx_TotalNodes(),&P,BLOCK);
   Align a(pcxx_TotalNodes(),"[ALIGN(V[i],T[i])]");
   DistributedArray< E > onePerThread(&d,&a);

One of the functions of a distributed array is to provide basic array-oriented reduction functions. In this case we will use the function


     void ReduceDim1();
which reduces along the first dimension so that the sums of rows of an array are reduced to column 0. In our case, because the array is one dimensional, the sum will be left in element 0.

We begin by rewriting reduce2() to take a pointer to the new distributed array as an argument. We will have each thread add every element of its local collection to the total. The easiest way to say this is to use the function Is_Local(), which is true if the named element is in the local collection of the thread that is executing the function. A simple loop can be used to sweep through the collection as follows.


    for(int i = 0; i < dim1size; i++)
       if( Is_Local(i))  local_total += (*this)(i);
The problem with this approach is that each thread must go through the entire collection to identify only its local subset. In some cases the overhead for this search is not serious. However, there is another way to do this. We can make use of the function accumulate(), as described in the previous example, to calculate the local_total for every processor. Then we assign each local_total to the corresponding entry in the array and use the ReduceDim1() function.



ElementType D::reduce3(DistributedArray<E> *onePerThread){
    int i;
    accumulate();   // compute local_total for this thread

    // now copy the total to the collection onePerThread and Reduce
    // that one.
    pcxx_Barrier(); // make sure all local sums are complete
    for(i = 0; i < onePerThread->dim1size; i++)
        if(onePerThread->Is_Local(i))
            *((*onePerThread)(i)) = local_total;
    pcxx_Barrier(); // make sure all the copies are complete.
    onePerThread->ReduceDim1();
    return *((*onePerThread)(0));
}
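
The shape of this two-phase computation can also be seen outside pC++. The following is a minimal sketch in plain, sequential C++, not pC++ code: the helper is_local() is a hypothetical stand-in for Is_Local() that assumes a simple BLOCK layout, the vector one_per_thread plays the role of the onePerThread array, and the final loop imitates the effect the text attributes to ReduceDim1() (the sum is left in element 0). The names and the block-size formula are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Is_Local(): with an assumed BLOCK layout,
// element i belongs to thread i / block, where block = ceil(n / threads).
bool is_local(std::size_t i, std::size_t thread,
              std::size_t n, std::size_t threads) {
    std::size_t block = (n + threads - 1) / threads;
    return i / block == thread;
}

// Phase 1: every simulated thread sums only its local elements.
// Phase 2: the per-thread totals are reduced into slot 0.
double two_phase_sum(const std::vector<double>& data, std::size_t threads) {
    assert(threads > 0);
    const std::size_t n = data.size();
    std::vector<double> one_per_thread(threads, 0.0);

    for (std::size_t t = 0; t < threads; ++t)      // one pass per "thread"
        for (std::size_t i = 0; i < n; ++i)        // scan the whole collection
            if (is_local(i, t, n, threads))
                one_per_thread[t] += data[i];

    for (std::size_t t = 1; t < threads; ++t)      // reduce into element 0
        one_per_thread[0] += one_per_thread[t];
    return one_per_thread[0];
}
```

Note that the first phase of the sketch tests every index once per simulated thread, which is exactly the scanning overhead discussed above for the Is_Local() loop.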

We will include one more example of using the local collection. Suppose you wish to broadcast one value stored in a member field of one element to the same location in all the other elements of a collection. By adding the function BroadcastToElements() to our collection thread functions as shown below, we only need to supply the index of the element that has the data to be broadcast, the offset in bytes from the start of the element, and the size of the data block.

   
Collection D: SuperKernel{
   public:
      ElementType local_total;
      D(Distribution *d, Align *A);
     void BroadcastToElements(int element, int offset, int size);
   MethodOfElement:
     ...
};
To implement this we build a buffer in each thread and first find the thread that has the source element. Next we copy the source data to the buffer and share the contents of the buffer with the other threads by using the pCxx_BroadcastBytes() routine. Once each thread has a copy, it can load the data into the appropriate field of each element.

void D::BroadcastToElements(int element, int offset, int size){
      char *buffer = new char[size];
      int flag = 1;
      int i, j;
      char *p;
      // only the thread that owns the source element fills the buffer;
      // flag is cleared on that thread and stays 1 everywhere else
      if(Is_Local(element)){
          flag = 0;
          p = ((char *) (*this)(element)) + offset;
          for(j = 0; j < size; j++) buffer[j] = p[j];
      }
      pCxx_BroadcastBytes(flag, size, buffer);
      // every thread copies the buffer into each of its local elements
      for(i = 0; i < dim1size; i++)
          if(Is_Local(i)){
              p = ((char *) (*this)(i)) + offset;
              for(j = 0; j < size; j++) p[j] = buffer[j];
          }
      delete [] buffer;
}
Note that the use of an offset into an element is not a standard C++ way of writing code. A more elegant approach would be to use a pointer to a member. However, the current pC++ preprocessor has a problem with this construct.

It remains to calculate the offset. To do so, subtract the address of the class object from the address of the class member to be broadcast; that gives the offset within the class. Both addresses should be cast to char * before the subtraction, so that the offset is obtained in bytes. size can be calculated by taking b

In the next section we discuss accessing sub-blocks of elements in more detail.
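
The offset calculation described above can be sketched in plain C++. This is an illustration, not pC++ code: the Element struct, byte_offset() and copy_field() are hypothetical names introduced here, copy_field() only imitates the per-thread copy loop of BroadcastToElements() on an ordinary array, and using sizeof on the member to obtain the size is an assumption, since the text does not spell that step out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical element type, used only for illustration.
struct Element {
    int    id;
    double value;
    double weight;
};

// Byte offset of a member inside its object: cast both addresses to
// char * and subtract the object's address from the member's address.
std::size_t byte_offset(const void* object, const void* member) {
    const char* base  = static_cast<const char*>(object);
    const char* field = static_cast<const char*>(member);
    assert(field >= base);
    return static_cast<std::size_t>(field - base);
}

// Copy `size` bytes found at `offset` in elems[source] to the same
// location in every element of the array.
void copy_field(Element* elems, std::size_t count, std::size_t source,
                std::size_t offset, std::size_t size) {
    assert(source < count);
    assert(offset + size <= sizeof(Element));
    char buffer[sizeof(Element)];
    std::memcpy(buffer,
                reinterpret_cast<const char*>(&elems[source]) + offset, size);
    for (std::size_t i = 0; i < count; ++i)
        std::memcpy(reinterpret_cast<char*>(&elems[i]) + offset, buffer, size);
}
```

Under these assumptions, a caller holding an array e of Element could pass byte_offset(&e[1], &e[1].value) and sizeof(e[1].value) to copy_field() to propagate the value field of element 1 to all elements while leaving id and weight untouched.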





beckman@cica.indiana.edu
Mon Nov 21 09:49:54 EST 1994