Next: Working with Intel Up: Programming with Collections. Previous: Interfacing HPF and

Working with Connection Machines node Fortran on the CM-5

In this section we consider case b, in which a Fortran node program is called from a parallel section of pC++. One of the difficulties in experimenting with HPF compilers is that only a few are now beginning to emerge. At the time of this writing, we do not have access to any of these systems, so we have conducted our experiments with Thinking Machines Corporation's CM Fortran (CMF) on the CM-5. CMF is a reasonable approximation to the spirit of HPF.

There are three major issues that we want to address:

As will be seen later in this section, the first two are addressed through new C++ classes, and the third through a new pC++ collection.

A CM-5 processing node consists of a Sparc microprocessor and four vector processor accelerators (vector units). To make use of these vector units, arrays have to be allocated in the memories of the vector units. The Connection Machine Run Time System (CMRTS) provides functions for allocating space in the vector unit memories. The allocation is not trivial, however, since one must also be concerned with how array elements are partitioned among the four vector units and with remaining compatible with CMF. We use the CMRTS function ``CMRT_intern_detailed_geometry'' to create a geometry and ``CMRT_allocate_heap_array'' to allocate memory. Once the arrays are allocated, the array descriptors returned by ``CMRT_allocate_heap_array'' are passed to Fortran subroutines as arguments. Computations are done in the Fortran subroutines. These Fortran arrays can also be directly manipulated by pC++ control programs through overloaded C++ operators.

We design a special C++ array class for each Fortran data type. The following class defines an array class for Fortran double precision arrays:

class FArrayDouble {
  // private part of class
  double *d;
 public:
  FArrayDouble() { d = 0; }
  FArrayDouble(int i);
  FArrayDouble(int i,int j);
  FArrayDouble(int i,int j,int k);
  FArrayDouble(int i,char layout1);
  FArrayDouble(int i,int j,char layout1, 
   char layout2); 
  FArrayDouble(int i,int j,int k,
   char layout1,char layout2,char layout3);
  double& operator()(int l);
  double& operator()(int l,int n);
  double& operator()(int l,int n,int m);
};
The private part of the class is used to store the memory addresses of an array on the four vector units. This information is used for accessing individual array elements in pC++ and, in general, should not be of concern to the user. The class constructors that take arguments allocate memory on the vector units when invoked. The arguments i, j, and k specify the dimension sizes of the array, and layout1, layout2, and layout3 correspond to the CMF compiler directive for array layouts. A layout can be either ``SERIAL'' or ``NEWS.'' A ``NEWS'' dimension means that the dimension will be distributed across the four vector units. A ``SERIAL'' dimension means that the dimension will be packed into one vector unit's memory. An array with only ``SERIAL'' dimensions signals that it should be allocated in the Sparc chip's memory and is therefore not allowed in the current implementation; such an array can instead be allocated as a normal C++ array, which is always placed in the Sparc chip's memory. The constructors that do not take layout information use the default ``NEWS'' layout. The overloaded operators are used for accessing array elements.

The method we use to access array elements on the vector units from the Sparc chip differs from that of CMF. In the CMF assembly code, array element access involves computing ``send addresses.'' In our approach, we use the ``subgrid_dimension,'' ``offchip_position,'' ``subgrid_axis_increment,'' and ``subgrid_size'' information provided by CMRTS to access each individual array element. It turns out that our method is as fast as CMF when all axes are declared ``NEWS'' and can be a few times faster than CMF when some axes are declared ``SERIAL.'' For example, when the first dimension of a double precision array is declared ``NEWS'' and the second ``SERIAL,'' the measured access rate (in bytes/second) in pC++ is several times that of CMF.

Besides memory allocation, we also provide methods for inter-element communication for the Fortran arrays. We design a special pC++ collection for this purpose:

Collection Fortran : public SuperKernel {
  Fortran(Distribution *D,Align *A);
  void GetFArray(int index,int offset,
                 FArrayDouble &buffer);
  void GetFArray(int index1,int index2,
                 int offset,FArrayDouble &buffer);
};
The Fortran collection is derived from the special root collection SuperKernel. The communication function GetFArray fetches an FArrayDouble array from the collection element denoted by the given index (or indices).

A Fortran subroutine is declared as a method of an element class. An invocation of a Fortran subroutine declared this way results in separate invocations of the Fortran subroutine on each processor. The Fortran subroutine is thus similar to a message-passing CMF ``node program,'' though explicit message passing is strongly discouraged inside the Fortran subroutine. Communication between collection elements is accomplished in the pC++ control program. Since Fortran compilers usually add underscores to subroutine names, the pC++ compiler needs to generate a ``wrapper function'' for each Fortran subroutine.

By way of illustration, the following shows how a Fortran subroutine is called from pC++ utilizing the Fortran interface.

extern "C" {
 //Fortran subroutine is declared external 
 void integrate_seg_(double*,double*,
                     int*,double*);
}

class Segment {
  double seg_sum;
  int length;
  FArrayDouble x, y;
 public:
  Segment();
  void integrate_seg();
};

Segment::Segment() : x(length), y(length) {
  //FArrays are allocated in constructor 
}

void Segment::integrate_seg() {
  // wrapper invokes Fortran subroutine
}

void main() {
  Processors P;
  Distribution D(64,&P,CYCLIC);
  Align A(64,"[ALIGN(T[i],D[i])]");
  Fortran<Segment> F(&D,&A);
  F.integrate_seg(); // parallel invocation on every element
}
The Fortran subroutine is given below:

      subroutine integrate_seg(x,y,
     &   length,seg_sum)
      double precision seg_sum
      integer length
      double precision,array(length)::x,y
cmf$  layout x(:news), y(:news)
      seg_sum = SUM(x*y)
      return
      end
In this example, the Fortran subroutine computes the dot product of the two one-dimensional arrays x and y. The x and y arrays are passed to the Fortran subroutine as ``explicit-shape'' arrays; however, the interface does not prevent us from declaring them as ``assumed-shape'' arrays.

One shortcoming of the current Fortran interface implementation is that the pC++ control program cannot access Fortran COMMON blocks. This requires that all Fortran global variables be declared in the pC++ control program and passed to the Fortran subroutines as arguments. Nevertheless, the interface does allow communication between Fortran subroutines through COMMON blocks. It is important to note, however, that the pC++ programming model allows more than one pC++ collection element to be allocated on a processor, meaning that Fortran subroutines called by different collection elements will access the same COMMON block. Data in the COMMON block are shared by all elements on that processor, and race conditions may arise if the programmer is not careful. The simplest way to avoid this problem is to allocate only one collection element per processor.

At the time of this writing, the pC++ compiler cannot generate the wrapper for a Fortran subroutine or the corresponding extern statement, so a user has to hand-code them. In the future, a user would write something like

extern "HPF" void Segment::integrate_seg(FArrayDouble &x,FArrayDouble &y,
                                         int &length,double &seg_sum);
to indicate that integrate_seg is a Fortran subroutine, and the compiler will generate the necessary code.
