
Thread Environment Classes. Ver. 2.0

Thread Environment Classes are not directly supported at the user level in Version 1.0. Because they will be supported in Version 2.0 and are an important part of the runtime model, we discuss them here.

Once one defines a set of processor threads, it is necessary to define an execution environment for each thread. In addition, one needs a mechanism to assign work to threads. Both are accomplished with a Thread Environment Class, which is a basic extension that pC++ makes to C++.

In its most basic form, a Thread Environment Class defines a set of local variables for a processor object and a mechanism by which the main thread can start each thread executing a well-defined unit of work.

A thread environment class looks like an ordinary C++ class. The syntax is identical to that of a standard C++ class, except that the keyword TEClass replaces class:

   TEClass class_name: super_classes {
      // data members and member functions
   };

A regular C++ class defines a data type that consists of a structure of data field members and member/method functions that have special privilege to read and modify these data fields. A TEClass is identical except that it defines a structure of data fields that are private to each thread represented by a given Processor object. The invocation of member functions of a TEClass object by the main control thread transfers control to the individual threads for the duration of the execution of that function.

A TEClass is declared the same as any other class with the exception that

The invocation of a TEClass member function by the main control thread is the mechanism for transferring control from the main thread to the worker threads defined by a Processors class used to create the TEClass object. While a worker thread is in execution of a member function, it can only see the data members of that TEClass object instance and the program global variables that are visible from the scope of the TEClass definition. Figure 1 gives an illustration of the multithreaded runtime model that the user should have in mind when programming with TEClass objects.

To understand this control flow and these visibility rules, it is best to consider a simple example. The code below defines a simple thread environment class and the member functions that access it.

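 int my_global;
 float a_global_array[3];

 TEClass C{
      int x, y[100];
   public:              // assumed, so that the main thread can call f
      int i;
      void f(int j){ i = j; }
 };

 int main(){
   Processors P;
   C myThreadEnv(P);
   myThreadEnv.f(2);    // the call discussed in the text below
 }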


In this case the main thread defines a set of processor threads, represented by the Processors object P. The thread environment object myThreadEnv is allocated next. This object is actually a set of instances of the class you would get by replacing the keyword TEClass with class. Because the constructor is called with the Processors object P, there is one instance, containing one copy of the integer variable x and one copy of the array y[100], for each processor thread associated with P.

The constructor for this set of objects is executed by each processor object bound to it. While this is taking place, the main thread is suspended. In our example, there is only a default constructor, so control returns directly to the main thread.
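The allocation and dispatch steps of this example can be approximated in standard C++. The sketch below is not pC++ and makes no claim about how the pC++ runtime is implemented: it assumes std::thread as a stand-in for processor threads, and the names Env, ThreadEnvSet, and invoke_f are hypothetical. It shows one private instance per thread, and a caller that blocks until every worker has finished the member function.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the body of "TEClass C": data private to one thread.
struct Env {
    int x = 0;
    int y[100] = {};
    int i = 0;
    void f(int j) { i = j; }
};

// Hypothetical stand-in for "C myThreadEnv(P)": one Env per processor thread.
class ThreadEnvSet {
public:
    explicit ThreadEnvSet(std::size_t num_threads) : envs_(num_threads) {
        assert(num_threads > 0);
    }

    // Emulates the main thread calling myThreadEnv.f(j): every worker runs f
    // on its own instance, and the caller blocks until all workers return.
    void invoke_f(int j) {
        std::vector<std::thread> workers;
        workers.reserve(envs_.size());
        for (Env& env : envs_) {
            workers.emplace_back([&env, j] { env.f(j); });
        }
        for (std::thread& w : workers) {
            w.join();
        }
    }

    const std::vector<Env>& envs() const { return envs_; }

private:
    std::vector<Env> envs_;
};
```

Constructing ThreadEnvSet with four threads and calling invoke_f(2) leaves i equal to 2 in each of the four instances, which mirrors the effect of myThreadEnv.f(2) in the pC++ example.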

When the main thread continues, it invokes myThreadEnv.f(2). Again, this is a signal for the processor-object worker threads to take over. Each thread executes myThreadEnv.f(2) applied to the local instance of the TEClass object that was allocated to it. Once the worker threads start evaluating a member function of a TEClass object, they are free to call other member functions or normal C functions. There are three important restrictions that need further explanation.

  1. A processor object, while executing a TEClass member function, may read the variables declared in the global scope of the same file, in this case my_global and a_global_array. However, a processor thread may NOT MODIFY these global values.

  2. There are some restrictions on I/O to and from the TEClass member functions.

  3. When a TEClass member function returns a value to the main thread it is up to the programmer to make sure that this is well defined.

We will discuss each of these restrictions and illustrate them with examples.

The first restriction is very important. At first sight, it may seem strange that a processor object thread cannot modify global variables. However, keep in mind that allowing them to do this can result in complex race conditions. That is, without some form of global synchronization mechanisms, there is no way to guarantee that one processor thread will modify a variable before another processor reads the variable. Consequently, one cannot easily predict the behavior of programs that ignore this rule.
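The order dependence described here can be made concrete with a small deterministic simulation in standard C++. This is only a sketch under stated assumptions: the two "workers" are run one after another in an order chosen by the caller, rather than as real threads, and SharedState and value_seen_by_reader are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical shared global that worker threads are (wrongly) allowed to modify.
struct SharedState {
    int g = 0;
};

// Simulates two workers run one after another in the given order.
// Worker 0 writes g = 1; worker 1 reads g. Returns what worker 1 observed.
int value_seen_by_reader(const std::vector<int>& order) {
    assert(order.size() == 2);
    SharedState s;
    int seen = -1;
    for (int id : order) {
        if (id == 0) {
            s.g = 1;     // the write
        } else {
            seen = s.g;  // the read
        }
    }
    return seen;
}
```

With the order {0, 1} the reader observes 1, and with the order {1, 0} it observes 0. Without synchronization, nothing fixes which of the two orders a real pair of threads would follow.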

A second, more pragmatic reason for this restriction is that pC++ is designed to be portable across distributed memory as well as shared memory systems. On distributed memory machines the global data is duplicated in each processor's address space, and the compiler generates SPMD style parallelism for these machines. If processor threads were allowed to make arbitrary changes to global variables, there would be no way for the compiler to make sure the copies maintain a consistent state across address spaces. (This is because it is not possible to completely solve the problems associated with aliases caused by pointers in C.) To get around this problem, the main thread is the only one that is allowed to modify these variables. Because the main thread is duplicated across processors in an SPMD execution environment, it is a much easier task to make sure that the main thread behaves identically on each processor.
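The consistency argument can be illustrated with a toy model in standard C++. The sketch assumes each address space is just a struct holding its own copy of one global; AddressSpace, main_thread_update, worker_update, and consistent are hypothetical names, and no real message passing or SPMD runtime is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each simulated address space holds its own copy of the global "a".
struct AddressSpace {
    int a = 0;
};

// The replicated main thread performs the same update in every address space.
void main_thread_update(std::vector<AddressSpace>& spaces) {
    for (AddressSpace& s : spaces) {
        s.a += 1;
    }
}

// A worker update that depends on the thread identifier (forbidden in pC++).
void worker_update(std::vector<AddressSpace>& spaces) {
    for (std::size_t rank = 0; rank < spaces.size(); ++rank) {
        spaces[rank].a += static_cast<int>(rank);
    }
}

// True when every copy of the global holds the same value.
bool consistent(const std::vector<AddressSpace>& spaces) {
    assert(!spaces.empty());
    for (const AddressSpace& s : spaces) {
        if (s.a != spaces.front().a) {
            return false;
        }
    }
    return true;
}
```

After main_thread_update every copy still agrees, because each replica of the main thread did the same thing. After worker_update on more than one address space the copies differ, which is the inconsistent state the restriction is meant to rule out.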

To see this restriction in more detail, consider these cases:

int a, b;

void foo(){ a++; }

void bar(int *p){ *p = 0; }

TEClass C{
    public:            // assumed, so that the main thread can reach x and f
        int x;
        void f(int *z){
           foo();  //<<< error: modifies global state
           bar(z); // may be o.k.
        }
};

int main(){
    Processors P;
    C T(P);
    foo();     // o.k.
    T.f(&a);   //<<< error: because bar will modify a
    T.f(&T.x); // o.k.
}

In this case the function foo() can be called from the main program, but not from a TEClass member function, because it modifies a global variable. Leaving aside the foo() call inside f(), the calls to T.f() are more subtle because of bar(). The call T.f(&a) is an error because it passes bar() the address of a global variable, whereas T.f(&T.x) is valid because bar() then operates on a member field of the TEClass instance.

Perhaps the most perplexing rule associated with TEClass objects has to do with values returned to the main control thread. When a TEClass method is invoked by a processor thread, it behaves exactly like any other class member function. However, when a TEClass member function is invoked by the main thread the action is different. Each processor object thread executes the function and each will return a value. Unfortunately, the calling thread is only expecting a single value to be returned. To avoid non-determinism in the program the programmer is required to make sure these values are all identical.

To illustrate this, consider the following simple example. We will use two special pC++ intrinsics. MyProc() is a function that may be called by a processor object thread to determine its "thread identifier", which is a value between 0 and p-1, where p is the number of processor threads allocated in a Processors object. (In version 1.0 of pC++, this corresponds to the processor number that the thread is assigned to.)

The second function pCxx_max() is a pC++ intrinsic function that selects the maximum of a set of values from each of the processor threads. (This function is a member of a larger family of special TEClass reduction functions that will be described in greater detail below.)

The example illustrates both the correct and incorrect use of the TEClass member function return mechanism.

TEClass C{
   public:             // assumed, so that the main thread can call f and g
     int f(){ return MyProc(); }
     int g(){
        int y = f();  //<<< call to f o.k.
        return pCxx_max(y);
     }
};

int main(){
  Processors P;
  C MyThreads(P);
  int x = MyThreads.f();  //<<< error.  multiple return values for f.
  int max_thread_id = MyThreads.g();  //<<< o.k.
}

In this example, the function f() returns a different value for each processor object thread. Consequently, it does not make sense to call it from the main thread. However, its meaning as a scalar valued function is well defined within the individual thread environments: it simply returns the identifier of that thread. On the other hand, the function g() uses the pCxx_max() reduction to select the largest of the values returned by the calls to f(), so every thread returns the same value.
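The difference between f() and g() can be emulated in standard C++. The sketch below assumes the worker threads are simulated by a loop and uses std::max_element in place of the pCxx_max() intrinsic; call_f_on_all, call_g_on_all, and single_valued are hypothetical names, not part of pC++.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for f(): each simulated thread returns its own identifier.
std::vector<int> call_f_on_all(int p) {
    assert(p > 0);
    std::vector<int> values(p);
    for (int id = 0; id < p; ++id) {
        values[id] = id;
    }
    return values;
}

// Hypothetical stand-in for g(): a max reduction gives every thread the same result.
std::vector<int> call_g_on_all(int p) {
    const std::vector<int> per_thread = call_f_on_all(p);
    const int max_value = *std::max_element(per_thread.begin(), per_thread.end());
    return std::vector<int>(per_thread.size(), max_value);
}

// True when all threads returned the same value, i.e. the call is safe from main.
bool single_valued(const std::vector<int>& returns) {
    return std::adjacent_find(returns.begin(), returns.end(),
                              std::not_equal_to<int>()) == returns.end();
}
```

For four simulated threads, call_f_on_all(4) produces 0, 1, 2, 3 and fails the single_valued check, while call_g_on_all(4) produces 3 in every slot and passes it.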

Note that there are several alternative designs that pC++ could have used that would avoid this "single return value" restriction. First, one could stipulate that the main thread would "pick" a value at random from the set of values generated on each thread. However, experience has shown us that users rarely want this feature. In addition, this scheme is non-trivial to implement. Alternatively, one could devise a mechanism that would allow sets of values to be returned and assigned to scalar variables just like a scalar return value. Unfortunately, this would violate the C++ type system. However, pC++ does provide a way to create and manipulate aggregate values, and this topic is discussed in the chapter on Collections.

Mon Nov 21 09:49:54 EST 1994