next up previous
Next: Double Loops Up: Some Loop Examples Previous: Some Loop Examples

Single Loops

Consider, for example, the following class SimpleLoop1 in which an instance method instanceMethod() is used to assign the inverse of the first N elements in a floating-point array b to corresponding elements in another floating-point array a:

tabular511

Obviously, because no data dependences are carried by loop L1, the loop can be executed in a DOALL-like manner. Moreover, because the parameter N is only used as upper bound, the compiler does not have to deal with any variables that are declared outside the loop, but that are refered to within the loop-body. Exploiting implicit loop parallelism proceeds as follows.

First, the following instance method run_1() is constructed and added to the class SimpleLoop1 (see section 2.3.1):

tabular520

Thereafter, the restructuring compiler constructs the following class LoopWorker_1 and adds this class to the transformed Java program (see section 2.3.2):

tabular526

Finally, loop L1 in instanceMethod() is replaced by the following construct, where NUM=4 and SVARS=0. Because the same amount of work is done in each iteration, block-scheduling has been selected (see section 2.3.3, where adding `implements Schedules' to the transformed class enables the compiler to use the symbolic representation of a scheduling policy):

tabular537

   figure1376
Figure 7: Single Loop in Instance Method

Now, suppose that a similar loop that operates on class variables appears in a class method classMethod() of a class SimpleLoop2:

tabular549

In this case, a class method run_2() similar to the one shown above (but with a qualifier static) is added to the class SimpleLoop2, and the whole loop is replaced by a construct that is similar to the construct shown above (but without an argument this in the constructor invocation of LoopWorker_2). The loop-worker for the parallel loop L2 merely consists of a run()-method:

tabular560

   figure1388
Figure 8: Single Loop in Class Method

In figures 7 and 8, we show the serial execution time tex2html_wrap_inline2430 and the parallel execution time tex2html_wrap_inline2432 of the methods instanceMethod() and classMethod() for varying values of N. The execution time of the parallel versions run with the true parallel execution of threads disabled is also shown. These experiments indicate that on a uni-processor, the overhead introduced by the loop parallelization method presented in this paper is very small. In figure 9, we show the speedup of the true parallel versions. Here we see that on the four processor IBM RS/6000 G30, parallelization of this particular single loop becomes useful if N exceeds 20,000. An efficiency ranging from tex2html_wrap_inline2436 up to over tex2html_wrap_inline2438 is obtained once N exceeds 220,000.

   figure1394
Figure 9: Speedup of Single Loops


next up previous
Next: Double Loops Up: Some Loop Examples Previous: Some Loop Examples

ajcbik@extreme.indiana.edu