The following double loop is based on an applet example found in [24]:
Data dependence analysis reveals that neither the y- nor the x-loop carries any data dependence. Parallelization of the outermost loop is more desirable to amortize startup overhead over more iterations. In contrast with the previous examples, however, in this case the loop-body of the y-loop refers to the parameter N of computePixels() (used as upper bound in the x-loop) as well as the local array pixels that is declared outside this loop. Because the value of N and pixels itself is not altered in the loop, however, the loop can still be parallelized by supplying the value of these variables to the method that is constructed in the first step (see section 2.3.1):
Loop-workers for this loop are described by the following class, having two additional fields to store the value of the integer N and the integer array pixels (see section 2.3.2):
Finally, the original loop is replaced by the construct shown below, in which the value of N and pixels is supplied to every loop-worker (see section 2.3.3):
In figure 10, we show the execution time
of the serial and parallel version of the previous
loop (excluding the time required
for explicit memory allocation) for varying values of N.
For this particular double loop, parallelization
becomes useful if N exceeds 180, and an efficiency ranging from
75 up to more than
is obtained once N exceeds 500.