The number of levels in a tree, for example, can be computed by passing the root of this tree to the following class method compLevel1():
Obviously, the number of levels in the two sub-trees rooted at t can be computed in parallel and compLevel1() has the form (1) given in section 3.1, where target1 and target2 are implicitly defined as Tree. Hence, the programmer can use annotations to identify compLevel1() as a parallel 2-way recursive method.
In the first step, the compiler constructs the following class (see section 3.2.1):
Subsequently, the original method compLevel1() is rewritten into the method shown below, where a dummy assignment to l has been added to preserve the definite assignment property (see section 3.2.2):
Finally, the following method that invokes the static method compLevel1_par_a() is added to the class Tree (see section 3.2.3):
Figure 17: Tree Traversal (class method)
In figure 17, we show the serial execution
time
and the parallel
execution time
for two cut-depths c=1
(4 threads) and c=3 (16 threads).
All versions are applied to full binary trees with a varying number
of nodes N, letting the number of levels range from 14 to 19.
Since only a constant amount of work is done for each node,
a reasonable speedup may be expected for p=4,
since the serial execution time
can be changed into
.
In figure 18, we present the obtained speedup.
Because the trees are well-balanced, increasing
the number of threads only decreases performance due to
the contention between running threads.
Figure 18: Speedup of Tree Traversal (class method)
Alternatively, the previous computation can be done by calling the following instance method compLevel2(), expressed specifically in the form (1), on the root of the tree (since compLevel2() is final, we avoid the situation where another method is called in case left or right contains an instance of a sub-class of Tree that overrides compLevel2()):
Because compLevel2() is an instance method, the corresponding class TreeWorker_b has an additional target field of type Tree on which the parallel method is called in the run()-method:
The transformations applied to the parallel method compLevel2_par_b() are similar to the transformations presented in the previous section. However, now an additional parameter is passed to the constructor of TreeWorker_b to record the target on which the method must be called. Moreover, note that the recursive method invocations in the alt_code fragment also have been replaced by invocations of compLevel2_par_b():
Finally, the following method that implicitly calls compLevel2_par_b() on this, i.e. the object on which method compLevel2() itself was called, is added to the class Tree:
Figure 19: Unbalanced Tree Traversal (instance method)
In figure 19, we show the results of applying this method to the trees of the previous section with two extra top nodes to introduce some load imbalancing, as illustrated in figure 20. Since for cut-depth c=1, all work is done by only one thread, in this case the parallel execution time is equal to the serial execution time with some slight overhead that is mainly due to passing the additional parameter d_x. For cut-depth c=3, however, speedup is obtained again. Moreover, because threads that have completed their work no longer compete for processor time, in this case less overhead due to contention arises (cf. the parallel execution time for c=3 in figures 17 and 19).