next up previous
Next: Overhead Reduction Up: Tree Traversals Previous: Tree Traversals

Straightforward Parallelization

The number of levels in a tree, for example, can be computed by passing the root of this tree to the following class method compLevel1():

tabular742

Obviously, the number of levels in the two sub-trees rooted at t can be computed in parallel and compLevel1() has the form (1) given in section 3.1, where target1 and target2 are implicitly defined as Tree. Hence, the programmer can use annotations to identify compLevel1() as a parallel 2-way recursive method.

In the first step, the compiler constructs the following class (see section 3.2.1):

tabular755

Subsequently, the original method compLevel1() is rewritten into the method shown below, where a dummy assignment to l has been added to preserve the definite assignment property (see section 3.2.2):

tabular763

Finally, the following method that invokes the static method compLevel1_par_a() is added to the class Tree (see section 3.2.3):

tabular771

   figure1493
Figure 17: Tree Traversal (class method)

In figure 17, we show the serial execution time tex2html_wrap_inline2430 and the parallel execution time tex2html_wrap_inline2432 for two cut-depths c=1 (4 threads) and c=3 (16 threads). All versions are applied to full binary trees with a varying number of nodes N, letting the number of levels range from 14 to 19. Since only a constant amount of work is done for each node, a reasonable speedup may be expected for p=4, since the serial execution time tex2html_wrap_inline2474 can be changed into tex2html_wrap_inline2476 . In figure 18, we present the obtained speedup. Because the trees are well-balanced, increasing the number of threads only decreases performance due to the contention between running threads.

   figure1499
Figure 18: Speedup of Tree Traversal (class method)

Alternatively, the previous computation can be done by calling the following instance method compLevel2(), expressed specifically in the form (1), on the root of the tree (since compLevel2() is final, we avoid the situation where another method is called in case left or right contains an instance of a sub-class of Tree that overrides compLevel2()):

tabular797

Because compLevel2() is an instance method, the corresponding class TreeWorker_b has an additional target field of type Tree on which the parallel method is called in the run()-method:

tabular806

The transformations applied to the parallel method compLevel2_par_b() are similar to the transformations presented in the previous section. However, now an additional parameter is passed to the constructor of TreeWorker_b to record the target on which the method must be called. Moreover, note that the recursive method invocations in the alt_code fragment also have been replaced by invocations of compLevel2_par_b():

tabular815

Finally, the following method that implicitly calls compLevel2_par_b() on this, i.e. the object on which method compLevel2() itself was called, is added to the class Tree:

tabular824

   figure1517
Figure 19: Unbalanced Tree Traversal (instance method)

   figure1523
Figure 20: Unbalanced Tree

In figure 19, we show the results of applying this method to the trees of the previous section with two extra top nodes to introduce some load imbalancing, as illustrated in figure 20. Since for cut-depth c=1, all work is done by only one thread, in this case the parallel execution time is equal to the serial execution time with some slight overhead that is mainly due to passing the additional parameter d_x. For cut-depth c=3, however, speedup is obtained again. Moreover, because threads that have completed their work no longer compete for processor time, in this case less overhead due to contention arises (cf. the parallel execution time for c=3 in figures 17 and 19).


next up previous
Next: Overhead Reduction Up: Tree Traversals Previous: Tree Traversals

ajcbik@extreme.indiana.edu