As an example of a typical divide-and-conquer algorithm, consider the following implementation of quick-sorting [13] in which, as advocated in [33], small sub-arrays are sorted by insertion sorting to prevent further recursive method invocations for small sub-arrays:
Figure 22: Quick/Insertion Sorting of Reversed Integer Arrays
After the programmer has indicated that the recursive
method invocations can be done in parallel, parallelization
of this method proceeds as explained in the previous sections.
In figure 22, we show the serial execution
time
and parallel execution time
with cut-depths c=1 and c=3 for integer
arrays of varying length N. In each array of length N,
value N-i+1 is assigned to every element i, so that with
the pivoting method shown above, the method invocation trees are well-balanced.
In figure 23, we show the corresponding speedup.
The reason that even for well-balanced
method invocation trees the speedup is never optimal becomes
immediately apparent from the discussion in section 3.1.
The initial linear terms in
contribute substantially to the best possible parallel execution time
for p=4. For N=200,000, for instance, the time required to partition an array
of size N and N/2 takes 0.08 sec. and 0.04 sec.,
respectively, which is about the distance
between the measured parallel execution time and the ideal parallel execution time
(i.e. simply the serial execution time divided by four).
In figure 24, we present the execution time
for random integer arrays (the same pseudo-random sequence of
a particular length was used in the serial and parallel
experiments). Here we see that some of the load imbalancing
due to imbalanced invocation trees can be resolved
by starting additional threads.
Figure 23: Speedup of Quick/Insertion Sorting of Reversed Integer Arrays
Figure 24: Quick/Insertion Sorting of Random Integer Arrays