Consider a linked-list of integers that is implemented
using the following class `List`:

Under the assumption that each list is terminated
with a special node `SENTINEL` that points
to itself and with a data item larger than all elements in the list,
merge-sorting can be implemented as follows (see e.g. [33]):

Obviously, `listMergesort()` is a parallel 2-way
recursive class method and can be parallelized using
the method of this paper. In this case, however,
the performance can depend substantially
on the implementation of the method `merge()`.
This method must merge two sorted linked lists into one completely sorted
linked list, as illustrated in figure 27.
Consider, for example, the following simple
implementation of `merge()`, in which an
auxiliary node `l` is explicitly allocated
to obtain a hook to the new list:

**Figure 27:** Merging two Linked Lists

**Figure 28:** Linked List Merge Sorting (memory allocation)

In figure 28, we show the execution time
of serial and parallel merge-sorting on integer lists
varying in length up to *N*=100,000. Rather surprisingly,
parallelizing the previous implementation of merge-sorting
dramatically decreases the performance. This performance
decrease is due to the explicit memory allocation in `merge()`.
If we implement `merge()` without the need
for an auxiliary node, an example of which is shown below,
then the execution time shown in figure 29 result:

**Figure 29:** Linked List Merge Sorting (no memory allocation)

This experiment clearly reveals that the parallelization method of this paper should only be applied to parallel multi-way recursive methods in which no explicit memory allocation is performed.

ajcbik@extreme.indiana.edu