There are three types of parallel algorithms in the PSTL:
The algorithms in the first group retain their STL names, but allow pariterator arguments in place of iterators. These new algorithms are collective and have parallel semantics.
The algorithms in the second group are also versions of the STL algorithms, but par_ will be prepended to their names. When invoked with parallel iterators, these algorithms are semantically equivalent to the first type of algorithm (and are collective). When invoked with local iterators, these algorithms are not collective. Execution is local to a particular context, but has parallel semantics (so a loop may be parallelized, for example).
The differentiation between the versions of these algorithms is accomplished using iterator tags. The body of each algorithm consists of a call to another function with an added argument: iterator_category(first). The local or parallel iterator version of the function is called, depending on the iterator tag. Note that this differentiation is done at compile time using function templates.
The following example illustrates these algorithm versions.
// create a distributed vector of size 100 and
// initialize each element to 5.0
distributed_vector<double> myVec(100, 5.0);
// sum the local elements in this context
double sum1 = accumulate(myVec.begin(), myVec.end(),
0.0);
// sum all elements - collective operation
double sum2 = accumulate(myVec.parbegin(),
myVec.parend(), 0.0);
// sum the local elements in this context
// using single-context parallelism
double sum3 = par_accumulate(myVec.begin(),
myVec.end(), 0.0);
// sum all elements - collective operation
double sum4 = par_accumulate(myVec.parbegin(),
myVec.parend(), 0.0);
In this example, let us assume that 3 contexts are involved. We first create a distributed vector with 100 elements of type double. Since the default distribution is used (no ContainerRatio object is specified in the constructor), 33 elements are stored in the first two contexts and 34 are stored in the last. Each element is initialized to 5.0.
The STL algorithm accumulate is declared as:
template <class Iterator, class T> T accumulate(Iterator first, Iterator last, T initial)The algorithm sums the elements in the iteration space [first, last) using initial as the initial value.
The first call to accumulate is a non-collective operation and results in the invocation of the STL version of accumulate in the calling context. While it does not need to be invoked in each context, it is in this case. Each invocation is, however, completely independent of the other contexts. In contexts 0 and 1, the value of sum1 is 165.0 (5 * 33). In context 2, the value is 170.0 because there is one extra element.
The second accumulate call is differentiated from the first by the use of parallel iterators. It is a collective operation and must be called in all contexts. The local elements are summed in each context and then a global sum is computed using a reduction operation. In all contexts, sum2 is 500.0.
Because it is invoked with local iterators, the first call to par_accumulate is a non-collective operation. Unlike accumulate, however, par_accumulate with local iterators uses single-context parallelism, if available, to compute the local sum. Lightweight threads are used in a context to accumulate partial sums of the local elements; these are summed to get the total for the local context. The value of sum3 in each context corresponds to the value of sum1; they differ only in the processing involved. Finally, the second call to par_accumulate is equivalent to the second call to accumulate since behavior of the algorithms is the same when they are called with parallel iterators.
The third group of algorithms consists of special par_ algorithms such as par_apply, par_scan, and par_reduce. These are collective operations with parallel semantics. The following have been defined thus far:
Figure 1: Example of par_apply usage
For example, the declaration of par_apply for a binary function object is:
template <class ForwardIterator1,
class ForwardIterator2,
class BinaryOperation>
void par_apply(ForwardIterator1 begin1,
ForwardIterator1 end1,
ForwardIterator2 begin2,
BinaryOperation binop);
Here, the first iteration space is defined by [begin1, end1).
If this iteration space contains k elements, then the second
iteration space consists of [begin2, begin2 + k). In applying
the binary function, each element in the first iteration space is
paired with the corresponding element in the second iteration space.
Consider the following example.
class add_to {
public:
void operator()(int& m, int n) { m += n; }
};
// create vectors
distributed_vector<int> x(15);
distributed_vector<int> y(10);
// initialize element values
do_initialization(x.parbegin(),x.parend());
do_more_initialization(y.parbegin(),y.parend());
// add the first 10 elements of y to the
// corresponding elements in x
par_apply(x.parbegin(),x.parbegin()+10,y.parbegin(),
add_to());
Figure 1 shows the two vectors, with the elements from the iteration spaces shaded in gray. The processing within par_apply in each context is as follows:
Last modified: Thu Feb 4 17:06:00 EST 1999