next up previous
Next: Loop Parallelization Up: Automatically Exploiting Implicit Parallelism Previous: Automatically Exploiting Implicit Parallelism

Introduction

To obtain true portability, a Java program is compiled into the architectural neutral instructions (bytecode) of an abstract machine (the Java Virtual Machine), rather than into native machine code. In this manner, a compiled Java program can run on any platform on which a Java bytecode interpreter is available. Although the interpretation of bytecode is substantially faster than the interpretation of most high level languages, still a performance penalty must be paid for portability. For many interactive applications, this is not a major drawback. In other situations, however, performance may be more essential. In these cases, so-called `just-in-time compilation' can be useful, where at run-time the bytecode is compiled into native machine code. With this approach, performance close to the performance of compiled languages can be obtained. However, because the demand for more computing power is likely to remain, other means to speedup Java programs have to be found.

In this paper, we show how some forms of implicit parallelism in Java programs can be made explicit by a restructuring compiler using the multi-threading mechanism of the language (see e.g. [2, 9, 11, 12, 17, 23, 24, 25, 34] for a detailed presentation of multi-threading in Java). In particular, we focus on automatically exploiting implicit parallelism in loops and multi-way recursive methods. Obviously, letting a compiler deal with the transformations that make implicit parallelism explicit simplifies the task of the programmer and makes the parallelization less error-prone. Moreover, because parallelism is expressed in Java itself, the transformed program remains portable and speedup can be obtained on any platform on which the Java bytecode interpreter supports the true parallel execution of threads (typically a shared-address-space architecture [16]), whereas we will see that the transformations presented in this paper only induce a slight overhead on uni-processors.

   figure1265
Figure 1: Restructuring, compiling, and interpreting

In figure 1, we illustrate our approach to automatically exploiting implicit parallelism in Java programs. A Java program MyClass.java is used as input of our source to source Java restructuring compiler javar. First, the compiler identifies the loops and the multi-way recursive methods that can exploit implicit parallelism. Parallel loops are either detected automatically by means of data dependence analysis, or are identified explicitly by the programmer by means of annotations. Because automatically detecting implicit parallelism in multi-way recursive methods can be very hard, in this paper we simply assume that such parallelism is always identified explicitly by means of annotations. Thereafter, the compiler transforms the input program into a form that uses the multi-threading mechanism of Java to make all implicit parallelism explicit. Because parallelism is expressed in Java itself, the transformed program can still be compiled into bytecode by any Java compiler (javac in the figure), and subsequently interpreted by any bytecode interpreter (java in the figure, or, alternatively, an interpreter that is embedded in a browser or appletviewer). Since filenames are essential in Java, the transformed program is stored in the file MyClass.java after a copy of the original program has been saved in a file MyClass.orig. In case changes to the original program are required, this latter file must be renamed into MyClass.java again before javar can be re-applied to the program.

The rest of this paper is organized as follows. First, in section 2, we show how a restructuring compiler can exploit implicit loop parallelism in stride-1 loops. Thereafter, in section 3 we discuss how implicit parallelism in multi-way recursive methods can be made explicit. In section 4, we presents the results of a series of experiments that show the potential of the transformations presented in this paper. Finally, we state conclusions in section 5.


next up previous
Next: Loop Parallelization Up: Automatically Exploiting Implicit Parallelism Previous: Automatically Exploiting Implicit Parallelism

ajcbik@extreme.indiana.edu