
Introduction

Portability of the Java programming language [12] is obtained by compiling Java programs into architecture-neutral instructions (bytecode) for the Java Virtual Machine (JVM) [15], rather than into native machine code. Bytecode runs on any platform that supports an implementation of the JVM. Although interpreting bytecode is substantially faster than interpreting most high-level languages, a performance penalty must still be paid for this portability. Clearly, for computationally intensive Java applications, it would be desirable to reduce this penalty without sacrificing the portability of the language. One approach, for example, is to optimize the Java bytecode [2,4], either at compile time (where a machine-independent bytecode-to-bytecode optimization is added as an additional phase of the Java compiler) or at run time (where optimizations that require knowledge of the target machine are applied before execution). Some implementations of the JVM further improve performance by means of `just-in-time compilation' (JITC), where bytecode is compiled into native machine code at run time.

In this note, we explore a way to speed up certain operations in Java programs using ideas from the mathematical software community. There, it has been widely accepted that adopting a set of basic routines for problems in linear algebra can help improve the clarity, portability, modularity, maintenance, robustness, and even the efficiency of mathematical software. The best-known example of such a set of routines is formed by the Basic Linear Algebra Subprograms [9, ch. 5]. The original set of vector-vector operations is now commonly referred to as the Level 1 BLAS [13,14]. The set has been extended with the Level 2 BLAS [7,8] and the Level 3 BLAS [5,6] to provide more opportunities to exploit vector processing facilities for matrix-vector operations and memory hierarchies or parallelism for matrix-matrix operations, respectively. Once an efficient implementation of the BLAS is available, new mathematical software can easily be built on top of these primitives.
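To make the distinction between the levels concrete, a typical Level 1 routine is ddot, which computes the inner product of two vectors using O(n) work on O(n) data. A minimal pure-Java rendering of its reference loop might read as follows (unit strides are assumed here; the actual BLAS interface also takes increment arguments for strided vector access):

    // Sketch of the Level 1 BLAS routine ddot: returns the inner
    // product of x and y. Unit strides are assumed, unlike the
    // full BLAS interface, which supports strided access.
    static double ddot(int n, double[] x, double[] y) {
      double sum = 0.0;
      for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
      return sum;
    }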

Obviously, a similar approach can be taken for Java by extending the Java API (Application Programming Interface) with an appropriate set of mathematical primitives. As a first step, a Java implementation can be provided for all these primitives, which preserves the portability of every Java program in which the mathematical primitives are used.
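As an illustration of such an extension, the primitives could be collected in an API class whose methods initially have portable pure-Java bodies (the class name Blas below is purely illustrative, not part of any actual Java API):

    // Hypothetical API class: each primitive first receives a
    // portable pure-Java implementation, so that programs using
    // the class run on any JVM.
    public final class Blas {
      // Level 1 update y := alpha * x + y (daxpy), unit strides.
      public static void daxpy(int n, double alpha, double[] x, double[] y) {
        for (int i = 0; i < n; i++)
          y[i] += alpha * x[i];
      }
    }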

On a particular machine, however, the performance of all Java software that uses these primitives can then be improved simply by providing native implementations of the primitives. Although providing a broad range of highly optimized mathematical primitives would offer the best potential to exploit all characteristics of a particular target machine, this approach would also require the most programming effort to port the primitives in the API to different machines. Therefore, in this research note, we explore the potential of extending the API with straightforward native implementations of the Level 1 BLAS only. We will see that this extension alone can already improve performance substantially, while combining these native Level 1 BLAS with multi-threading in Java may even provide a simple and portable way to outperform compiled serial C code on multiprocessors.
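To sketch how this combination might look, the daxpy above could instead be declared native and, on a multiprocessor, be driven from several Java threads. The library name and the offset-taking signature below are assumptions for illustration only; section 2 discusses the actual integration of native methods:

    // Sketch only: a native Level 1 primitive driven by Java threads.
    // The library name "blas1" and the offset argument are
    // illustrative assumptions, not part of any actual API.
    public final class NativeBlas {
      static { System.loadLibrary("blas1"); }

      // Native daxpy on the subvectors x[off..off+n-1], y[off..off+n-1].
      public static native void daxpy(int n, double alpha,
                                      double[] x, double[] y, int off);

      // Split y := alpha * x + y over two threads, assuming the
      // native routine can safely be invoked concurrently.
      public static void daxpyParallel(final int n, final double alpha,
                                       final double[] x, final double[] y)
          throws InterruptedException {
        final int h = n / 2;
        Thread t = new Thread() {
          public void run() { daxpy(h, alpha, x, y, 0); }  // first half
        };
        t.start();
        daxpy(n - h, alpha, x, y, h);                      // second half
        t.join();
      }
    }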

In section 2, we briefly discuss how native methods are integrated into Java. In section 3, we present the results of a series of experiments, followed by conclusions in section 4.

