In this note, we explore a way to speed up certain operations in Java programs using ideas from the mathematical software community. There, it has been widely accepted that adopting a set of basic routines for problems in linear algebra can help improve the clarity, portability, modularity, maintainability, robustness, and even the efficiency of mathematical software. The best-known example of such a set of routines is the Basic Linear Algebra Subprograms (BLAS) [9, ch5]. The original set of vector-vector operations is now commonly referred to as Level 1 BLAS [13,14]. The set has been extended to Level 2 BLAS [7,8] and Level 3 BLAS [5,6] to provide more opportunities to exploit vector processing facilities for matrix-vector operations and memory hierarchies or parallelism for matrix-matrix operations, respectively. Once an efficient implementation of BLAS is available, new mathematical software can easily be built on top of the primitives.
A similar approach can be taken for Java by extending the Java API (Application Programming Interface) with an appropriate set of mathematical primitives. In the first instance, a pure Java implementation can be provided for all these primitives to preserve the portability of all Java programs in which the mathematical primitives are used.
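To make the idea concrete, the following is a minimal sketch of what a portable pure Java version of two Level 1 BLAS primitives might look like. The class name `Blas1` and the simplified unit-stride signatures are assumptions made for illustration; they are not taken from any existing Java API, and the reference BLAS routines additionally accept stride arguments.

```java
// Hypothetical class: an illustrative pure Java fallback for two
// Level 1 BLAS operations. Names and signatures are assumptions.
final class Blas1 {
    private Blas1() {}

    /** Computes y := alpha * x + y for the first n elements (unit stride). */
    static void daxpy(int n, double alpha, double[] x, double[] y) {
        for (int i = 0; i < n; i++) {
            y[i] += alpha * x[i];
        }
    }

    /** Returns the dot product of the first n elements of x and y (unit stride). */
    static double ddot(int n, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }
}
```

A caller would write, for example, `Blas1.daxpy(x.length, 2.0, x, y)`. Because client code depends only on these signatures, the loop bodies could later be replaced by methods declared with Java's `native` keyword without changing any calling program.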
On a particular machine, however, the performance of all Java software that uses these primitives can be improved simply by providing native implementations of the mathematical primitives. Although providing a broad range of highly optimized mathematical primitives would offer the best potential to exploit all characteristics of a particular target machine, this approach would also require the most programming effort to port the mathematical primitives in the API to different machines. Therefore, in this research note, we explore the potential of extending the API with straightforward native implementations of Level 1 BLAS only. We will see that this extension alone can already improve performance substantially, while combining these native Level 1 BLAS with multi-threading in Java may even provide a simple and portable way to outperform compiled serial C code on multi-processors.
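As an illustration of how a Level 1 BLAS kernel might be combined with Java threads, the sketch below splits a dot product into contiguous slices, computes one partial sum per thread, and adds the partial sums after all threads have finished. The class `ParallelDot`, the method names, and the slicing scheme are assumptions made for this example and are not the implementation used in the experiments; the serial kernel `ddotRange` is written in plain Java here and stands in for the place where a native primitive would be called.

```java
// Hypothetical class: an illustrative multi-threaded dot product.
final class ParallelDot {
    private ParallelDot() {}

    /** Serial kernel over the index range [from, to); stand-in for a native call. */
    static double ddotRange(double[] x, double[] y, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    /** Dot product of x and y computed with numThreads worker threads. */
    static double ddot(final double[] x, final double[] y, int numThreads) {
        final int n = x.length;
        final double[] partial = new double[numThreads];
        Thread[] workers = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            final int id = t;
            // Slice boundaries; long arithmetic avoids overflow for large n.
            final int from = (int) ((long) n * t / numThreads);
            final int to = (int) ((long) n * (t + 1) / numThreads);
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    partial[id] = ddotRange(x, y, from, to);
                }
            });
            workers[t].start();
        }
        double sum = 0.0;
        try {
            for (int t = 0; t < numThreads; t++) {
                workers[t].join();   // join makes the worker's write visible
                sum += partial[t];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return sum;
    }
}
```

Each worker writes only its own slot of `partial`, so no locking is needed. Note that the order in which floating-point partial sums are combined differs from the serial loop, so results may differ in the last bits for general data.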
In section 2, we briefly discuss how native methods are integrated in Java. In section 3, we present the results of a series of experiments, followed by conclusions in section 4.