A Web Interface to Parallel Program Source Code Archetypes

Authors:
Juan Villacis
Department of Computer Science
Indiana University
Bloomington, Indiana 47401
jvillaci@cs.indiana.edu
812 855-9145
Dennis Gannon
Department of Computer Science
Indiana University
Bloomington, Indiana 47401
gannon@cs.indiana.edu
812 855-5814
Keywords:
WWW, HTML, parallel programming tools, source code, electronic textbooks.

Abstract

This paper describes a set of tools for annotating and exploring program source code on the World Wide Web. These tools are part of a project to build an electronic textbook for parallel programming that exploits the Caltech Archetypes model of program construction. The tools provide a simple way for Fortran, HPF and C++ programmers to add special annotations to source codes that allow the source code to be converted automatically into WWW hypertext documents. In addition, special compiler based tools examine the source code and discover all subroutine call sites and automatically build the hypertext links to the appropriate subroutine definitions. In the case of parallel C++, a complete class browser has been constructed in HTML and the compiler based tools generates the data base of application specific type and program information.


Introduction

One of the most serious problems in improving the state of software for scalable parallel computers is the fact that parallel programing is a craft that requires time and effort to learn. There are very few up-to-date textbooks because the field is in such rapid evolution. In addition, the best way to learn any craft is by an apprenticeship with an expert. In addressing this problem, Chandy and Massingill [1], have invented the archetypes model of parallel program design and analysis, and they have constructed a prototype Archetypes electronic textbook on parallel algorithms. In the area of scientfic/numerical algorithms, Dongarra, et. al. [9] have published the ``Templates'' book which formulates a very large number of important algorithms in terms of simple, high level descriptions that are easily specialized to fit particular requirements.

Our own interpretation of this Archetypes/Templates model of learning parallel programming can be summarized as follows.

Our vision of a parallel programming textbook would be an electronic document that would allow a student to explore the family of archetypes from the perspective of the archetypes or application domains. The textbook should allow the student to study the theory behind the archetypes, study the associated templates and examine and experiment with real source codes. In experimenting with these ideas we considered the problems associated with incorporating program source code into the electronic textbook. The problems that confront us can be stated as follows.

In this paper we describe a set of tools that allow programmers to annotate their program text so that it can be automatically decomposed into hypertext form. This Web Interface to Parallel Program Source Code Archetypes (WIPPSCA) allows the reader to explore the source code from a variety of perspectives.


Overview of WIPPSCA

In designing the WIPPSCA system, it was necessary to consider the differing perspectives of the author of the code and the viewer of that code. From the author's perspective, it is important that he be able to present his code to the viewer as he sees fit, including external web resources as needed. On the other hand, the viewer may want to see the author's code in an organized manner but independent of the author's style. These two goals have been met by designing a system that emphasizes flexibility at both the author and viewer levels.

The first goal is achieved through the Anno program. Anno is a system for annotating programs written in C, C++, Fortran, or HTML. The author simply embeds a small set of directives in his code that specify how things should appear in final output form to the browser. One of the more interesting features of Anno is the ability to incoporate related works on the web into his document via the include directive. This allows for the inclusion of multimedia objects (images/sounds/movies) alongside descriptions of code, that may enhance the viewer's understanding of the author's overall code.

The second goal is achieved through the Mccp program. Mccp is a forms-based browser that acts as a filter and structural organizer of the author's code. The code is decomposed into groups (currently, annotated code, system libraries, include files, source files, and where appropriate, classes, and collections). The viewer can ``filter'' what she wants to see by submitting the Mccp form with certain groups ``activated'', and others ``deactivated''. Moreover, the level of detail within each group is also left up to the viewer. In this way, the viewer can focus in on what interests her about the author's code.


Previous Work

The idea of annotating one's code has been around for quite sometime. A serious attempt to promote ``literate programming'' was made by Donald Knuth in his famous WEB work of the early 1980s [6]. In that work, Knuth attempted to combine TeX-like formatting within structured programs in the hope that the resulting programs would be more ``readable'' to programmers of varying levels of expertise. Recently, an extension to Knuth's work has been done by Kazuo Amano [5]. Amano's src2tex program transforms source code written in a variety of high-level languages (BASIC,Fortran, C, Pascal, etc.) into TeX. This TeX document can then be transformed into HTML via Nikos Drakos' popular latex2html program [8]. The problem we have encountered with TeX-based systems such as these is that they are typically geared towards paper-based documents (i.e., linear modes of presentation), rather than on truly non-linear hypertext modes. Moreover, they only give the programmer a fixed (albeit sometimes useful) format for presenting his code. Our approach is to take advantage of the programmer's familiarity with freely available formatting tools (such as TeX) and provide a means of using these tools in conjunction with his code.

At a more detailed level, programmers would also like to be able to browse through the static call graph of the program source code. Code browsers have been devised for this purpose, particularly in the commercial software arena. These browsers typically get their information from lexical analyzers, which are used at intermediate stages within the compiling process. The browsing mechanisms, however, have generally required the purchase of proprietary software, and do not lend themselves to analysis (for example, to build call graph structures). Freely available ``browsers'', such as those bundled with XEmacs [12], analyze the syntatic structure of the programming language, and provide a limited mode for searching through and highlighting source code. Unfortunately, they are not true code browsers, as this would require a complete compiler and parser for each language it would support. Our goal is to provide programmers the capability to browse code on the web using the freely available compiler tools of the Sage++ system.

On a related note, there is work presently being done at Sun Microsystems in the area of interactive browsers [10]. These browsers make extensive use of object oriented technology that promise to give otherwise static documents a dynamic behaviour. We intend to explore this technology as an authoring tool for interactive presentations of code in the future.


Anno: Building Annotated Hypertext Program Documents

In this section we examine the Anno utility for building web documents from source code. A few of the directives will be described briefly below. For a full explanation of the syntax, the reader is referred to the Anno Users Guide [7].

Perhaps the most useful directive is the include directive. It takes on the form,

<! I [position] > body

where body is to be inlined immediately at position (or at the current position if unspecified). Body can be any combination of HTML code or URL(s). If a URL pointing to another web page is specified, Anno will fetch the HTML document, cache all top-level images, and correct the image links in the HTML document. In this way, other web pages can be easily incorporated into the author's own document.

Another interesting directive is the expand directive. It has form,

<! E label "hname"> body

where label is used to internally reference body, and hname is the highlighted text that the viewer clicks on to cause body to appear (be ``expanded'') or disappear within the current context. The intent of this directive two-fold. First, it hides regions of code that may not be important for the viewer to see immediately, but which may be of interest later. Second, it provides the author with a way of abstracting out the details of his code, thereby helping convey the structure of his program to the viewer.

The last directive we will look at is the xtend directive. It has several forms, but it can generally be taken as,

<! X func > body

where func points to a function which will be applied to body. This is essentially a hook into Anno that allows the author to put alien formatting code into his document, and then have func convert it into an inlinable object. One such utility function that is supplied with Anno is latex2xbm. This program replaces pieces of LaTeX code with a corresponding XBM image. Hence, mathematical formulas can be seamlessly added alongside descriptions of code.

Example: Annotated Fortran Code

Below is a Fortran program that illustrates the use of the aforementioned directives. Note that Anno can take any type of file that has valid annotations in it, but only those source files that pass through Sage++ will yield hypertexted code.

C <! I TOP> http://extreme.indiana.edu/edoc/header.html
C
C <! X latex "/usr/local/bin/latex2xbm">
C
C This code solves the Poisson's Equation, <! X latex BEGIN> 
C \[
C \nabla^{2}\Phi_{nlm}(\vec{r}) = 4\pi\rho_{nlm}(\vec{r}).
C \]
C <! X latex END>, for the self-consistent field.
C 
C <! E main_code "Main Code" BEGIN>

        INCLUDE 'scf.h'

        INTEGER n

C   Initialize state of the system.

        CALL initsys

C   Advance system state for a given number of steps.

        DO 100 n=1,nsteps

           CALL stepsys(n)

 100    CONTINUE

C   Terminate the simulation.

        CALL endrun

        STOP
        END
C
C <! E main_code END>
C
C < I BOTTOM> http://extreme.indiana.edu/edoc/bottom.html

In the next figure, we show the result of passing this source through Anno. The program is converted into a hypertext document where the top page consists of all the items not hidden by an expand directive. The first line inserts a small graphical icon and a header titled ``Self-Consistent Field'' from an external file. Next, the commented text is turned into document text and an xtend directive includes a small LaTeX equation. The main program body is then hidden by an expand block and placed in another file. Finally, a terminating icon and email address is included at the bottom of the document.

Self-Consistent Field


This code solves Poisson's Equation, , for the self-consistent field.

  • Main Code


  • [HPC++: Extreme! Computing]
    sage@extreme.indiana.edu

    If the viewer wishes to expand the ``Main Code'', there is a link to click on. Rather than jumping to a new page, this segment of the text is expanded directly into the main document page. If the viewer wishes to hide this block of code, a second click on the ``Main Code'' will cause it to vanish, and the document will return to its previous state. Notice that this feature can be nested. In this way, the programmer can design his source code so that the structure of the program can be revealed in stages. Furthermore, the archetype and program template can be made visible without forcing the reader to view all the technical details. Finally, each reference to a subroutine call has been hypertexted. This allows the viewer to explore the call graph of the program at will. The result of clicking on the ``Main Code'' is shown in the figure below.

    Self-Consistent Field


    This code solves Poisson's Equation, , for the self-consistent field.

  • Main Code (click to compress) INCLUDE 'scf.h' INTEGER n C Initialize state of the system. CALL initsys C Advance system state for a given number of steps. DO 100 n=1,nsteps CALL stepsys(n) 100 CONTINUE C Terminate the simulation. CALL endrun STOP END C


  • [HPC++: Extreme! Computing]
    sage@extreme.indiana.edu

    Mccp: Providing Code Browsing Capabilities

    The Anno tool allows the programmer to insert directives into a source program file so that sophisticated graphics, equations and documentation can be added into the resulting WWW document. More important, it also allows the author to control the way that the document is decomposed into nested hypertext pages.

    However, computer programs offer their own natural decomposition into nested documents. Every Fortran, C and C++ program consists of a sequence of data definitions, type definitions and function definitions. Each of these items consists of an entry in a list and within each entry are references to the other items. These references may take a variety of forms: a use of a user-defined type, a call to a function in the user's library, or a reference to a user's variable. Taken together, all of this program information is a data base of definitions and structure that a programmer must understand. Consequently, a hypertext document that represents the source code of a computer program must, in reality, be an interface to a data base engine to explore the code.

    But this idea is not new. Commercial program development tools have traditionally supported special features and tools that would allow the programmer to explore a source code and library as a data base. That is, the user can ask for a list of all functions, where they are called and used, what are the definitions of special variables, and what do the data types mean. In the case of C++ programming tools, it is traditional to support a ``class browser'' that allows the programmer to know how each class type is defined, what are the supported member functions and data attributes, what are the classes that it inherits from, and what are the associated nested types. The inheritance data is often represented as a tree and then presented as a list of attributes and functions that may be explored interactively.

    The key idea behind these tools is to make it easier to explore a program by using the static information in its type stucture and subroutine calling tree. Because these tools are part of commercial compiler systems, the static structure is known to the tools. In our case, we have been working on program analysis tools for Fortran and its extensions (Fortran-M, Fortran-S, and HPF), as well as building program preprocessors for C based tools to support extensions for parallel and distributed computing (pC++, CC++ and CORBA IDL [13]).

    In the case of pC++, the University of Oregon TAU project [3] has used our compiler technology to build a variety of special programming environment tools that exploit the compiler's knowledge of the program structure. One of the TAU tools builds a complete static call graph and class hierarchy data base by extracting data from an intermediate pass of the compiler. This tool is the key to what follows.

    Our goal is to provide a mechanism that allows a programmer to ``publish'' a program source as an interactive document without also requiring the viewer to own an expensive and complicated commercial program development environment. Mccp is very simple. It accepts programs written in a variety of Fortran dialects (Fortran 77, Fortran-S and Fortran-M, and soon Fortran90) and C++ flavors (pC++ and CC++), and generates a hypertext version of the source code. Each program subroutine and class definition is isolated into separate files. A complete subroutine call graph is used to generate hypertext links between each invocation site and the appropriate routine defintions. In the case of C++, a class browser is built that is tailored to class definitions in the source code.

    The result is a forms-based HTML document tree that the viewer can explore in addition to (or irrespective of) the author's annotated code. In the next section, we give an example of how the viewer might use the code browser to search through a class structure in pC++.

    Example: A pC++ Class Browser

    The viewer is first presented with a form as shown in figure 4. The heading of the form indicates the kind of program the author is presenting (in this case, a parallel sort program). The top section of the form lists the available groups that can be chosen from, followed by viewing parameters. Each selected group occupies a line-delimited region on the form. In this example, the ``Source Files'' group has been selected. Items within that group are displayed in a menu, from which the viewer can choose.

    In the next example, the ``Classes'' group has been selected in addition to the ``Source Files'' group. The viewer has decided to examine the code of the ``Pvector class'', which gets inserted into the form after clicking on the class view button. If the viewer had chosen multiple items from the menu, they too would be shown in the order they are listed. Finally, note that the inlined code is hypertexted, thus allowing the viewer to examine each method defnition on a separate page. Figure 5 shows the result of the viewer's actions.

    In the last example, suppose the viewer has now decided that she only wants to see methods of the ``Pvector class''. The viewer deselects the ``Source Files'' group, presses the update button at the top of the form, and is subsequently presented with a form containing only the ``Classes'' group. The viewer then presses the show methods button, and the submenu ``Methods of Pvector'' appears below the ``Classes'' menu. The viewer is now able to investigate a particular method of Pvector, such as the Qsort method, in detail. Figure 6 shows the net result of the viewer's actions.

    In each of these examples, we see that by separating components of code into groups, the viewer is able to focus in on what interests her the most about the author's code. Furthermore, the viewer can investigate each group in detail, independent of the author's own exposition. In this way, the viewer is given another avenue in which to explore the author's code.


    Implementation and System Design Issues

    With the exception of the Sage++ and TAU tools, all programs in the WIPPSCA system are written in Perl[11]. This virtually guarantees that the system will be portable to a wide range of target unix machines. Any dependencies on external (non-Perl) programs have been relegated to user supplied macros (see the xtend directive).

    There are two parts to WIPPSCA: the compiler and runtime system. During the compiling phase, the author annotates his code, and then runs it through a series of conversion programs. Two hypertexted HTML documents are subsequently produced: the author's annotated source code, and a forms-based code browser. Each of these documents has an associated data base that is used by the runtime system to generate new documents. The system enters the runtime phase once those documents are accessed from a web browser. The components of WIPPSCA are illustrated in the figure below.

    Compiling Phase

    At the start of the compiling phase, the author annotates his code according to the syntax definitions in [7]. Next, he runs his source code through Sage++. A compiler tool detects the language the source code is written in, and then generates a binary dependency or dep file. This dep file contains the complete parse tree of the source code, along with line positions of tokens. This information is then converted into a more accessible call graph format via the TAU tool dep2cgm. This tool uses heuristics to further analyze the source code and provides column positions of tokens. The call graph is then used by the cgm2html program to find and extract subroutine and class definitions from the original source code, and cross-reference call sites to their extracted sources --in short, to hypertext the source code. Also, at this point there is enough information to build the code browser, and so the initial browser form is produced. Finally, the hypertexted source code is passed through the Anno program where it is parsed according to the author's annotations and results in an initial base HTML document. Both cgm2html and Anno produce data base information used in the runtime phase. These conversion programs and their purposes are summarized in the figure below.

    Runtime Phase

    After the code has been annotated and compiled, and two base documents created, they are then made available to the web through the Mccp program. If an expand directive was issued anywhere in the annotated code, then references to it are handled dynamically through the Edoc program. Both Mccp and Edoc, together with the program data bases, form the doc engine of the runtime system. Figure 9 illustrates the runtime system in action.

    The doc engine is responsible for creating new documents from the initial base documents. A new document is created each time the viewer clicks on a form button or expand link. In general, this new document will depend on the state of the previous document. Since the client/server relationship over the web is ``stateless'', it falls on either the client or the server to keep track of the document's state.

    It turns out that the server can maintain the state of a document by encoding each document with the appropriate information. In the case of Edoc, each expand link is laced with a unique document ID. This document ID is actually an index into the document's history data base, which is private and per session. An entry in the data base tells the server which portions of the document are to be expanded. The server then uses that information to build up documents from the base document on the fly. In the case of Mccp, the state of the document can be hidden entirely within the document itself. This state information is passed on to the server through the document's form button. In both cases, the server must maintain the program data bases and any other associated information needed to generate new documents.


    Future Work

    Several ideas have been suggested that may facilitate the use of WIPPSCA. One such idea would be to incorporate Archetypes into the Design Patterns [14] model of object-oriented design. This would provide programmers with a notational system that may help them abstract out and organize the ideas on which their code is based. In additon to acting as a template for their annotated code, this system could also be semi-automated within WIPPSCA, thus providing another tool at the author's disposal. Another idea would be to extend the browser to allow users to add external annotations. This is part of a larger goal of making WIPPSCA more interactive, both to the viewer as well as to the author. These ideas can better be realized with the emerging web technologies such as Sun's HotJava [10]. We are currently investigating these and other issues surrounding program development within the WWW.


    Conclusion

    In this paper we have described a set of tools to help programmers turn source code into hypertext documents. These tools work with programs written in Fortran 77, Fortran-M, Fortran-S, Ansi C, C++, pC++ or CC++ and in the near future Fortran 90, HPF and CORBA IDL. Two classes of tools were described in detail.

    The first is a preprocessor that takes a source program annotated by the programmer and produces a WWW hypertext document. The annotations allow the programmer to include arbitrary HTML references in the comments of the program. In this way, the programmer can include equations, diagrams and links to technical papers into the source of the program so that the resulting document is more complete. In addition, the tools allow the programmer to nest the program structure so that parts of the source code can be hidden in the document. The reader can then expand the document when he or she needs to study the lower level details. Our hope is that this will allow the programmer to expose the algorithmic archetypes and program templates that form the foundation of the code.

    The second tool is a more sophisticated program analysis system that completely parses the source code and builds a static call graph and a table of user defined types and functions. This data base of program information is then used to build hypertext links in the document that connect subroutine call sites with subroutine definitions. In addition, the data base is used to build HTML-based class browsers and function tables which give the reader a more powerful way to explore the source text. In the future, we expect to be able to use more information gathered by the program analysis tools. For example, variable definition and use links, data dependence information and subroutine side-effect analysis are all possible as additional components of the resulting program document.

    If this paper is being read from a WWW browser, complete interactive examples are available at http://www.extreme.indiana.edu:80/sageDoc/examples.html, and can be accessed by clicking here. We encourage the reader to explore these examples.

    Our long term goal is to build an infrastructure for writing meta-textbooks on computational science that can be used to train programmers to use scalable parallel systems. These tools only form a small part of that infrastructure. In addition, interactive algorithm animation, collaborative learning and writing tools, on-line performance visualization and a host of other new technologies will be needed. However, with the progress that is being made in these areas, we are very optimistic about the future of this area.


    Acknowledgments

    This research is supported by ARPA under Rome Labs contract AF 30602-92-C-0135 and Fort Huachuca, contract ARMY DABT63-94-C-0029.


    References

    [1] K. Mani Chandy, "Concurrent Program Archetypes", html

    [2] Dennis Gannon, et. al., "Sage++: An Object-Oriented Toolkit and Class Library for Building Fortran and C++ Restructuring Tools", postscript

    [3] Bernd Mohr, Darryl Brown, Allen Maloney, "TAU: A Portable Parallel Program Analysis Environment for pC++", postscript

    [4] Pete Beckman, Beata Winnicka, "Sage++ Users Guide", html, postscript

    [5] Dennis Gannon, Pete Beckman, Shelby Yang, "User Guide for a Portable Parallel C++ Programming System, pC++", html, postscript

    [5] Kazuo Amano, "Src2tex: A Source To TeX Converter", source

    [6] Donald Knuth, "Literate Programming", The Computer Journal, 1984, 97-111

    [7] Juan Villacis, "Anno Users Guide", html

    [8] Nikos Drakos, "Text to Hypertext conversion with LaTeX2HTML", Baskerville, December 1993, Vol.3, No. 2, pp 12-15, postscript, html

    [9] Jack Dongarra et. al., "Templates for the Solution of Linear Systems", html

    [10] Sun Microsystems, "The HotJava Browser: A White Paper", html

    [11] Larry Wall and Randal Schwartz, "Programming Perl", O'Reilley & Associates, html

    [12] Chuck Thompson et. al., "XEmacs", source

    [13] Object Management Group, "Common Object Services Specification: Volumes I,II", March 1994

    [14] Erich Gamma et. al., "Design Patterns: Abstraction and Reuse of Object-Oriented Design", Proceedings of ECOOP '93, Kaiserslautern, Germany. postscript

    [15] W3 Organization, HTML converters: Program Language filters, html


    About the Authors

    Juan Villacis received his M.S. in Physics from the University of Missouri, Rolla in 1991 and an M.S. in Computer Science from Indiana University, Bloomington in 1994. He is currently a member of the research staff in the Extreme! Computing Group at Indiana University. His research interests include hypermedia tools and compiler tool generators for distributed source code analysis.

    Dennis Gannon received his Ph.D. in Mathematics from the University of California, Davis in 1974 and his Ph.D. in Computer Science from the University of Illinios, Urbana-Champaign in 1980. From 1980 to 1985 he was a member of the Department of Computer Science at Purdue University and from 1985 he has been with the Department of Computer Science at Indiana University, Bloomington. He has also been the research director of the Center for Innovative Computer Applications at Indiana University, Bloomington. He spent five years as a visiting senior scientist at the Center for Supercomputing Research and Development.


    Click