
Introduction

Aims and Objectives

Sage++ is an attempt to provide an object-oriented toolkit for building program transformation systems for the Fortran 77, Fortran 90, C and C++ languages. Sage++ is intended to be used by researchers interested in building parallelizing compilers, performance analysis tools, and source code optimizers. It is designed as an open C++ class library that provides the user with a set of parsers, a structured parse tree, a symbol and type table, and access to programmer annotations embedded in the source text. The heart of the system is a set of functions that allow the tool builder complete freedom in restructuring the parse tree, together with a mechanism (called unparsing) for generating new source code from the restructured internal form.

The library is organized as a class hierarchy that provides access to the parse tree, symbol table and type table for each file in an application project. There are five basic families of classes in the library: Project and Files, Statements, Expressions, Symbols, and Types.

History

Sage++ is based on an older system called Sigma (faust) which was, in turn, based on the Blaze compiler designed by Piyush Mehrotra in 1984. In its original form, Sigma was a tool kit for program restructuring that was accessed through the EMACS text editor (sigmacs). The primary advantage of EMACS is that it has a powerful, built-in programming system, ELISP, that allows it to be highly customized. Tool builders were able to call Sigma operations from ELISP code and interactively restructure programs.

Because many potential users were interested in building ``stand alone'' tools (not linked to EMACS), we designed a C function interface to the Sigma system, called Sigma II, which provided a high level view of the ``data base'' consisting of the parse trees of the source code and the associated data dependence information. Because the underlying data structures generated by the parsers are very complex and awkward to use, Sigma II provided a level of abstraction that allowed the tool builder to work in terms of source program units like statements and expressions rather than the ``low level'' bit fields and linked lists of the internal structures. At about the same time, a group at the University of Rennes and IRISA in France designed a Sigma ``ToolBox'' that provided access to more powerful transformations and to user annotations in the source code. Although the ToolBox was more powerful, it also required more knowledge of the underlying parser data structures.

The design of Sage++ is based on the IRISA ToolBox, but it provides an additional level of abstraction similar to, but more flexible than, the Sigma II interface. One important difference between Sage++ and Sigma is the treatment of data dependencies and control flow information.

The Sigma system had a built-in control flow and data dependence analysis package. While this package had many advanced features, such as full symbolic analysis and rudimentary interprocedural capabilities, it was also limited in scope and hard to use. More importantly, it was embedded at the lowest level of the software and written in terms of the parser data structures. Consequently, it was nearly impossible for users to modify when they wished to experiment with more recent advances in data dependence analysis theory.

In Sage++, we have decided to build the control flow structures and data dependence analysis primitives on top of the user level class library. In this way, they can be easily modified or extended by the tool user. This aspect of Sage++ is not yet complete.

Overview

In this section we provide an overview of the Sage++ library. There are five basic families of classes in the library: Projects and Files, which correspond to source files in a multi-source application project; Statements, which correspond to the basic source statements in Fortran 90, C and C++; Expressions, which are contained within statements; Symbols, which are the basic user defined identifiers; and Types, which are associated with each identifier and expression. In addition, the SgAttribute class allows users to add their own information to Sage++ objects. Attributes can be attached to SgStatement, SgExpression, SgSymbol, and SgType objects. To find out more about attributes, see the Attributes section.

In Sage++, program parsing is separated from program analysis and restructuring, so work proceeds in two phases. Application projects in Fortran 77, Fortran 90, C and C++ are first parsed, one file at a time, to produce a machine independent binary internal format called a .dep file. For example, given an application with source files Main.f, Subs.f, c++funs.C, cfuns.c, one invokes the Fortran parser cfp or the C parser pc++ on each file to generate the corresponding .dep files. Finally, the user builds a project file, MyProject.proj, which lists each of the .dep files, one per line. In this example, the .proj file is

Main.dep
Subs.dep
c++funs.dep
cfuns.dep

The source language type is encoded within the .dep file. It should be noted that the .dep file is a complete translation of the source including comments, and the original source, up to the line numbers of statements, can be regenerated. Note that pc++ passes the files through a standard preprocessor before actually parsing them and the comments are discarded by the preprocessor. However, pC++2dep does not include the preprocessing step, and thus comments are retained (but no preprocessing is done).

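The purpose of the project file is to make interprocedural analysis possible: because it names every .dep file in the application, a tool can examine all of the files together rather than one at a time.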
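
As a small illustration of the project file layout described above, the following C++ sketch derives the .dep names from a list of source files and builds the text of a .proj file, one name per line. It is not part of Sage++: the helpers depNameFor, projectFileText and readProjectFile are hypothetical names introduced here, and the rule "replace the final extension with .dep" is an assumption inferred from the Main.f and c++funs.C example. The parsers themselves are not invoked.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not a Sage++ function): map a source file name to
// its .dep name by replacing the final extension, e.g. "Main.f" -> "Main.dep".
// A dot that belongs to a directory name is ignored.
std::string depNameFor(const std::string &source) {
    const std::string::size_type slash = source.find_last_of('/');
    const std::string::size_type dot = source.rfind('.');
    const bool hasExtension =
        dot != std::string::npos &&
        (slash == std::string::npos || dot > slash);
    const std::string stem = hasExtension ? source.substr(0, dot) : source;
    return stem + ".dep";
}

// Hypothetical helper: build the contents of a .proj file,
// listing one .dep file per line.
std::string projectFileText(const std::vector<std::string> &sources) {
    std::string text;
    for (const std::string &source : sources) {
        text += depNameFor(source) + "\n";
    }
    return text;
}

// Hypothetical helper: recover the list of .dep names from the
// contents of a .proj file, skipping blank lines.
std::vector<std::string> readProjectFile(const std::string &text) {
    std::vector<std::string> deps;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            deps.push_back(line);
        }
    }
    return deps;
}
```

Calling projectFileText with the four source names from the example yields the four lines of MyProject.proj shown earlier, and readProjectFile turns that text back into the list of .dep names a tool would open.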

Limitations

Sage++ has proven to be a powerful tool for our compiler prototyping experiments, but it still has a number of important limitations. The most important of these is that it is not easy for users to add language extensions to Fortran or C to the system. In principle this is not difficult. To add a new statement to the language, one must extend the parser, which is based on the GNU Bison version of YACC. A new node type must be added to the internal form and a corresponding subclass added to the Sage++ hierarchy. The unparser module, which is table driven, must be extended to recognize this new node. While we have done this several times (we have added some of the PCF extensions to Fortran and extended C++ to define our pC++ language), it is not an easy task because it requires a complete understanding of the internal parser structures.

