Information Subsystem

The Information Subsystem is not a single software component, but rather a method by which different components store and exchange information. The main trade-off is between scalability and ease of use. Since a PSE may consist of an arbitrarily large number of distributed processes, it is not practical to send all the information to a central repository during execution. At the same time, a user needs to be able to quickly and easily locate any desired information generated by the PSE.

In this model the basic unit of information is an event. An event is any (thread of) execution that is initiated by a communication from another component. The important aspect of this definition is the requirement for communication. The PSE infrastructure does not require the participating processes to maintain a globally synchronized clock, so it is communication that creates a partial ordering of events across all the processes.
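To make the ordering argument concrete, here is a minimal sketch of one standard way to order events without a shared clock: a Lamport logical clock. This is an assumption for illustration only; the text above does not say which mechanism the LSA uses, and the `Process` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """A process carrying a Lamport logical clock (hypothetical, not an LSA class)."""
    name: str
    clock: int = 0

    def send(self) -> int:
        # Sending is a local step; the timestamp travels with the message.
        self.clock += 1
        return self.clock

    def receive(self, timestamp: int) -> int:
        # The receipt initiates a new event, ordered after the send that caused it.
        self.clock = max(self.clock, timestamp) + 1
        return self.clock


a, b, c = Process("a"), Process("b"), Process("c")
t_send_ab = a.send()                # a communicates with b
t_event_b = b.receive(t_send_ab)    # event initiated on b
t_send_bc = b.send()                # b communicates with c
t_event_c = c.receive(t_send_bc)    # event initiated on c
```

Every event initiated by a message receives a larger timestamp than the send that caused it, so causally related events are ordered. The converse does not hold: two events that never communicate may still receive comparable numbers, which is why the ordering that carries meaning is only a partial one.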

PSE processes are layered applications. At one end is the network interface layer and at the other is the computational layer (the computational layer is often referred to as 'the component', since it provides the main functionality). The computational layer generates information and writes it out to the local file system. It then produces an event results object, which contains text summarizing the data written out and a link to the actual file. As this object moves down the layers of the application to the network interface layer, more information is added (such as the host and path name required to locate the links). The ability of lower layers to add information to the event results provides a convenient, transparent mechanism to collect and record performance metrics. In addition to any metrics provided by the computational layer, the PSE infrastructure automatically provides a standard set of metrics for all components in the system. Sometimes the PSE infrastructure itself must generate event results. For example, if locks for required shared resources cannot be acquired, the computational layer cannot be entered. In this case the infrastructure must generate an exception event results object.
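The flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LSA implementation: the `EventResult` fields, the three layer functions, the metric name `compute_seconds`, and the host name are all hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class EventResult:
    """Hypothetical event results object: a summary plus a link to the data."""
    summary: str
    link: str                       # path of the file written by the computation
    is_exception: bool = False
    annotations: dict = field(default_factory=dict)   # added by lower layers
    metrics: dict = field(default_factory=dict)       # performance metrics


def computational_layer() -> EventResult:
    # The real layer would write its data to the local file system first.
    return EventResult(summary="solved 3x3 system", link="results/run1.dat")


def infrastructure_layer(locks_available: bool) -> EventResult:
    if not locks_available:
        # The computational layer cannot be entered, so the infrastructure
        # generates the exception event results object itself.
        return EventResult(summary="could not acquire required locks",
                           link="", is_exception=True)
    start = time.perf_counter()
    result = computational_layer()
    # Standard metric added transparently, without the component's help.
    result.metrics["compute_seconds"] = time.perf_counter() - start
    return result


def network_interface_layer(locks_available: bool, host: str) -> EventResult:
    result = infrastructure_layer(locks_available)
    # Only this layer knows the host needed to resolve the link remotely.
    result.annotations["host"] = host
    return result


ok = network_interface_layer(True, "node1.example.org")
failed = network_interface_layer(False, "node1.example.org")
```

Note that `failed` still picks up the host annotation on its way down, but carries no compute metric, since the computational layer never ran.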

Component Framework Statistics

One of the goals of the PSE LSA is to automatically gather performance statistics for the modules and the framework. This information is presented to the user through a separate Statistics Window. The purpose of the Statistics Window is to help the user understand where time is being spent (e.g., in the network versus in computation) during the execution of a particular module. This information may prove useful in identifying high-level bottlenecks and may guide the user in selecting the computational resources that best fit his or her needs. It should also be noted that this information will eventually be available in a more concise format for automated intelligent computational steering.
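As a rough illustration of the kind of summary such a window could present, the sketch below turns per-category timings into shares of the total. The function name, the category names, and the sample numbers are hypothetical and are not measurements from the LSA.

```python
def time_breakdown(metrics: dict[str, float]) -> dict[str, float]:
    """Return each category's share of the total time (hypothetical helper)."""
    total = sum(metrics.values())
    if total == 0:
        # Nothing was measured; avoid dividing by zero.
        return {name: 0.0 for name in metrics}
    return {name: seconds / total for name, seconds in metrics.items()}


# Made-up timings, in seconds, for one module execution.
shares = time_breakdown({"network": 1.5, "computation": 4.5})
bottleneck = max(shares, key=shares.get)
```

With these sample numbers, computation accounts for three quarters of the time, so a faster compute resource would help more than a faster network.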

Error/Exception Handling

Handling exceptions in a distributed system presents many problems. Traditional methods such as returning (integer) error codes are not feasible, because each process would need a way of determining what another process's error codes mean. This would require some sort of central database, which would restrict the extensibility of the system. Another approach is to throw exceptions and require some process to catch them. This approach conflicts with the layered architecture of the distributed processes: the exception would have to be caught at each layer so that data to which the higher layers do not have access can be added. But we already have a mechanism for transporting information in this fashion built into the information subsystem. The only restriction is that the exception must satisfy the definition of an event (see above); within the LSA it does.

When a process is initialized, a collection of system error objects is created. This is done early in order to preallocate memory before any computational work is done. These objects are small, but the most common error is an out-of-memory error, and we want to allocate enough memory beforehand to keep the process running. These objects are similar to any other event result object, except that they may not have originated from the computational layer and thus may not contain performance metric data. An important advantage of this method is that, since the event result object is part of the remote method signature, we know a priori that the caller will receive the error message and can process it. Had an exception been thrown, we would have no guarantee that the caller would catch it.
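The pattern can be sketched as follows. This is an illustration under stated assumptions rather than the LSA code: `EventResult`, `SYSTEM_ERRORS`, `remote_method`, and `exhaust_memory` are hypothetical names, the out-of-memory condition is simulated, and Python gives no real control over preallocation, so only the shape of the approach carries over.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventResult:
    """Hypothetical event result object; errors are ordinary instances of it."""
    summary: str
    is_exception: bool = False


# Created once at process initialization, before any computational work,
# so that reporting an error later does not require new allocations.
SYSTEM_ERRORS = {
    "out_of_memory": EventResult("out of memory", is_exception=True),
    "lock_unavailable": EventResult("could not acquire required locks",
                                    is_exception=True),
}


def remote_method(work) -> EventResult:
    # The return type is the event result object, so the caller always
    # receives the error as a value instead of having to catch an exception.
    try:
        return EventResult(summary=work())
    except MemoryError:
        return SYSTEM_ERRORS["out_of_memory"]


def exhaust_memory() -> str:
    raise MemoryError  # simulated failure, for illustration only


ok = remote_method(lambda: "done")
failed = remote_method(exhaust_memory)
```

Because `failed` is the very object created at start-up, handling the error path allocates nothing new, and the caller inspects `is_exception` in the same way it inspects any other result.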

Component Specific

Not written yet.


bramley@cs.indiana.edu

Last updated: Tue Jan 26 12:52:34 1999