
The DOE Common Component Architecture Project
Participating Organizations
Argonne National Laboratory
Indiana University
Lawrence Berkeley Labs
Lawrence Livermore
Los Alamos
Oak Ridge National Lab
Sandia National Laboratory
The University of Utah
Project Goals
Technical Approach
Technology Demonstrations
Future Plans
Project Goals
Component frameworks are becoming an increasingly popular way to manage the
complexity of developing interdisciplinary HPC applications. Such
systems let programmers accelerate project development by introducing
higher-level abstractions and enabling code reuse, and they provide
clearly specified component interfaces that make it easier for teams to
work together. These potential benefits have encouraged research groups at
a number of laboratories and universities to develop and experiment with
prototype systems. Consequently, there is a need for an interoperability
standard.
The business world has also recognized the need for component programming,
which has resulted in systems such as Microsoft COM,
the new CORBA Component Model (CCM), Enterprise JavaBeans and others.
However, these systems were designed primarily for sequential applications
and do not address the needs of HPC. Their most important shortcoming from
the HPC point of view is that they do not support the abstractions necessary
for high-performance parallel programming, and they do not place enough
emphasis on what is often the most important factor in scientific programming: performance.
In addition, the existing systems are often invasive; that is, they require
substantial modifications to existing applications, which may not be acceptable
to the developers of high-performance components.
In view of these problems, it is critical to develop a standard that
specifically addresses the needs of the HPC community.
To this end, the goal of the CCA project
is to design a set of standards for scientific component software
that meet the following objectives:
-
Component characteristics. The CCA will be used primarily for high-performance
components of both coarse and fine grain, implemented according to different
paradigms such as SPMD-style and shared-memory multi-threaded models.
Issues that must be solved to let such components interact
include the need to reach multiple processes
to deliver requests, the presence of sophisticated run-time systems, message
passing libraries and threads, and the efficient transfer of large data sets.
-
Heterogeneity. Whenever technically possible, the CCA should be
able to combine within one multi-component application components executing
on multiple architectures, implemented in different languages, and using
different run-time systems. Furthermore, design priorities should be geared
towards the software needs most common in HPC environments; for example,
interoperability with languages popular in scientific programming, such
as Fortran, C and C++, should be given priority.
-
Local and remote components. We define components to be local if
they live in a single application address space and remote otherwise. Interaction
between local components should cost no more than a function call; interaction
between remote components should be able to take advantage of zero-copy protocols
and exploit other advantages offered by state-of-the-art networking. Whenever
possible we would like to support interoperability of both local and remote
components and be able to seamlessly change interactions from local to
remote. We will address the needs of remote components running over
both local area and wide area networks; component applications running
over the HPC grid should be able to satisfy real-time constraints and
interact with diverse supercomputing schedulers.
-
Integration. We will try to make the integration of components as
smooth as possible. In general it should not be necessary to develop a
component specially to integrate with a specific framework,
or to rewrite an existing component when moving it from one CCA framework
to another. The most that should be required is a recompilation
or relinking.
-
High performance. It is essential that the set of standard features
agreed on contains mechanisms for supporting high-performance interactions;
whenever possible we should be able to avoid extra copies, extra communication
or synchronization, and to encourage efficient implementations such as parallel
data transfers between parallel components.
-
Openness. The CCA specification should be open and usable with open
software. In HPC this flexibility is needed to keep pace with the ever-changing
demands of the scientific programming world.
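The "Local and remote components" objective above can be illustrated with a minimal sketch. It is not taken from the CCA specification: the names SolverPort, LocalSolver, RemoteSolverStub and make_loopback_transport are hypothetical, JSON stands in for whatever wire format a real framework would choose, and the "network" is a loopback function inside the same process. The point is only that the calling code does not change when a local port is swapped for a remote one.

```python
import json
from typing import Callable, List, Protocol


class SolverPort(Protocol):
    """Hypothetical port interface shared by the local and remote variants."""

    def norm(self, values: List[float]) -> float: ...


class LocalSolver:
    """Local case: calling the port is an ordinary method call."""

    def norm(self, values: List[float]) -> float:
        return sum(v * v for v in values) ** 0.5


class RemoteSolverStub:
    """Remote case: same interface, but each call is marshalled to bytes."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport

    def norm(self, values: List[float]) -> float:
        request = json.dumps({"method": "norm", "args": [values]}).encode()
        reply = self._transport(request)
        return json.loads(reply.decode())["result"]


def make_loopback_transport(target: object) -> Callable[[bytes], bytes]:
    """Stand-in for a network link (an assumption made for this sketch)."""

    def transport(request: bytes) -> bytes:
        message = json.loads(request.decode())
        result = getattr(target, message["method"])(*message["args"])
        return json.dumps({"result": result}).encode()

    return transport


def residual_report(solver: SolverPort, values: List[float]) -> str:
    """Caller code: unaware of whether the port is local or remote."""
    return f"norm={solver.norm(values):.1f}"


local_port = LocalSolver()
remote_port = RemoteSolverStub(make_loopback_transport(LocalSolver()))
print(residual_report(local_port, [3.0, 4.0]))
print(residual_report(remote_port, [3.0, 4.0]))
```

In this sketch the local path costs one method call, while the remote path pays for marshalling and transport. That is the cost difference the objective asks frameworks to keep explicit and, for the remote case, to minimize.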
Technical Approach
The CCA Forum meets quarterly to develop the CCA specification. The
procedural model used is based on that of the successful MPI Forum.
As currently defined, the Common Component Architecture consists of two
types of entities: Components and Frameworks. Components are the basic
units of software that are composed together to form applications.
Instances of Components are created and managed within a Framework, which
also provides the basic services that components use to operate and communicate
with other components.
The philosophy of CCA is first to precisely define the rules for constructing
components and to specify the required behavior a component must exhibit
and the interface between components and the framework. However,
very little is said about the way the framework is constructed or the way
the user interacts with the framework to connect components together.
The reason for this is that there may be many different frameworks that
can be used in very different situations. Some frameworks will be
designed to optimize the use of components that are distributed across
a wide-area Grid. In other cases, the frameworks will be designed
to optimize the composition of components that run on a single, massively
parallel supercomputer.
The first specification, CCA 0.6,
was completed in the fourth quarter of 1999. This specification
provides a detailed description of what a component designer needs to do
in order to write a CCA-compliant component. The key ideas are very
simple.
Each CCA component is defined by two types of interfaces. One type of
interface is called a "provides port". This interface defines a capability
that a component exports to other components. A provides port is
nothing more than a set of functions that a user of that component can
invoke. The other type of interface is a "uses port".
Uses ports represent a point of call within the component to some service
that must be provided by another component. The framework provides
a mechanism for uses ports to be connected to provides ports. In the
simplest terms, the CCA specification is just a software engineering protocol
for clearly defining the services provided by a component and defining its
dependencies on other components. If used correctly and precisely, this
protocol makes it very easy for us to build re-usable software modules.
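The provides/uses relationship can be sketched in a few lines. This is an illustration only, not the CCA 0.6 API: the Framework class, its method names (add_provides_port, connect, get_port) and the two example components are all hypothetical. One component provides a function-evaluation port; another declares a uses port and calls through whatever the framework has connected to it.

```python
from typing import Dict, Tuple


class Framework:
    """Toy framework (hypothetical): registers provides ports and wires uses ports to them."""

    def __init__(self) -> None:
        self._provides: Dict[Tuple[str, str], object] = {}
        self._connections: Dict[Tuple[str, str], object] = {}

    def add_provides_port(self, component: str, port_name: str, port: object) -> None:
        self._provides[(component, port_name)] = port

    def connect(self, user: str, uses_port: str, provider: str, provides_port: str) -> None:
        # Connecting hands the user a reference to the provider's port object.
        self._connections[(user, uses_port)] = self._provides[(provider, provides_port)]

    def get_port(self, user: str, uses_port: str) -> object:
        return self._connections[(user, uses_port)]


class SquareFunction:
    """Provides port: a set of functions that other components can invoke."""

    def evaluate(self, x: float) -> float:
        return x * x


class MidpointIntegrator:
    """Component with a uses port named 'function_port'."""

    def __init__(self, framework: Framework, name: str = "integrator") -> None:
        self._framework = framework
        self._name = name

    def integrate(self, lo: float, hi: float, n: int) -> float:
        # Point of call: the service comes from whichever component is connected.
        f = self._framework.get_port(self._name, "function_port")
        h = (hi - lo) / n
        return sum(f.evaluate(lo + (i + 0.5) * h) for i in range(n)) * h


framework = Framework()
function = SquareFunction()
framework.add_provides_port("function", "evaluate_port", function)
integrator = MidpointIntegrator(framework)
framework.connect("integrator", "function_port", "function", "evaluate_port")
print(integrator.integrate(0.0, 1.0, 100))
```

The integrator never names SquareFunction. Its only dependency is the uses port, so a different provider can be connected without touching the integrator's code.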
Components may reside on different machines, in which case the port-to-port connections
use network protocols to carry out the remote invocations.
Or the components may reside on the same machine. In this case the uses
port and the connected provides port can be the same object, and there is
no overhead in port communication. A particularly important case
is SPMD computation. In the diagram, two components, A and B, are each
SPMD components. That means they each have "representatives" that
reside in different processes, and the representatives communicate by MPI
in the normal SPMD style. This MPI communication is internal
to each component. However, the two components are linked by a direct
connection (shown as the blue connection), which represents the use in A of services
provided in B.
In addition to the work on the core specification, a subgroup at Lawrence
Livermore National Laboratory began work on an Interface Definition Language
(IDL) for scientific computation. This Scientific IDL (SIDL) is
central to the CCA vision. Every component architecture provides
some sort of IDL to define the interfaces that a component implements.
This allows the framework and the user to know what each component is capable
of doing and what specific requirements
it makes of other components. Unfortunately, the IDLs
from CORBA and COM do not efficiently support the datatypes that are central
to scientific computation. SIDL is based on standard IDL concepts,
but it does support the scientific data types. Consequently, it was
selected to be the common specification language for the CCA document.
Technology Demonstrations
To test the ideas in the initial CCA specification, three different implementations
were built and demonstrated at the SC'99 conference in November 1999 in
Portland. Two of these were demonstrated at the DOE 2000 exhibit
and one was part of the Utah work on the SciRun project. We
focus here on the two demonstrations within the DOE 2000 booth.
The Sandia-Oak Ridge Project.
This demonstration focused on the use of the CCA model to connect parallel
(SPMD) applications with external viewers and composition tools.
It is also an excellent demonstration of the use of frameworks like CUMULVS
to couple computational steering to components built on the Equation Solver
Interface Standard (ESI). The project uses a direct connection
for the SPMD components and the concept of a collective port for the
connection between visualization and the SPMD components. Support for collective
ports is one of the major goals of CCA and one of its most important
departures from the commercial component technologies. Collective
ports provide a way for an "M-way" parallel component to use the services
of an "N-way" component when M and N are not the same. Collective ports
also figure prominently in the PAWS architecture at Los Alamos.
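To make the M-to-N idea concrete, here is a minimal sketch of one thing a collective port has to work out: which pieces of a distributed array each of the M sending processes must deliver to each of the N receiving processes. It assumes a one-dimensional array split into contiguous, nearly equal blocks on both sides. That layout, and the function names block_range and transfer_schedule, are assumptions made for this example rather than part of any CCA specification.

```python
from typing import List, Tuple


def block_range(n: int, parts: int, rank: int) -> Tuple[int, int]:
    """Half-open index range owned by `rank` when n items are split into `parts` blocks."""
    base, extra = divmod(n, parts)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def transfer_schedule(n: int, m: int, k: int) -> List[Tuple[int, int, int, int]]:
    """List (src_rank, dst_rank, start, stop) for moving n items from m to k processes."""
    schedule = []
    for src in range(m):
        s_lo, s_hi = block_range(n, m, src)
        for dst in range(k):
            d_lo, d_hi = block_range(n, k, dst)
            # Data moves only where the sender's and receiver's ranges overlap.
            lo, hi = max(s_lo, d_lo), min(s_hi, d_hi)
            if lo < hi:
                schedule.append((src, dst, lo, hi))
    return schedule


# 12 items held by a 3-way component, needed by a 2-way component.
for src, dst, lo, hi in transfer_schedule(12, 3, 2):
    print(f"sender {src} -> receiver {dst}: items [{lo}, {hi})")
```

Each entry in the schedule is an independent transfer, so the pieces can move in parallel rather than being gathered on one process first.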
The Indiana Framework
The demonstration provided by the Indiana team is called the "Common
Component Architecture Toolkit (CCAT)".
CCAT is designed to test CCA concepts on wide-area "Grids" such as the emerging
DOE Grid, NASA's IPG and the NCSA Alliance Grid. It currently uses
the Argonne Globus Grid toolkit as its foundation. Components
are installed on compute hosts on the Grid and are instantiated by means
of Globus. Communication between ports is based on the Globus Nexus
communication library. On top of this library rest HPC++
and Java, which provide the remote method invocation protocol used by
CCAT. (HPC++ was developed in collaboration between Los Alamos and
Indiana.)
CCAT Services
The primary goal of CCAT is to experiment with possible CCA services.
A service is a facility that every component can rely on every framework
to supply.
There are five services described here.
-
Directory Service - a tool that allows a component to browse remote
directories of various types. The information in these directories
is data about component specifications.
-
Registry Service - a tool that allows a component to browse remote
directories of various types. The information in these directories
relates to running instances of components.
-
Creation Service - a tool that allows a component to instantiate
another component. The new component may be running in the same address
space, or it may be on a remote host.
-
Connection Service - a tool that allows one component to connect
the "ports" of one component to those of another. This is the way applications
of CCA are composed. Given these three services it is possible to
define application builders that are also components. Consequently,
such a composition tool can be run in any framework that supports these
services.
-
Event Service - a tool that allows components to publish and
subscribe to events. The event model may be a point-to-point
generator/listener model, or it may be a push-based
publisher/channel/subscriber model that does type filtering.
Each of these services has been designed to avoid extending the
core CCA model. Consequently, each service appears to each CCA component
as just another component that has been connected to one of five "pre-registered"
ports. The Event Service, for example, uses only
the Connection Service and is otherwise a completely portable CCA component.
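The push-based publisher/channel/subscriber model with type filtering, mentioned for the Event Service, can be sketched as follows. This is not CCAT's implementation: the EventChannel class and the two event types are hypothetical, and delivery is a direct in-process call rather than a remote invocation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class StepCompleted:
    """Hypothetical event: a simulation step finished."""
    step: int


@dataclass(frozen=True)
class ComponentFailed:
    """Hypothetical event: a component reported a failure."""
    reason: str


class EventChannel:
    """Channel that pushes each event only to listeners subscribed to its type."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[type, List[Callable[[object], None]]] = defaultdict(list)

    def subscribe(self, event_type: type, listener: Callable[[object], None]) -> None:
        self._subscribers[event_type].append(listener)

    def publish(self, event: object) -> int:
        """Deliver the event and return how many listeners received it."""
        delivered = 0
        # Copy the items so a listener may subscribe during delivery.
        for event_type, listeners in list(self._subscribers.items()):
            if isinstance(event, event_type):  # type filtering
                for listener in list(listeners):
                    listener(event)
                    delivered += 1
        return delivered


channel = EventChannel()
progress_log: List[object] = []
channel.subscribe(StepCompleted, progress_log.append)
channel.publish(StepCompleted(step=3))
channel.publish(ComponentFailed(reason="timeout"))  # filtered out: no subscriber
print(progress_log)
```

Publishers and subscribers know only the channel and the event types, not each other, which is what lets such a service be packaged as an ordinary component behind a port.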
The Utah Project.
Our collaborators at the University of Utah have developed the SciRun
component-based simulation and visualization system. SciRun pre-dates CCA,
but the experience gained from the design and development of SciRun has
been instrumental in the design of CCA. In the last quarter of 1999,
the Utah team was able to use the SIDL language tools to generate and compile
port connection "stubs" for use with their system. This is an important
demonstration of both SIDL and CCA functionality.
Future Plans
There are three primary goals for CCA in the year ahead.
-
We plan to work with several DOE application teams to test CCA on a broader
scale. As part of this effort CCA will be more closely linked with
the activities of the DOE Equation Solver Interface (ESI) working group
and ESI compatible components will be demonstrated.
-
Parallel M-to-N collective ports remain a major focus. Although the feasibility
of parallel ports was demonstrated in 1999, much work remains to be done
in providing the formal specification of how collective ports are defined,
what collective data structures look like, and how collective port operations
can be made fast and efficient.
-
The area of common CCA services will be important. Event services,
creation and connection services, and component directories and repositories
are things that all components and frameworks need. Though we have
seen a demonstration of how these services can be made to work, the process
of defining a set of standard interfaces to these services will require
a substantial effort.
In general, a major theme of CCA will be framework interoperability.
The first CCA specification defines the requirements that the architecture
places upon individual components. That is, it tells frameworks exactly
what to expect from the components. Given this, components can be moved
from one framework to another. However, there is as yet nothing that will let one
framework interoperate with another. By defining the rules and behavior
of collective ports and data structures and defining the standard services,
we move much closer to frameworks that can interoperate. Work on
applications will demonstrate why this project is worth doing.