GFac Architecture Guide

Introduction


In e-science, most tools are available as command-line applications in various forms (e.g. shell scripts, Perl, Python or FORTRAN). However, with the availability of the Internet and technologies such as Grid computing, collaboration among scientists is becoming commonplace, and in that setting, creating a Web service that exposes a command-line application is a common requirement. This may be motivated by various reasons:

  1. The owner may need to compose applications into workflows, mixing and matching them in experiments while the applications run on different hosts.
  2. The deployment process of command-line applications may be too complex, and it may be easier to provide a Web Service interface for other users.
  3. The owner may not be willing to share the application itself with other parties, but may share its service, which gives the owner greater control over its use.

The Generic Service Toolkit helps users create Web Services from a command-line application. The resulting service (a.k.a. Application Service) has a WSDL interface that matches the behavior of the application and provides the following capabilities:

  1. The application service has soft lifetime management: it is shut down if it is not used for a long time.
  2. The application service can stage (transfer) remote input files to the host where the application runs, and transfer the results to a remote location.
  3. The application service generates events that let users monitor the status of execution remotely.
  4. It supports job submission to the local machine, Globus GRAM or WS-GRAM, so the application may be executed on a remote computational Grid.
  5. It manages user credentials throughout the life cycle of the application service.
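Soft lifetime management (capability 1) amounts to an idle timer that is reset on every request. The sketch below is a minimal illustration under stated assumptions: the class name `SoftLifetime`, its methods and the idea of a periodic housekeeping check are hypothetical, not Gfac's actual implementation. A pluggable clock keeps the logic testable.

```python
import time
from typing import Callable


class SoftLifetime:
    """Hypothetical idle-timeout tracker for an Application Service.

    The service calls touch() on every request; a housekeeping loop calls
    should_shutdown() periodically and stops the service once it is True.
    """

    def __init__(self, idle_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._last_used = clock()

    def touch(self) -> None:
        # Record that the service has just handled a request.
        self._last_used = self._clock()

    def idle_for(self) -> float:
        # Seconds elapsed since the last request.
        return self._clock() - self._last_used

    def should_shutdown(self) -> bool:
        # Shut down only after a full idle period without any request.
        return self.idle_for() >= self.idle_timeout_s
```

For example, with a 600-second timeout, a request at t=500 keeps the service alive at t=1000 (idle 500 s), and the service becomes eligible for shutdown at t=1100.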

Gfac includes two types of services: persistent and temporary. The former are services that are supposed to be available 24/7; the latter are created on demand and shut themselves down if they are not used frequently. A Gfac installation includes two persistent services, the Factory Service and the Registry Service: the former creates temporary application services on demand, and the latter keeps the deployment (state) configuration.

Gfac is part of the LEAD (Linked Environments for Atmospheric Discovery) project, which provides a workflow system based on meteorology application services. In the following sections we explore the Gfac architecture in detail. If you are a user who simply needs to use Gfac, please see the Gfac User Guide.

Gfac Overall Architecture

The following diagram shows how Gfac fits into the other services of the LEAD workflow system.

The system is fronted by a portal that allows users to create workflows from services that wrap command-line applications, and to execute and monitor those workflows.

Gfac consists of two services: the Registry Service, which acts as the repository for Gfac deployment descriptors, and the Factory Service, which accepts the XML deployment descriptors describing an application and creates an Application Service. The figure also shows a few more services that help explain Gfac's functionality; we introduce them while walking through a workflow invocation.

To create a new workflow, the user uses the Gfac portlet to define XML deployment files that describe hosts, applications and service mappings. Workflows can then be composed from these services using the XBaya workflow composer, and they are saved in the user's account.

When the user invokes a workflow through the portal, the following steps take place. The numbers below match the numbers in the figure.

  1. The user selects a workflow in the portal and invokes it.
  2. The portal submits the workflow to the GPEL engine.
  3. The GPEL engine searches for the service instances needed to run the workflow, and if they are not available, it uses the Factory Service to create new service instances.
  4. GPEL invokes the services to run the workflow.
  5. When an Application Service receives a request, it parses the request and copies the remote data items to the application host.
  6. When the input data items are ready, the Application Service submits the job to the application host. Using the results of the application (a.k.a. job), Gfac creates a response SOAP message and sends it back to the GPEL engine, which then invokes the next service.
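Step 3 (reuse a running service instance, otherwise ask the factory for a new one) can be sketched as a small resolver. This is a minimal illustration only: `FactoryStub`, `InstanceResolver`, the method names and the placeholder endpoint format are hypothetical and do not reflect the real GPEL or Factory Service interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FactoryStub:
    """Stand-in for the Gfac Factory Service (hypothetical interface)."""
    created: int = 0

    def create_service(self, service_name: str) -> str:
        # A real factory would deploy a new Application Service and return
        # its endpoint; here we only fabricate a placeholder string.
        self.created += 1
        return f"http://service-host.example/{service_name}/{self.created}"


@dataclass
class InstanceResolver:
    factory: FactoryStub
    running: Dict[str, str] = field(default_factory=dict)

    def resolve(self, service_name: str) -> str:
        # Reuse a running instance if there is one; otherwise create it via
        # the factory and remember it for later steps of the workflow.
        if service_name not in self.running:
            self.running[service_name] = self.factory.create_service(service_name)
        return self.running[service_name]
```

Resolving the same service name twice calls the factory only once; a second, different service triggers a second creation.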

During execution, each application service publishes events about the state changes of its execution. These events are published to a notification bus, and the XBaya workflow composer uses them to give users a visualization of the workflow invocation.

Each new Application Service may be placed on a different host from the Factory Service, and each application service may submit jobs to a remote application host, which is usually a computing cluster. The discussion that follows uses these terms:

  1. Application Service - A Web service that wraps a command-line application and provides a WSDL interface to it
  2. Service Host - The host where the application service runs
  3. Application Host - The host where the command-line application is installed. It may differ from the service host, in which case jobs are submitted using Globus GRAM

To create an application service, the user needs to describe the application, the service host, and the service-to-application mapping using XML descriptors. However, we provide a portlet Web interface that generates those XML documents from a Web form. The same Web interface can register the documents with the Gfac registry, so the process is largely transparent to the end user.

Application Service Architecture

An Application Service is a stateless Web Service that accepts requests, sets up the input parameters for the underlying application, invokes the application, and builds and sends a response using the outputs of the application execution. In this section we examine this process in detail.

Information Model

The Application Service has a well-defined information model, and every other component of the system is stateless. The information model consists of four parts.

GfacContext

The top-level Gfac configuration, containing information such as the user's Grid credentials and the broker URL. Only one instance exists per Gfac instance.

ServiceContext

Information about a service, including the service map document, the application and host descriptions, and the WSDL of the service.

ExecContext

Represents a single request to a created service. A new object is created for each request.

MessageContext

Represents a SOAP message.
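The four parts differ mainly in their lifetime: one GfacContext per Gfac instance, one ServiceContext per created service, and a fresh ExecContext per request. The dataclasses below are a minimal sketch of that nesting; the class names come from this guide, but every field name is a hypothetical placeholder rather than Gfac's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class GfacContext:
    # One per Gfac instance: top-level configuration.
    broker_url: str
    credential_path: Optional[str] = None  # hypothetical credential location


@dataclass
class ServiceContext:
    # One per created service: its descriptors and WSDL.
    gfac: GfacContext
    service_map: str        # service description document (XML text)
    application_desc: str   # application description document
    host_desc: str          # host description document
    wsdl: str = ""


@dataclass
class MessageContext:
    # A SOAP message reduced to named parameters.
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ExecContext:
    # Created fresh for every request to a service.
    service: ServiceContext
    inputs: MessageContext
    outputs: MessageContext = field(default_factory=MessageContext)
```

Two requests to the same service share one ServiceContext (and, through it, the GfacContext) but each gets its own ExecContext and output MessageContext, which is what allows the processing components themselves to remain stateless.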

Lightweight Kernel

The architecture builds on a minimal kernel that provides a processing pipeline. An application service is built by adding a set of extensions through two sets of extension points. Our experience showed that different users have varying requirements and expectations of an application service; by keeping the core lightweight, we expect to insulate the architecture from the details of individual implementations.

Extensibility is achieved at both levels by adding extension points to the architecture. An extension is a Java class that is allowed to intercept the processing of the message.

The following figure shows the components of the Gfac kernel.


Upon receiving a request for a new service, the Factory Service creates a minimal service. At startup, the service is run through a set of extensions called "static extensions", so named because they are executed only once per service. These static extensions apply additional configuration to the service. A good example is adding security to the created service: in the LEAD use case, to enable security, the Security and Capability Handlers are added to XSUL and capability tokens are registered with the Capman service. In this way, security is added to Gfac transparently to the core architecture. We believe other WS-* extensions and scenarios can be added through similar extensions.
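The "run once at creation" property of static extensions can be sketched as follows. This is a minimal illustration in Python rather than Gfac's Java: `StaticExtension`, `MinimalService`, `configure()` and `SecuritySketch` are hypothetical names, and the handler names are only strings standing in for the real XSUL handlers.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class StaticExtension(ABC):
    """Hypothetical interface: configures a service once, at creation time."""

    @abstractmethod
    def configure(self, service: "MinimalService") -> None: ...


class MinimalService:
    def __init__(self, name: str, static_extensions: List[StaticExtension]) -> None:
        self.name = name
        self.handlers: List[str] = []   # e.g. SOAP handlers added to XSUL
        self.config: Dict[str, str] = {}
        self.requests_handled = 0
        # Static extensions run exactly once, before the service starts.
        for ext in static_extensions:
            ext.configure(self)

    def handle(self, request: str) -> str:
        # Per-request processing never re-runs the static extensions.
        self.requests_handled += 1
        return f"{self.name} handled {request}"


class SecuritySketch(StaticExtension):
    """Illustrative stand-in for the security extension described above."""

    def __init__(self) -> None:
        self.calls = 0

    def configure(self, service: MinimalService) -> None:
        self.calls += 1
        service.handlers += ["SecurityHandler", "CapabilityHandler"]
```

However many requests the service later handles, `SecuritySketch.configure()` has been called exactly once, and the core service class contains no security-specific code.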

When a started service receives a request, the Gfac core accepts and processes it. The core reads the message and populates the dynamic portions of the information hierarchy (e.g. the input and output parameters). The information hierarchy is then run through the preProcess() method of a set of dynamic extensions. These extensions are called dynamic because they are invoked for each request, and the processing they do is usually use-case specific. The main benefit we expect from them is to map commonly used, use-case-specific functionality onto a set of components. On the one hand, this keeps the architecture independent of specific details; on the other hand, users and administrators can set up Gfac with just the subset of functionality they need by picking the components that interest them.
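A minimal sketch of the per-request chain, assuming a dynamic extension can be modelled as a callable that modifies the request parameters in place (standing in for its preProcess() step). `process_request` and the two example extensions are hypothetical and only illustrate the ordering and per-request isolation.

```python
from typing import Callable, Dict, List

# Hypothetical shape: a dynamic extension is a callable that receives the
# per-request parameters and may modify them in place (its preProcess step).
DynamicExtension = Callable[[Dict[str, str]], None]


def process_request(raw_params: Dict[str, str],
                    extensions: List[DynamicExtension]) -> Dict[str, str]:
    # The core copies the request data into a fresh per-request structure,
    # so nothing leaks between requests...
    params = dict(raw_params)
    # ...then runs every dynamic extension's pre-processing step in order.
    for pre_process in extensions:
        pre_process(params)
    return params


def add_default_output_dir(params: Dict[str, str]) -> None:
    # Example extension: fill in a parameter the request did not supply.
    params.setdefault("outputDir", "/tmp/gfac-out")


def trim_values(params: Dict[str, str]) -> None:
    # Example extension: normalise whitespace in every value.
    for key in list(params):
        params[key] = params[key].strip()
```

Because extensions are just a list, an administrator can enable or drop a feature by changing which callables are passed in, without touching `process_request` itself.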

After the extensions have run, processing is handed over to a Provider, an abstract representation of the application's execution. Examples of concrete providers are LocalProvider, RemoteProvider, PBSProvider, SLURMProvider, etc. Again, this keeps the architecture independent of how services are actually executed and clearly marks the extension point, which simplifies adding new application execution methods.
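The Provider abstraction can be sketched as an abstract base class with one concrete local implementation. This is a Python illustration under assumptions: `Provider`, `LocalProvider` and `run_job` borrow names from the text, but the `execute()` signature and `JobResult` type are hypothetical; the local case uses the standard `subprocess.run` call.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class JobResult:
    exit_code: int
    stdout: str
    stderr: str


class Provider(ABC):
    """Abstract execution of an application; the kernel only sees this."""

    @abstractmethod
    def execute(self, command: List[str]) -> JobResult: ...


class LocalProvider(Provider):
    """Runs the command directly on the service host."""

    def execute(self, command: List[str]) -> JobResult:
        proc = subprocess.run(command, capture_output=True, text=True)
        return JobResult(proc.returncode, proc.stdout, proc.stderr)


def run_job(provider: Provider, command: List[str]) -> JobResult:
    # Kernel-side code is identical whichever concrete provider is plugged in.
    return provider.execute(command)
```

For instance, `run_job(LocalProvider(), [sys.executable, "-c", "print('hello')"])` returns a result with exit code 0 and stdout `hello`. A GRAM or SSH provider would implement the same `execute()` method, leaving `run_job` unchanged.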

To recap, each extension point and its intended use are listed here:

  1. Static Extensions - These are executed only once per service and are invoked before the service is started (or possibly once the service is shut down). Their main advantages are extensibility and grouping related code in a single place (for example, all the security code can be found in one place, which makes the architecture easier for newcomers to understand).
  2. Dynamic Extensions - These are executed for each request to the service. They are used to implement use-case-specific features. Good examples are updating parameters in name list files, or parsing the standard output for output parameters.

The following dynamic extensions are implemented in Gfac:

  1. FileStaging - Stages (copies) input files to the application host and output files to an output data directory
  2. MyLEAD Extension - Generates metadata and registers files generated by the application with the MyLEAD metadata catalog
  3. NameListFiles Extension - Updates the name list file (a specific file used as input to meteorology applications) to reflect the new locations of the input data
  4. OutParamsFromDataOutDir Extension - Searches the output data directory for results of the application and adds them to the MessageContext
  5. Std2OutPutParam Extension - Searches the standard output of the application to find the results of the application execution
  6. Vgrads Extension - Parses the SOAP header and enables the VGrADS provider if needed
  7. XregistryBasedAuthorizer - Rejects user requests based on the authorization mechanisms implemented by Xregistry
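The location-rewriting half of FileStaging (remote input URL in, local path on the application host out) can be sketched with the standard `urllib.parse` module. The function name, the "has a scheme and a host" test for remoteness, and the flat work-directory layout are all assumptions for illustration; the actual copy would be done by one of the transfer mechanisms described later.

```python
import posixpath
from typing import Dict
from urllib.parse import urlparse


def stage_in_locations(inputs: Dict[str, str], work_dir: str) -> Dict[str, str]:
    """Map remote input URLs to local paths under work_dir (hypothetical).

    Only the location rewrite is shown; the file copy itself is not.
    """
    local: Dict[str, str] = {}
    for name, value in inputs.items():
        parsed = urlparse(value)
        if parsed.scheme and parsed.netloc:
            # Remote file: keep its file name, place it in the work directory.
            local[name] = posixpath.join(work_dir, posixpath.basename(parsed.path))
        else:
            # Plain values (numbers, flags, local paths) pass through unchanged.
            local[name] = value
    return local
```

For example, an input `gsiftp://data.example.org/lead/terrain.nc` with work directory `/scratch/job1` becomes `/scratch/job1/terrain.nc`, while a numeric parameter such as `10` is left alone.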


The following static extensions are implemented in Gfac:

  1. LEADExtension - Deploys the dynamic extensions required by the LEAD project, thereby providing a logical grouping of dynamic extensions
  2. SecurityExtension - Enables security by deploying the security handlers to XSUL

Deployment Descriptors

An Application Service is described using three deployment descriptors. They represent the hosts available to the system, the applications installed on each host, and how an application should be mapped to a Web Service. Schema files for the deployment descriptors can be found here.

  1. Host Description Document - Describes a host available to the workflow system. A host can be used to run services or applications, and the document includes host-specific information such as environment variables, the temporary directory, the Java installation, the Gfac installation, etc.
  2. Application Description Document - Describes an application installed on a host; it is always bound to a host. It includes information such as the executable location, environment variables, etc.
  3. Service Description Document - Describes how an application should be mapped onto Web Service operations and how input and output parameters should be resolved.

The Service Description Document refers to an Application Description Document, which in turn refers to a Host Description Document. All three documents must be present in the registry before an application service can be created. The user asks the Factory Service to create an Application Service by providing a service name. The Factory Service searches the registry for a host on which to create the service and transfers the service name, together with other configuration, to that service host. Once the configuration is set up, the new service is created on the service host using that configuration. When the service receives a request, it searches the registry for a host where the given application is deployed and invokes the application on that host.
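The reference chain Service -> Application -> Host, and the rule that all three documents must be registered, can be sketched with an in-memory registry. The dataclasses reduce each descriptor to a few fields and the `Registry` class and its `resolve()` method are hypothetical; the real documents are XML and the real registry is a separate Web service.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class HostDesc:
    name: str
    temp_dir: str


@dataclass
class ApplicationDesc:
    name: str
    host: str          # name of the HostDesc it is bound to
    executable: str


@dataclass
class ServiceDesc:
    name: str
    application: str   # name of the ApplicationDesc it maps to


class Registry:
    """In-memory stand-in for the Gfac registry (hypothetical API)."""

    def __init__(self) -> None:
        self.hosts: Dict[str, HostDesc] = {}
        self.apps: Dict[str, ApplicationDesc] = {}
        self.services: Dict[str, ServiceDesc] = {}

    def resolve(self, service_name: str) -> Tuple[ServiceDesc, ApplicationDesc, HostDesc]:
        # Follow Service -> Application -> Host. A KeyError means one of the
        # three documents is missing, so the service cannot be created.
        service = self.services[service_name]
        app = self.apps[service.application]
        host = self.hosts[app.host]
        return service, app, host
```

A service whose application document refers to an unregistered host fails to resolve, mirroring the requirement that all three documents be present before creation.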

To learn more about deployment descriptors, please refer to the Deployment Descriptor Guide. We also provide a portal interface for creating these descriptors; please refer to the User Guide for more information.

Application Service In action

The following figure shows the steps an Application Service goes through while processing a request. Unlike the previous section, here we discuss the process with all the extensions in place.

Service requests may use WS-Addressing to receive the result of the invocation asynchronously.

  1. When an Application Service receives a request, the request is accepted by the SOAP server (XSUL), and the underlying code parses the message and builds an information model.
  2. As discussed earlier, this information model is run through a set of extensions.
  3. The first extension is the File Staging extension, which copies remote files to the application host and changes the remote file locations to local file locations.
  4. If the application is LEAD-specific, the Name List extension is called. It merges the user-defined name list file with the application's original name list file and updates the application's name list file to reflect the local input file locations.
  5. The provider is then invoked to perform job submission by running the application on the application host. The command to execute is built from the application name in the Application Description Document and the updated input parameters. If the provider is of type GRAM or WS-GRAM, Gfac sends notifications specifying the current state of the job.
  6. After the provider is done, the Stdout2Output and OutputDir2Output extensions are invoked. They try to infer the results of the application invocation from the application's standard output and from the output data directory, respectively.
  7. Next, the File Staging extension copies the application's result files to the output file staging location.
  8. Finally, the MyLEAD extension registers the output files, along with the standard output and standard error files, with the MyLEAD metadata catalog.
  9. After all extensions are done, the SOAP server sends the response back to the client.
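Step 6 infers results from the application's standard output. How Gfac actually recognizes result lines is not specified here, so the sketch below assumes a hypothetical convention in which the application prints `GFAC_OUTPUT name=value` for each result; only the names declared as outputs are collected.

```python
import re
from typing import Dict, Iterable

# Hypothetical convention: the application prints one result per line as
#   GFAC_OUTPUT name=value
_OUTPUT_LINE = re.compile(r"^GFAC_OUTPUT\s+(\w+)=(.*)$")


def outputs_from_stdout(stdout: str, expected: Iterable[str]) -> Dict[str, str]:
    """Collect declared output parameters from the application's stdout."""
    wanted = set(expected)
    found: Dict[str, str] = {}
    for line in stdout.splitlines():
        match = _OUTPUT_LINE.match(line.strip())
        if match and match.group(1) in wanted:
            # A later line for the same name overrides an earlier one.
            found[match.group(1)] = match.group(2).strip()
    return found
```

Lines that do not follow the convention, and results that are not declared outputs of the service, are ignored, so ordinary log output from the application does not leak into the SOAP response.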

Service Components

Providers (Job Submission)

Gfac supports the following providers, which use different mechanisms for job submission:

  1. Local - If the service and application hosts are the same, the job is submitted using Java's runtime support for running executables.
  2. GRAM - Job submission via the Globus GRAM client
  3. WS-GRAM - Job submission via the WS-GRAM client
  4. SSH - Job submission using SSH credentials
  5. Resource Broker - A wrapper on top of WS-GRAM that performs job submission using the VGrADS project's scheduling capabilities.
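The one rule stated above is that Local is used when the service and application hosts coincide; otherwise a remote mechanism is needed. The sketch below encodes that rule and assumes, hypothetically, that the remaining choice comes from a job-manager value taken from the host description; the keyword strings and the function name are illustrative only.

```python
from typing import Optional


def choose_provider(service_host: str, app_host: str,
                    job_manager: Optional[str] = None) -> str:
    """Pick a job-submission mechanism (hypothetical selection rules).

    job_manager stands for whatever the host description says about how jobs
    are submitted there, e.g. "gram", "ws-gram", "ssh" or "broker".
    """
    if service_host == app_host:
        return "Local"  # same machine: run the executable directly
    mapping = {
        "gram": "GRAM",
        "ws-gram": "WS-GRAM",
        "ssh": "SSH",
        "broker": "ResourceBroker",
    }
    key = (job_manager or "").lower()
    if key not in mapping:
        raise ValueError(f"no provider for job manager {job_manager!r}")
    return mapping[key]
```

Raising an error for an unknown value, rather than silently falling back, makes a misconfigured host description visible at service creation time.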

File Transfers

To perform file transfers, Gfac supports the following mechanisms:

  1. GridFTP - File transfer with Globus GridFTP
  2. SSH - Uses an SFTP client with SSH credentials
  3. DAMN Service - A dedicated file transfer service that supports data-ID-based transfers in addition to regular file transfers
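One plausible way to pick among these mechanisms is to dispatch on the URL scheme of each file location. The mapping below is an assumption for illustration (Gfac may select the mechanism differently): `gsiftp` is the URL scheme used with GridFTP, `sftp` maps to the SSH mechanism, and DAMN's data-ID transfers are left out because they are not URL-based here.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to transfer mechanism.
_TRANSFER_BY_SCHEME = {
    "gsiftp": "GridFTP",  # URL scheme used with Globus GridFTP
    "sftp": "SSH",
    "file": "Local",
}


def transfer_mechanism(url: str) -> str:
    """Return the transfer mechanism to use for a file URL."""
    scheme = urlparse(url).scheme.lower()
    try:
        return _TRANSFER_BY_SCHEME[scheme]
    except KeyError:
        raise ValueError(f"unsupported transfer scheme in {url!r}") from None
```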

Workflow tracking events

While processing a request, an Application Service can produce events marking the different states reached during the invocation. These notifications may be published using the WS-Eventing or WS-Notification specifications. The event formats are defined by the workflow tracking project, and events are sent to the event sink defined in the LEAD context header.

The following events are generated by an Application Service:

  1. Service Initialized
  2. Service Terminated
  3. Service Invoked
  4. Data Produced
  5. File transfer duration
  6. Computation status
  7. Computation duration
  8. Sending Response
  9. Sending Fault
  10. Sending Response Succeeded / Failed
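Several of these events ("File transfer duration", "Computation duration") report how long a step took. A minimal sketch of that pattern follows; `EventPublisher` and its methods are hypothetical, events are simply collected in a list, and neither delivery to the event sink from the LEAD context header nor the workflow tracking event format is modelled.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Event = Tuple[str, Dict[str, Any]]


class EventPublisher:
    """Collects workflow-tracking events (hypothetical; no real transport)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.events: List[Event] = []
        self._clock = clock

    def publish(self, name: str, **details: Any) -> None:
        self.events.append((name, details))

    def timed(self, name: str, action: Callable[[], Any]) -> Any:
        # Run an action, then emit a "<name> duration" event with its runtime.
        start = self._clock()
        result = action()
        self.publish(f"{name} duration", seconds=self._clock() - start)
        return result
```

Wrapping the provider call as `publisher.timed("Computation", ...)` would yield a "Computation duration" event, and wrapping a staging step as `timed("File transfer", ...)` would yield a "File transfer duration" event.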

Security

Unless all job submissions and file transfers are local, the Application Service and the Factory Service require credentials to perform those operations on behalf of the user. Credentials may be provided in one of two forms:

  1. Globus credentials - Loaded from the file system.

  2. MyProxy credentials - Loaded from a MyProxy server. If these are provided, the application service automatically renews the credentials when they are about to expire.
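Automatic renewal boils down to checking the remaining lifetime before each use and fetching a fresh credential when it falls below a safety margin. In the sketch below, `fetch` is a hypothetical callable standing in for retrieval from a MyProxy server (no real MyProxy client API is used), and the `Credential` type carries only an expiry time.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Credential:
    expires_at: float  # seconds since epoch


class RenewingCredential:
    """Hypothetical wrapper that refreshes a credential before it expires."""

    def __init__(self, fetch: Callable[[], Credential],
                 margin_s: float, clock: Callable[[], float]) -> None:
        self._fetch = fetch
        self._margin_s = margin_s
        self._clock = clock
        self._cred = fetch()

    def get(self) -> Credential:
        # Renew when the remaining lifetime drops below the safety margin.
        if self._cred.expires_at - self._clock() < self._margin_s:
            self._cred = self._fetch()
        return self._cred
```

With a one-hour credential and a five-minute margin, `get()` returns the cached credential for roughly 55 minutes and then transparently fetches a new one, so long-running file transfers and job submissions do not fail on an expired proxy.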