IVOA

International Virtual Observatory Alliance


A Proposal for a Common Execution Architecture Version 1.2

IVOA WG Internal Draft 2005-05-12

Working Group: not applicable
Author(s): Paul Harrison

Abstract

This note describes a proposal for a Common Execution Architecture (CEA) within the Virtual Observatory. It discusses the general motivation behind the design as well as detailed schema and WSDL definitions of the architecture. The scope of this document covers areas of interest to the Registry and Grid Working Groups as well as the Applications Special Interest Group.

Status of this document

This is an IVOA Working Draft for review by IVOA members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use IVOA Working Drafts as reference materials or to cite them as other than "work in progress." A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.

Acknowledgements

This design was refined whilst working with Noel Winstanley as he implemented the AstroGrid Workflow component. The callbacks, in particular, owe much to his contribution.



1. Introduction

The Common Execution Architecture (CEA) is an attempt to create a reasonably small set of interfaces and schema to model how a typical astronomical application is executed within the Virtual Observatory (VO). In this context an application can be any process that consumes or produces data, so in existing terminology it could include

The CEA has been designed primarily to work within a web services calling mechanism, although it is possible to have specific language bindings using the same interfaces. For example, AstroGrid has a Java implementation of the interfaces that can be called directly from a Java executable.

1.1 Motivation

The primary requirements motivating the creation of this architecture are:

1.2 Origins

The design for this architecture has evolved from the requirements for the Workflow/Job Execution System components within AstroGrid. It was desirable for the job execution system to have a single model for an application, so that it could deal with the (already complex) problems of scheduling, looping, conditional execution etc. without needing specializations for all the different types of service (SIA, Database query, Cone Search, etc.) that it might be required to invoke.

Amongst the VO specifications there was no existing model for applications defined at the level this design addresses. In the VOResource schema an application is defined as a Service with an interface definition. The interface definition relies either on referring to a WSDL definition of the service, or on other schema that extend the service definition to provide specific detail, as in the case of a Simple Image Access service. There is no general definition of an application in the resource.

The WSDL model of an interface has clearly had a large influence on the design of the CEA, but it should be remembered that the CEA is intentionally layered on top of WSDL, so that CEA controls the scope and semantics of operations. There is only one WSDL definition for all applications, so as far as web services are concerned the interface is constant. CEA works by transporting meta-information about the application interface within this constant WSDL interface.

2. Formal Definition

The common execution architecture can be split into two parts:

  1. The model of the application and its parameters. This includes both the description of the application as a resource in the VO Registry, as well as the description of the application and its parameters at the time of invocation. These descriptions are formally made in a set of XML schema that are described in more detail in the following sections.
  2. The definitions of the interfaces necessary to call the application in an asynchronous fashion. The interfaces are described by WSDL documents that reuse some of the schema that describe the application.

2.1 Application Model

UML model of application

As this model depicts, an application in CEA is quite a simple entity, consisting of one or more interfaces, each of which consists of zero or more input parameters and zero or more output parameters.

2.1.1 XML Schema representation

The schema representations of the application model that participate in CEA can be split into two groups:

  1. Those used to describe the application in the registry
  2. Those used to describe the application in the WSDL interface

These are described in more detail in the following sections.

The schema representation is shown below. It is essentially a representation of the UML model, coded to recognise that the same parameter can occur in several interfaces. In addition, an interface is allowed to contain optional and repeated parameters.

application base schema

This diagram also shows a number of specialized elements, all within the substitution group that has Parameter as its head. These are implementation details where extra information is needed to specify how to use the parameters: for example, in the case of a command-line parameter it is necessary to know the command-line switch or the position at which the parameter appears.
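The application model can be mirrored in a few Python dataclasses. This is a minimal sketch, not the CEA schema itself: parameter definitions are held once per application and interfaces refer to them by name, which is how the same parameter can appear in several interfaces, while min_occurs and max_occurs stand in for optional and repeated parameters. All class and field names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParameterDefinition:
    name: str
    type: str  # a CEA type name such as "integer", "FITS" or "VOTable"


@dataclass
class ParameterRef:
    """Hypothetical link from an interface to a shared parameter definition."""
    name: str
    min_occurs: int = 1            # 0 makes the parameter optional
    max_occurs: Optional[int] = 1  # None lets the parameter repeat without limit


@dataclass
class InterfaceDefinition:
    name: str
    inputs: List[ParameterRef] = field(default_factory=list)
    outputs: List[ParameterRef] = field(default_factory=list)


@dataclass
class ApplicationDefinition:
    parameters: Dict[str, ParameterDefinition]
    interfaces: List[InterfaceDefinition]

    def __post_init__(self) -> None:
        # An application has one or more interfaces.
        if not self.interfaces:
            raise ValueError("an application needs at least one interface")
        # Each interface may only reference parameters defined once for the application.
        for iface in self.interfaces:
            for ref in iface.inputs + iface.outputs:
                if ref.name not in self.parameters:
                    raise ValueError(f"{iface.name}: undefined parameter {ref.name!r}")


# Example: two interfaces share the "image" parameter; "config" is optional in one of them.
app = ApplicationDefinition(
    parameters={
        "image": ParameterDefinition("image", "FITS"),
        "config": ParameterDefinition("config", "text"),
        "catalogue": ParameterDefinition("catalogue", "VOTable"),
    },
    interfaces=[
        InterfaceDefinition("simple", [ParameterRef("image")], [ParameterRef("catalogue")]),
        InterfaceDefinition(
            "full",
            [ParameterRef("image"), ParameterRef("config", min_occurs=0)],
            [ParameterRef("catalogue")],
        ),
    ],
)
```

Keeping definitions separate from interfaces means a parameter's type is stated once, so two interfaces cannot disagree about what "image" means.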

BaseParameterDefinition

The descriptions of the parameters and the parameter values are probably the heart of the CEA. It is the parameter model that allows us to add semantic meaning and that gives flexibility in how the parameters are transported. The implementation is still in its infancy, but it is hoped that the parameter definition will be extended to encompass any data models that the VO produces.

The basic parameter definition from the schema is shown below

basic parameter definition

As well as the subelements visible in the diagram, there are some important attributes of a BaseParameterDefinition.

name
This is the basic identifier for the parameter
type
This defines the type of the parameter. Note that when the parameter value is passed in a web service call as part of the tool, the formal schema type used is xsd:string; this attribute specifies how the CEA machinery should interpret the string. It can currently take one of the following values:
type      Meaning
integer   An integer number
float     A real number. The exact string formats that can be recognised have yet to be defined, but should include those recognised natively by the most common languages (FORTRAN, Java and possibly C)
complex   A complex number
text      Any string of characters
boolean   A representation of a boolean, e.g. true/false, on/off
anyURI    Any Uniform Resource Identifier
VOTable   A VOTable according to the [VOTABLE] standard
RA        A value that is to be interpreted as a Right Ascension. This should probably be deprecated in favour of STC-S
Dec       A value that is to be interpreted as a Declination. This should probably be deprecated in favour of STC-S
ADQL      The full XML version of the Astronomical Data Query Language [ADQL]
ADQL-S    The string representation of the Astronomical Data Query Language
STC-S     A value that is specified using the Space Time Coordinate string definition [STC-S]
binary    A general piece of binary data with no special interpretation
FITS      Data encoded in the Flexible Image Transport System [FITS]

Note that some of the "bulkier" types, e.g. VOTable, would normally be passed "by reference".
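Because every value travels as an xsd:string, something has to turn the string back into a typed value. The sketch below shows one possible interpretation for the simple scalar types only; the accepted boolean spellings and the handling of a FORTRAN-style "D" exponent are assumptions, since the document says the exact formats have yet to be defined. The function name interpret is hypothetical, and the richer types are returned unchanged because they would need dedicated handling or be passed by reference.

```python
from typing import Union

Scalar = Union[int, float, complex, bool, str]

# Assumed spellings: the document only gives true/false and on/off as examples.
_TRUE = {"true", "on"}
_FALSE = {"false", "off"}


def interpret(type_name: str, raw: str) -> Scalar:
    """Convert the xsd:string value of a parameter according to its CEA type."""
    text = raw.strip()
    if type_name == "integer":
        return int(text)
    if type_name == "float":
        # Python accepts forms such as "1.5" or "-2e3"; a FORTRAN-style "D"
        # exponent ("1.5D0") is rewritten as "E" first.
        return float(text.replace("D", "E").replace("d", "e"))
    if type_name == "complex":
        # Python syntax such as "1+2j"; spaces are removed so "1 + 2j" is accepted too.
        return complex(text.replace(" ", ""))
    if type_name == "boolean":
        lowered = text.lower()
        if lowered in _TRUE:
            return True
        if lowered in _FALSE:
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    # text, anyURI, VOTable, FITS, ADQL, STC-S and the rest are returned unchanged here.
    return raw
```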

Tool

The tool represents the full collection of parameters that are passed to a particular interface of an application, together with the results that are returned. The parameters are separated into the input parameters, which will be passed to the application, and the output parameters, which will be used to pass results back from the application. For output parameters, the actual value set in the tool element is significant only when the "indirect" attribute (see below) is set to true, in which case the value indicates a location to put the results. If "indirect" is false for an output parameter, then only the name is significant; it acts as a "placeholder" for the application to put results in the ResultsListener callback.

tool
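The treatment of output parameters can be illustrated with a small in-memory model. In this sketch, which assumes hypothetical ParameterValue and Tool classes and a hypothetical route_outputs helper (none taken from the CEA schema), an indirect output's value is used as the storage location, while a direct output only contributes its name and is filled in for delivery to a ResultsListener. The example locations are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ParameterValue:
    name: str
    value: str = ""
    indirect: bool = False


@dataclass
class Tool:
    interface: str
    inputs: List[ParameterValue] = field(default_factory=list)
    outputs: List[ParameterValue] = field(default_factory=list)


def route_outputs(
    tool: Tool, results: Dict[str, str]
) -> Tuple[Dict[str, str], List[ParameterValue]]:
    """Split produced results into (location -> data) for storage and direct results."""
    to_store: Dict[str, str] = {}
    direct: List[ParameterValue] = []
    for out in tool.outputs:
        data = results[out.name]
        if out.indirect:
            # The value in the tool names the location that should receive the result.
            to_store[out.value] = data
        else:
            # Only the name is significant; the value is filled in for the ResultsListener.
            direct.append(ParameterValue(out.name, data))
    return to_store, direct


tool = Tool(
    "simple",
    inputs=[ParameterValue("image", "http://example.org/m31.fits", indirect=True)],
    outputs=[
        ParameterValue("catalogue", "http://example.org/store/cat.vot", indirect=True),
        ParameterValue("count"),
    ],
)
to_store, direct = route_outputs(tool, {"catalogue": "<VOTABLE/>", "count": "17"})
```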

ParameterValue

The parameterValue model is a simple but powerful representation of the parameters that are passed to an application. The actual value of the parameter is passed in the value subelement. The parameterValue type has two attributes that control how the parameter is interpreted.
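For input parameters, bulky values such as VOTables would normally be passed by reference. A minimal sketch, assuming that an input with indirect set to true carries a URI in its value subelement: the retrieval function is injected so the sketch does not commit to any transport, and resolve_input is a hypothetical name rather than part of CEA.

```python
from typing import Callable, Dict


def resolve_input(value: str, indirect: bool, fetch: Callable[[str], bytes]) -> bytes:
    """Return the content of an input parameterValue.

    Assumption: indirect=True means `value` is a URI whose content `fetch` retrieves;
    otherwise `value` is the parameter itself, returned as UTF-8 bytes.
    """
    if indirect:
        return fetch(value)
    return value.encode("utf-8")


# Example with an in-memory stand-in for remote storage (the URI is made up).
remote: Dict[str, bytes] = {"http://example.org/table.vot": b"<VOTABLE/>"}
by_reference = resolve_input("http://example.org/table.vot", True, remote.__getitem__)
by_value = resolve_input("42", False, remote.__getitem__)
```

Against real HTTP storage, fetch could be `lambda uri: urllib.request.urlopen(uri).read()` from the Python standard library.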

2.2 Interfaces

2.2.1 Overview of interaction

CEA UML Sequence Diagram

 

The above sequence diagram illustrates how the various components of the CEA system interact when an application is executed. 

Components involved in interaction
Description of interaction steps
  1. The invoking process calls the init operation of the CommonExecutionConnector interface, which is implemented by the component known as the CommonExecutionController. This sets up the execution environment for the application and returns immediately with an executionID, the identifier by which the CommonExecutionController keeps track of this particular execution instance. The parameters to this call are
  2. The invoking process then has the opportunity to register two classes of listener:
    1. Status Monitor - the endpoint of a service that implements the JobMonitor interface, which the CommonExecutionController calls to inform the monitoring process of the status of the execution instance.
    2. Results Listener - the endpoint of a service that implements the ResultsListener port, so that the CommonExecutionController can report the results of the application execution once they are ready.
  3. The invoking process then calls the execute operation, and the CommonExecutionController starts the application.
  4. The application can then optionally return status information to the CommonExecutionController, which passes it on to the monitor service.
  5. When the application completes, it informs the CommonExecutionController, which then passes the indirect results on to the storage service and the direct results back to any results listeners, and informs the monitor service that the application has finished.
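The order of calls in these steps can be made concrete with an in-process toy. This sketch assumes plain Python callables in place of the web service endpoints; the method names init, register_job_monitor, register_results_listener and execute are hypothetical renderings of the operations described, not the WSDL operation names. For brevity it runs synchronously, treats all results as direct (no storage service) and omits the optional status updates of step 4.

```python
import itertools
from typing import Callable, Dict, List

Application = Callable[[], Dict[str, str]]               # returns output name -> value
StatusCallback = Callable[[str, str], None]              # (execution_id, phase)
ResultsCallback = Callable[[str, Dict[str, str]], None]  # (execution_id, results)


class ToyExecutionController:
    """In-process stand-in for a CommonExecutionController (illustration only)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._apps: Dict[str, Application] = {}
        self._monitors: Dict[str, List[StatusCallback]] = {}
        self._listeners: Dict[str, List[ResultsCallback]] = {}

    def init(self, application: Application) -> str:
        # Step 1: set up the execution and return at once with an executionID.
        execution_id = f"exec-{next(self._ids)}"
        self._apps[execution_id] = application
        self._monitors[execution_id] = []
        self._listeners[execution_id] = []
        return execution_id

    def register_job_monitor(self, execution_id: str, callback: StatusCallback) -> None:
        # Step 2.1: register a status monitor.
        self._monitors[execution_id].append(callback)

    def register_results_listener(self, execution_id: str, callback: ResultsCallback) -> None:
        # Step 2.2: register a results listener.
        self._listeners[execution_id].append(callback)

    def _notify(self, execution_id: str, phase: str) -> None:
        for callback in self._monitors[execution_id]:
            callback(execution_id, phase)

    def execute(self, execution_id: str) -> None:
        # Steps 3 and 5, run synchronously here; a real controller runs the job asynchronously.
        self._notify(execution_id, "RUNNING")
        try:
            results = self._apps[execution_id]()
        except Exception:
            self._notify(execution_id, "ERROR")
            return
        for callback in self._listeners[execution_id]:
            callback(execution_id, results)
        self._notify(execution_id, "COMPLETED")


# Usage: record the callbacks in the order they arrive.
events: list = []
controller = ToyExecutionController()
eid = controller.init(lambda: {"count": "17"})
controller.register_job_monitor(eid, lambda _id, phase: events.append(("status", phase)))
controller.register_results_listener(eid, lambda _id, res: events.append(("results", res["count"])))
controller.execute(eid)
# events == [("status", "RUNNING"), ("results", "17"), ("status", "COMPLETED")]
```

Separating init from execute is what gives the invoking process a window to register its listeners before any status or results can be missed.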

Some points of note:

Application Phases

The interaction diagram above indicates that the application goes through a number of phases detailed below.

INITIALIZING
The first phase - this is where a job is being set up and the parameters being checked.
PENDING
An application has been accepted for execution but is waiting in a queue.
RUNNING
An application is running.
COMPLETED
An application has completed successfully.
HELD
The application is held awaiting execution but will not be executed automatically (cf. PENDING); further action is needed.
SUSPENDED
The application has been suspended by the system during execution.
ERROR
Some form of error has occurred.

These are reported by status messages in the JobMonitor callback.
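The phases map naturally onto an enumeration. In the sketch below, treating COMPLETED and ERROR as the terminal phases is an inference from the descriptions above rather than something the document states, and is_finished is a hypothetical helper that a JobMonitor implementation might use on incoming status messages.

```python
from enum import Enum


class ExecutionPhase(Enum):
    INITIALIZING = "INITIALIZING"
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    HELD = "HELD"
    SUSPENDED = "SUSPENDED"
    ERROR = "ERROR"


# Assumption: after these phases no further status messages are expected.
TERMINAL_PHASES = frozenset({ExecutionPhase.COMPLETED, ExecutionPhase.ERROR})


def is_finished(status: str) -> bool:
    """Parse a phase name from a status message and report whether the execution has ended."""
    # ExecutionPhase(...) raises ValueError for an unknown phase name.
    return ExecutionPhase(status.strip().upper()) in TERMINAL_PHASES
```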

 

2.2.2 Detailed Interface Descriptions

CommonExecutionConnector

See Appendix A1 for source.

This is the main port that is used to communicate with the application. The main operations in this port are:

JobMonitor

See Appendix A2 for source.

The only operation in the JobMonitor port is monitorJob, which expects to receive a message with the job-identifier-type (as specified in the original init operation of the CommonExecutionConnector port) and a status message.

The status message can be used to indicate the phase that the application has reached.
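On the receiving side, a monitor typically records the phases it is told about and lets a client wait for the end of an execution. A sketch, assuming the monitorJob message has already been parsed into a job identifier and a phase name (the real message structure is defined in ExecutionRecord.xsd); the JobMonitorService class is hypothetical, and the choice of COMPLETED and ERROR as terminal phases is an assumption.

```python
import threading
from typing import Dict, List

TERMINAL = {"COMPLETED", "ERROR"}  # assumption: no messages follow these phases


class JobMonitorService:
    """Hypothetical receiver for monitorJob(jobId, status) callbacks."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._history: Dict[str, List[str]] = {}
        self._done: Dict[str, threading.Event] = {}

    def _event(self, job_id: str) -> threading.Event:
        with self._lock:
            return self._done.setdefault(job_id, threading.Event())

    def monitor_job(self, job_id: str, phase: str) -> None:
        # Called by the CommonExecutionController, possibly from another thread.
        with self._lock:
            self._history.setdefault(job_id, []).append(phase)
        if phase in TERMINAL:
            self._event(job_id).set()

    def wait_for(self, job_id: str, timeout: float) -> bool:
        """Block until a terminal phase is reported; False if the timeout expires first."""
        return self._event(job_id).wait(timeout)

    def history(self, job_id: str) -> List[str]:
        with self._lock:
            return list(self._history.get(job_id, []))


# Usage: a worker thread plays the part of the controller sending callbacks.
mon = JobMonitorService()


def simulate_callbacks() -> None:
    for phase in ("PENDING", "RUNNING", "COMPLETED"):
        mon.monitor_job("job-1", phase)


worker = threading.Thread(target=simulate_callbacks)
worker.start()
finished = mon.wait_for("job-1", timeout=5.0)
worker.join()
```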

ResultsListener

See Appendix A3 for source.

The only operation on the ResultsListener port is putResults. This accepts a message that contains a job-identifier-type and a result-list-type, which is just a list of parameterValues.
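A results listener mainly needs to file each incoming parameterValue under its job. The sketch assumes the result-list has already been parsed into (name, value) pairs; ResultsCollector, put_results and missing are illustrative names. Keying results by job and parameter name lets a client check which of the output placeholders it named in the tool have arrived.

```python
from typing import Dict, Iterable, List, Tuple


class ResultsCollector:
    """Hypothetical receiver for putResults(jobId, resultList) callbacks."""

    def __init__(self) -> None:
        self._results: Dict[str, Dict[str, str]] = {}

    def put_results(self, job_id: str, result_list: Iterable[Tuple[str, str]]) -> None:
        # Each entry mirrors a parameterValue: the output name and its value.
        job = self._results.setdefault(job_id, {})
        for name, value in result_list:
            job[name] = value

    def get(self, job_id: str, name: str) -> str:
        return self._results[job_id][name]

    def missing(self, job_id: str, expected: Iterable[str]) -> List[str]:
        """Names of expected outputs that have not been delivered yet."""
        received = self._results.get(job_id, {})
        return [name for name in expected if name not in received]


collector = ResultsCollector()
collector.put_results("job-1", [("count", "17")])
```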

2.3 CEA in the Registry - VOResource Extension

It is reasonable to ask whether a specific VOResource extension was needed to accommodate the CEA. The standard Service element expects the interface to the service to be described in WSDL; since CEA uses the same WSDL definitions for every application, there needs to be another way of expressing the fact that a particular CeaService can run a particular set of applications. The method chosen was to extend Service with an element that is simply an aggregation of pointers to the actual application definitions, which are held in CeaApplication, an extension of the standard Resource. These relationships are illustrated in the UML below.

registry domain diagram

For a particular application there should be only one CeaApplication entry in the registry. This entry defines everything that is necessary to run the application except for the endpoint of the service. This implies that finding a particular instance of a particular application requires a two-stage registry query:

  1. Query the registry to find the application of interest - note the parameter data and the IVOA identifier for the application.
  2. Query a second time to find the CeaService(s) that can run the application with that IVOA identifier.

The diagram illustrates the point that one CeaService may run several CeaApplications and that a particular CeaApplication can be run by several CeaServices.
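The two-stage lookup can be sketched against an in-memory stand-in for the registry. This assumes each CeaService entry simply lists the IVOA identifiers of the CeaApplications it can run, mirroring the aggregation of pointers described above; the identifiers, endpoint URLs and field names are made up for illustration, and a real client would issue registry queries instead of filtering lists.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CeaApplication:
    identifier: str  # IVOA identifier of the application entry
    title: str


@dataclass(frozen=True)
class CeaService:
    identifier: str
    endpoint: str
    applications: Tuple[str, ...]  # IVOA identifiers of the applications it can run


def find_application(apps: List[CeaApplication], keyword: str) -> CeaApplication:
    # Stage 1: find the single registry entry that describes the application.
    matches = [a for a in apps if keyword.lower() in a.title.lower()]
    if len(matches) != 1:
        raise LookupError(f"expected one application matching {keyword!r}, found {len(matches)}")
    return matches[0]


def find_services(services: List[CeaService], app_id: str) -> List[CeaService]:
    # Stage 2: find every service that can run the application with that identifier.
    return [s for s in services if app_id in s.applications]


apps = [
    CeaApplication("ivo://example.org/SExtractor", "SExtractor"),
    CeaApplication("ivo://example.org/HyperZ", "HyperZ"),
]
services = [
    CeaService("ivo://example.org/cea-a", "http://a.example.org/cea",
               ("ivo://example.org/SExtractor", "ivo://example.org/HyperZ")),
    CeaService("ivo://example.org/cea-b", "http://b.example.org/cea",
               ("ivo://example.org/SExtractor",)),
]
app = find_application(apps, "sextractor")
endpoints = [s.endpoint for s in find_services(services, app.identifier)]
```

The example data also shows the many-to-many relationship: cea-a runs two applications, and SExtractor can be run by two services.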

3. Deployment

Typical Scenario

UML Deployment

This deployment shows some of the features of using the CEA (as implemented in AstroGrid)

Minimum conditions for CEA compliance

The general conditions for an application service to be considered CEA compliant are laid out below. These conditions should be considered a minimum set.

Details of the AstroGrid implementation, a set of interacting components that use the Common Execution Architecture, are given in Appendix C.

4. Future Directions

The scope of the CEA is wide: it provides a full methodology to define and execute applications in an asynchronous web services environment. It can be considered complete in the sense that it has been used within AstroGrid to implement a full workflow system. However, with the ongoing work to bring about a convergence between grid and web services [OGSIWSRF], it might be best to refactor the asynchronous calling part of this specification in terms of the WS-Resource and WS-Notification standards, in order to benefit from the general tools that may become available to support these standards. A suggestion as to how this might be done has been presented in the Universal Worker Service proposal [UWS].

In the future it might be the case that CEA describes a "profile" of how to use some of these "low level" specifications within the context of the application model described here, and as such has

Extensions

There are specific extensions to the overall model that could be considered in addition to the reworking mentioned in the previous section. These include:

Appendices

The following files are associated with this specification:

Filename | Namespace | Includes (only the unique part of each namespace is listed) | Description
AGApplicationBase.xsd | http://www.ivoa.net/xml/CEA/base/v0.2 | parameters | This schema defines most of the basic CEA objects that are imported into both the WSDL and the registry schema
CEATypes.xsd | http://www.ivoa.net/xml/CEA/types/v0.2 | parameters | This defines the message types that are passed in the queryStatus operations of the CommonExecutionConnector interface and in the monitorJob operation of the JobMonitor interface
VOCEA.xsd | http://www.ivoa.net/xml/CEAService/v0.2 | base, parameters | This defines the VOResource extensions CeaApplication and CeaService that are used in the registry
AGParameterDefinition.xsd | http://www.ivoa.net/xml/CEA/parameters/v0.2 | - | Contains the basic parameter definition and parameterValue elements used in the other schema
ExecutionRecord.xsd | http://www.ivoa.net/xml/CEA/ExecutionRecord/v0.2 | types | Definition of the messages sent in the callbacks
CommonExecutionConnnector.wsdl | - | base, types | The main interface definition
JobMonitor.wsdl | - | ExecutionRecord | The job monitor listener interface definition
CEAResultsListener.wsdl | - | types | The results listener interface definition

 

Appendix A1: WSDL for the Common Execution Connector

Appendix A2: WSDL for the Job Monitor Service

Appendix A3: WSDL for the Results Listener Service

Appendix B: Example Registry Entries

This is an example of the registry entry for a CeaApplication and the CeaService that can run that application.

Note - to be able to validate this document the VOCEA.xsd needs to be included into the RegistryInterface schema.

Appendix C: Details of the Astrogrid Implementation

The CEA is implemented in the following AstroGrid components:

AstroGrid has now published dozens of applications that conform to this framework, including well-known legacy applications such as SExtractor and HyperZ. A list of these applications can be obtained by querying a registry for entries with @xsi:type = 'cea:CeaApplication'; such a query can be performed on the AstroGrid registry.
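The xsi:type filter can be illustrated with Python's standard xml.etree.ElementTree on a registry response that has already been retrieved. The wrapper and Resource element names, the urn:example:registry namespace and the identifiers in the sample are assumptions made for the example; only the CeaService namespace comes from the VOCEA.xsd entry in the file table. The sketch resolves the "cea:" prefix through the document's namespace declarations rather than matching the literal string, and assumes each prefix is bound to only one namespace.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, List

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def cea_applications(xml_text: str, cea_ns: str) -> List[ET.Element]:
    """Return the elements whose xsi:type resolves to {cea_ns}CeaApplication."""
    prefixes: Dict[str, str] = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            prefixes[prefix] = uri
        elif root is None:
            root = item  # the first element seen is the document root
    if root is None:
        return []
    matches = []
    for element in root.iter():
        prefix, _, local = element.get(XSI_TYPE, "").rpartition(":")
        if local == "CeaApplication" and prefixes.get(prefix) == cea_ns:
            matches.append(element)
    return matches


SAMPLE = """<reg:Resources xmlns:reg="urn:example:registry"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:cea="http://www.ivoa.net/xml/CEAService/v0.2">
  <reg:Resource xsi:type="cea:CeaApplication"><identifier>ivo://example.org/SExtractor</identifier></reg:Resource>
  <reg:Resource xsi:type="cea:CeaService"><identifier>ivo://example.org/cea-a</identifier></reg:Resource>
</reg:Resources>"""

apps = cea_applications(SAMPLE, "http://www.ivoa.net/xml/CEAService/v0.2")
ids = [app.findtext("identifier") for app in apps]
```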

References

AGAPP
The Astrogrid maven documentation for the Applications integration component (http://www.astrogrid.org/maven/docs/HEAD/applications/index.html)
AGJES
The Astrogrid maven documentation for the Job Execution System (http://www.astrogrid.org/maven/docs/HEAD/jes/index.html)
AGWFO
The Astrogrid maven documentation for the Workflow Objects component (http://www.astrogrid.org/maven/docs/HEAD/astrogrid-workflow-objects/index.html)
OGSIWSRF
From Open Grid Services Infrastructure to WS-Resource Framework: Refactoring and Evolution describes how OGSI constructs map to WS-Resource Framework constructs. (http://www.globus.org/wsrf/specs/ogsi_to_wsrf_1.0.pdf)
UWS
Universal Worker Proposal - Guy Rixon (http://www.ivoa.net/internal/IVOA/IvoaGridAndWebServices/uws.html)
WSI
Web Services Interoperability Guidelines (http://www.ws-i.org/)
VOTable
Ochsenbein, F. et al., VOTable Format Definition V1.1 (http://www.ivoa.net/Documents/latest/VOT.html)
FITS
IAU FITS Working Group (http://fits.gsfc.nasa.gov/iaufwg/iaufwg.html) and general FITS documentation (http://fits.gsfc.nasa.gov/fits_documentation.html)
ADQL
IVOA Astronomical Data Query Language (http://www.ivoa.net/internal/IVOA/IvoaVOQL/ADQL-0.91.pdf)
STC-S
Space-Time Coordinate (STC) Metadata Linear String Implementation (http://www.ivoa.net/Documents/latest/STC-S.html)
 

Changes from version 0.1

The main changes from the previous version of this document, published on the AstroGrid site, are: