International Virtual Observatory Alliance
A Proposal for a Common Execution Architecture
Version 1.2
IVOA WG Internal Draft 2005-05-12
- Working Group: not applicable
- Author(s): Paul Harrison
This note describes a proposal for a Common Execution Architecture
(CEA) within the Virtual Observatory. It discusses the general
motivation behind the design as well as detailed schema and WSDL
definitions of the architecture. The scope of this document covers areas
of interest to the Registry and Grid Working Groups as well as the
Applications Special Interest Group.
This is an IVOA Working Draft for review by IVOA members and other
interested parties. It is a draft document and may be updated,
replaced, or obsoleted by other documents at any time. It is
inappropriate to use IVOA Working Drafts as reference materials or to
cite them as other than "work in progress." A list of current IVOA
Recommendations and other technical documents can be found at
http://www.ivoa.net/Documents/.
This design was refined whilst working with Noel Winstanley as he implemented the AstroGrid Workflow component. The callbacks, in particular, owe much to his contribution.
The Common Execution Architecture (CEA) is an attempt to create a
reasonably small set of interfaces and schema to model how to execute a
typical Astronomical application within the Virtual Observatory (VO).
In this context an application can be any process that consumes or
produces data, so in existing terminology could include
- A unix command line application
- A database query
- A web service
The CEA has been primarily designed to work within a web services
calling mechanism, although it is possible to have specific language
bindings using the same interfaces. For example, AstroGrid has a Java
implementation of the interfaces that can be called directly from a
Java executable.
The primary requirements motivating the creation of this
architecture are:
- To create a uniform interface and model for an application and
its parameters. This has twin benefits:
- It gives VO infrastructure writers a single model of an
application that they have to code for.
- Application writers know what they have to implement to be
compatible with a VO Infrastructure.
- To provide a higher level description than WSDL 1.1 can offer.
- Restrict the almost limitless possibilities allowed by WSDL
into a manageable subset. There are too many ways that web service interfaces can be expressed in WSDL 1.1. Even when interoperability [WSI] guidelines are followed, there are still many choices for expressing parameter values, so we need a common way to do this in IVOA workflows.
- Provide specific semantics for some astronomical quantities.
- Provide extra information not allowed in WSDL - e.g. default
values, descriptions for use in a GUI etc.
- Insulate upper software layers from the differences in WSDL styles, and possible grid implementations.
- To provide extensions to the VOResource schema (see
the IVOA WG) that can describe a general application.
- To provide asynchronous operation of an application. This is
essential, as the call tree that invokes the application cannot be
expected to remain active for extremely long lasting operations, e.g. a
user from a web browser invokes a data-mining operation that takes days.
- Provide callback for notification of finishing.
- Provide polling mechanisms for status.
- Provide Job identification.
- To allow for the data flow to not necessarily have to follow the
call tree. In a typical application execution the results are returned
to the invoking process - In a VO scenario, it can be useful if the
application can be instructed to pass the results on to a different
location, especially as a staging point for the result of asynchronous calls.
The design for this architecture has evolved from the requirements
for the Workflow/Job Execution System components within AstroGrid. It
was desirable for the job execution system to have a single model for
an application, so that it could deal with the (already complex)
problems of scheduling, looping, conditional execution etc. without
needing to have specializations for all the different types of service
(SIA, Database query, Cone Search, etc.) that it might be required to
invoke.
Amongst the VO specifications there was no existing model for
applications that was defined at the level at which this design
attempts to address. In the VOResource schema an application is defined
as a Service with the interface definition. The interface definition
either relies on referring to a WSDL definition of the service, or on
other schema extending the service definition to provide some specific
detail as in the case of a Simple Image Access service. There is no
general definition of an application in the resource.
It is clear that the WSDL model of an interface has had a large
influence on the design of the CEA, but it should be remembered that
the CEA is intentionally layered on top of WSDL, so that CEA
controls the scope and semantics of operations. There is only one WSDL
definition for all applications, so as far as web services are concerned
the interface is constant. CEA works by transporting meta
information about the application interface within this constant WSDL
interface.
The common execution architecture can be split into two parts
- The model of the application and its parameters. This includes both the description of the application as a resource in the VO Registry, as well as the description of the application and its parameters at the time of invocation. These descriptions are formally made in a set of XML schema that are described in more detail in the following sections.
- The definitions of the interfaces necessary to call the application in an asynchronous fashion. The interfaces are described by WSDL documents that reuse some of the schema that describe the application.
2.1 Application Model
UML model

As this model depicts, an application in CEA is quite a simple entity, consisting of one or more interfaces, each of which has zero or more input parameters and zero or more output parameters.
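The relationships in the model can be sketched with a few Python dataclasses. This is only an illustration of the cardinalities described above; the class and field names, and the example application, are assumptions and are not taken from the CEA schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str  # identifier of the parameter
    type: str  # e.g. "integer", "float", "VOTable"


@dataclass
class Interface:
    name: str
    inputs: List[Parameter] = field(default_factory=list)   # zero or more
    outputs: List[Parameter] = field(default_factory=list)  # zero or more


@dataclass
class Application:
    name: str
    interfaces: List[Interface]

    def __post_init__(self) -> None:
        # The model requires at least one interface per application.
        if not self.interfaces:
            raise ValueError("an application needs at least one interface")


# Hypothetical application with a single interface.
cone = Application(
    name="example/cone-search",
    interfaces=[
        Interface(
            name="simple",
            inputs=[
                Parameter("RA", "RA"),
                Parameter("Dec", "Dec"),
                Parameter("radius", "float"),
            ],
            outputs=[Parameter("result", "VOTable")],
        )
    ],
)
```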
2.1.1 XML Schema representation
The schema representations of the application model that participate in CEA can be split into two groups
- Those used to describe the application in the registry
- Application - the overall application, which has a series of
- BaseParameterDefinition - the detailed descriptions of the parameters and their types.
- Those used to describe the application in the WSDL interface
- Tool - An instance of an application with real parameter values.
- ParameterValue - used to pass a value to a Tool.
These are described in more detail in the following sections.
The schema representation is shown below, and is essentially a representation of the UML model that has been coded to recognise that the same parameter can occur in several interfaces. In addition the interface is allowed to contain optional and repeated parameters.
This diagram also shows a number of specialized elements all within the substitution group which has Parameter as the head. These are implementation details where extra information is needed to specify how to use the parameters - for example in the case of a command line parameter it is necessary to know the command line switch or position that the parameter appears at.
BaseParameterDefinition
The description of the parameters and the parameter values is probably the heart of the CEA. It is the model for the parameters that allows us to add semantic meaning, and gives flexibility in how the parameters are transported. The implementation is still in its infancy, but it is hoped that the parameter definition will be extended to encompass any data models that the VO produces.
The basic parameter definition from the schema is shown below
As well as the subelements visible in the diagram, there are some important attributes of a BaseParameterDefinition.
- name
- This is the basic identifier for the parameter
- type
- This defines the type of the parameter. Note that when the parameter value is passed in a web service call as part of the tool, the formal schema type used is xsd:string; this attribute specifies how the string should be interpreted by the CEA machinery. It can currently have one of the following values:
| type | Meaning |
| integer | An integer number |
| float | Real - The exact string formats that can be recognised have yet to be defined but should include those recognized natively by the most common languages (FORTRAN, Java, C?) |
| complex | A complex number |
| text | Any string of characters |
| boolean | A representation of a boolean - e.g. true/false on/off |
| anyURI | Any Uniform Resource Identifier |
| VOTable | A VOTable according to the [VOTABLE] standard |
| RA | A value that is to be interpreted as a Right Ascension. This should probably be deprecated in favour of STC-S |
| Dec | A value that is to be interpreted as a Declination. This should probably be deprecated in favour of STC-S |
| ADQL | The full XML version of the Astronomical Data Query Language [ADQL] |
| ADQL-S | The string representation of the Astronomical Data Query Language |
| STC-S | A value that is specified using the Space Time Coordinate string definition [STC-S] |
| binary | A general piece of binary data with no special interpretation. |
| FITS | Data encoded in the Flexible Image Transport System [FITS] |
Note that some of the "bulkier" types, e.g. VOTable, would normally be passed "by reference".
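Because every value travels as a string, the type attribute is what tells the receiving side how to interpret it. The following is a minimal sketch of such an interpretation step for the simpler types in the table; the function name `parse_value` is hypothetical, and the accepted string formats are simply those of Python, since the note itself leaves the exact formats undefined.

```python
from urllib.parse import urlparse

# Accepted boolean spellings follow the examples in the table (true/false, on/off).
_TRUE = {"true", "on"}
_FALSE = {"false", "off"}


def parse_value(type_name: str, text: str):
    """Interpret a string value according to its declared CEA type.

    Raises ValueError if the string does not match the declared type.
    """
    if type_name == "integer":
        return int(text)
    if type_name == "float":
        # Python's float() does not accept FORTRAN "D" exponents such as 1.5D3.
        return float(text)
    if type_name == "complex":
        return complex(text)
    if type_name == "boolean":
        lowered = text.strip().lower()
        if lowered in _TRUE:
            return True
        if lowered in _FALSE:
            return False
        raise ValueError(f"not a boolean: {text!r}")
    if type_name == "anyURI":
        # Assumption: only absolute URIs (with a scheme) are accepted here.
        if not urlparse(text).scheme:
            raise ValueError(f"not an absolute URI: {text!r}")
        return text
    # text, VOTable, ADQL, STC-S, binary, FITS etc. are passed through unchanged
    # in this sketch; real checks for them would be considerably more involved.
    return text
```

A check of this kind could be run on each input parameter before an application is started, so that malformed values are rejected early.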
Tool
The tool represents the full collection of parameters that are passed to a particular interface of an application and the results that are returned. The parameters are separated into the input parameters, which will be passed to the application, and the output parameters, which will be used to pass results back from the application. For output parameters, the actual value set in the tool element is only significant when the "indirect" attribute (see below) is set to true, in which case the value indicates a location to put the results. If "indirect" is false for an output parameter, then only the name is significant, and it acts as a "placeholder" for the application to put results in the ResultsListener callback.

ParameterValue
The parameterValue model is a simple but powerful representation of the parameters that are passed to an application. The actual value of the parameter is passed in the value subelement. The parameterValue type has 2 attributes that control how the parameter is interpreted.
- name The identifier for the parameter.
- indirect This describes whether the value element of the parameter should be used as is (indirect="false"), or whether the value of the parameter represents a URI from which the actual value should be fetched (indirect="true"). The minimum set of transport mechanisms a service should understand to be CEA compliant is:
- http get/put
- ftp
- VOSpace
- local (to the Common Execution Controller) filestore
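The effect of the indirect attribute can be sketched as a small resolver that either returns the value as is or fetches it from the URI it names. Everything here is illustrative: the `ParameterValue` dataclass, the `resolve` function and the `FETCHERS` table are assumed names, and only a local `file:` fetcher is implemented, standing in for the local filestore; the http, ftp and VOSpace mechanisms would be further entries in the same table.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


@dataclass
class ParameterValue:
    name: str
    value: str
    indirect: bool = False


def read_local_file(uri: str) -> str:
    """Fetch the content of a file: URI from the local filestore."""
    return Path(url2pathname(urlparse(uri).path)).read_text()


# Hypothetical dispatch table keyed by URI scheme.
FETCHERS = {"file": read_local_file}


def resolve(param: ParameterValue) -> str:
    """Return the actual value of a parameter, following indirection if set."""
    if not param.indirect:
        return param.value
    scheme = urlparse(param.value).scheme
    try:
        fetch = FETCHERS[scheme]
    except KeyError:
        raise ValueError(f"no transport registered for scheme {scheme!r}") from None
    return fetch(param.value)


# Demonstration: the same value passed directly and by reference.
with tempfile.TemporaryDirectory() as workdir:
    staged = Path(workdir) / "radius.txt"
    staged.write_text("10.5")
    direct = resolve(ParameterValue("radius", "10.5"))
    fetched = resolve(ParameterValue("radius", staged.as_uri(), indirect=True))
```

Keeping the transports in a table means that supporting a further mechanism only requires registering one more fetcher, without touching the code that consumes the values.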
2.2 Interfaces
2.2.1 Overview of interaction
The above sequence diagram illustrates how the various components of
the CEA system interact when an application is executed.
Components involved in interaction
- Application - This is the process that is to be executed. It is defined as a process that can consume or create data. So this can include unix command line tools, database queries, web services etc.
- Common Execution Controller - this is the component that implements the CommonExecutionConnector interface and actually controls the execution of the application. There can be various specialisms of this service, such as the CommandLineApplicationController, which can be configured to invoke a general unix command line tool, or a WebServiceApplicationController, which can be configured to act as a proxy to call a general web service in a uniform manner.
- Invoking process
- Monitoring Service - This is a service that the Common Execution Controller can report status to.
- Storage Service - this is the mechanism by which the application can return its results in the indirect parameter mode (see indirect parameters).
- Results Service - a service that will "listen" for the results of the execution.
Description of interaction steps
- The invoking process calls the init method of the CommonExecutionConnector
interface, which is implemented by the component known as the CommonExecutionController.
This will set up the execution environment for the application and will return
immediately with an executionID, which is the identifier by which the
CommonExecutionController keeps track of this particular execution instance.
The parameters to this call are
- A Tool object - This is described in more detail in section 2.1.1.
- JobIdentifier - this is the identifier that the invoking process
uses to keep track of this particular execution instance. This allows the invoking process to use its own book-keeping methods for tracking, rather than being forced into a particular scheme dictated by the CommonExecutionController.
- The invoking process then has the opportunity to register two classes of listener
- Status Monitor - this is the endpoint of the service that implements
the JobMonitor interface that the ExecutionController can call
to inform the monitoring process of the status of the execution instance.
- Results Listener - this is the endpoint of a service that implements the ResultsListener port so that the ExecutionController can report the results of the application execution once they are ready.
- Then the execute operation should be invoked and the CommonExecutionController will then start the application.
- The application can then optionally return status information to the CommonExecutionController
which will then pass this on to the Monitor Service.
- When the application completes it will inform the CommonExecutionController
which will then pass the indirect results on to the storage service, the direct results back to any results listeners and inform the
monitor service that the application has finished.
Some points of note:
- The monitoring/resultListening services could equally be the same as the invoking
service - they are shown as conceptually separate, as the endpoint of
this service is passed in as an argument to the registering call. Indeed if required there could be many status and results listeners for a single application execution.
- The only guaranteed status message that the monitoring service
will receive is the one informing it that the application has finished
(or failed). The application might be capable of sending intermediate
messages whilst it is still executing, but this is not required.
- The results of the application are not necessarily returned directly
to the invoking process. For "indirect" output parameters, the final destination for the result data is
implicit in the specification of the output parameters, and it is the
responsibility of the ExecutionController to ensure that they get
to the desired storage service.
- The results will also always be passed to the resultsListener if one is registered. In the case of an indirect parameter, only the URI that specifies the location will be returned; otherwise the full value will be returned. For an indirect parameter the desired location will usually have been passed in during the init call; however, it is permissible for the CommonExecutionController to change that value for potential optimization purposes.
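The sequence of init, listener registration and execute can be illustrated with a toy, in-memory controller. This is a sketch under several assumptions: the class and method names are invented for the example, listeners are plain Python callables rather than web service endpoints, and execute runs the application synchronously, whereas a real controller would return at once and run it in the background.

```python
import itertools
from typing import Callable, Dict

# Hypothetical callback signatures standing in for the JobMonitor and
# ResultsListener ports.
StatusCallback = Callable[[str, str], None]              # (job_id, phase)
ResultsCallback = Callable[[str, Dict[str, str]], None]  # (job_id, results)


class InMemoryController:
    def __init__(self, application: Callable[[Dict[str, str]], Dict[str, str]]) -> None:
        self._application = application
        self._counter = itertools.count(1)
        self._jobs: Dict[str, dict] = {}

    def init(self, tool: Dict[str, str], job_id: str) -> str:
        """Record the execution and return the controller's own executionID."""
        execution_id = f"exec-{next(self._counter)}"
        self._jobs[execution_id] = {
            "tool": tool, "job_id": job_id, "monitors": [], "listeners": [],
        }
        return execution_id

    def register_progress_listener(self, execution_id: str, monitor: StatusCallback) -> None:
        self._jobs[execution_id]["monitors"].append(monitor)

    def register_results_listener(self, execution_id: str, listener: ResultsCallback) -> None:
        self._jobs[execution_id]["listeners"].append(listener)

    def _report(self, job: dict, phase: str) -> None:
        # Callbacks are keyed by the caller's JobIdentifier, not the executionID.
        for monitor in job["monitors"]:
            monitor(job["job_id"], phase)

    def execute(self, execution_id: str) -> None:
        job = self._jobs[execution_id]
        self._report(job, "RUNNING")
        try:
            results = self._application(job["tool"])
        except Exception:
            self._report(job, "ERROR")
            return
        for listener in job["listeners"]:
            listener(job["job_id"], results)
        self._report(job, "COMPLETED")


statuses = []
received = []
controller = InMemoryController(
    lambda inputs: {"sum": str(int(inputs["a"]) + int(inputs["b"]))}
)
execution_id = controller.init({"a": "2", "b": "3"}, job_id="job-42")
controller.register_progress_listener(execution_id, lambda job, phase: statuses.append((job, phase)))
controller.register_results_listener(execution_id, lambda job, results: received.append((job, results)))
controller.execute(execution_id)
```

Note that the listeners only ever see the caller's own job identifier, which is what lets the invoking process keep its own book-keeping scheme.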
Application Phases
The interaction diagram above indicates that the application goes through a number of phases detailed below.
- INITIALIZING
- The first phase - this is where a job is being set up and the parameters being checked.
- PENDING
- An application has been accepted for execution but is waiting in a queue.
- RUNNING
- An application is running.
- COMPLETED
- An application has completed successfully.
- HELD
- The application is HELD awaiting execution but will not automatically be executed (cf pending) - further action needed.
- SUSPENDED
- The application has been suspended by the system during execution.
- ERROR
- Some form of error has occurred.
These are reported by status messages in the JobMonitor callback.
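The phases listed above map naturally onto an enumeration. The sketch below is illustrative only: the `Phase` enum and the `is_finished` helper are assumed names, and treating COMPLETED and ERROR as the two end states follows the earlier remark that the only guaranteed status message is the one reporting that the application has finished or failed.

```python
from enum import Enum


class Phase(Enum):
    INITIALIZING = "INITIALIZING"
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    HELD = "HELD"
    SUSPENDED = "SUSPENDED"
    ERROR = "ERROR"


# Phases after which no further status messages are expected (an assumption
# based on the "finished (or failed)" wording above).
FINISHED = frozenset({Phase.COMPLETED, Phase.ERROR})


def is_finished(phase: Phase) -> bool:
    """True if the phase marks the end of an execution."""
    return phase in FINISHED
```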
2.2.2 Detailed Interface Descriptions
CommonExecutionConnector
See Appendix A1 for source.
This is the main port that is used to communicate with the
application. The main operations in this port are:
- init - this will initialize the application environment and returns an executionId by which the CommonExecutionController keeps track of the execution instance
- registerResultsListener - any number of services can register themselves as wanting to receive the results from the run when they are available as long as they implement the ResultsListener port below
- registerProgressListener - any number of services can register themselves as wanting to receive status messages during the run as long as they implement the JobMonitor port below
- execute - will actually start the asynchronous execution of the application specified in the init call.
- queryExecutionStatus - this call can be used to actively obtain the execution status of a running application, rather than passively waiting for it as a JobMonitor
- abort - will attempt to abort the execution of an application
- getExecutionSummary - request summary information about the application execution
- getResults - actively request the results of the application execution, rather than passively waiting for them as a ResultsListener.
- returnRegistryEntry - this returns the registry entry for the particular CommonExecutionConnector instance - this will probably be removed from this interface to be replaced by the equivalent operation in the standard VO service definitions.
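As an alternative to registering a JobMonitor, a client can poll queryExecutionStatus until the execution ends. The following sketch shows one way such a loop might look; `poll_until_finished` is a hypothetical helper, the status call is passed in as a plain callable, and the scripted sequence of phases stands in for a real remote service.

```python
import time
from typing import Callable

FINISHED_PHASES = {"COMPLETED", "ERROR"}


def poll_until_finished(
    query_status: Callable[[str], str],
    execution_id: str,
    interval_s: float = 1.0,
    max_polls: int = 100,
) -> str:
    """Call query_status repeatedly until a finished phase is reported."""
    for _ in range(max_polls):
        phase = query_status(execution_id)
        if phase in FINISHED_PHASES:
            return phase
        time.sleep(interval_s)
    raise TimeoutError(f"{execution_id} did not finish after {max_polls} polls")


# Stand-in for a remote queryExecutionStatus call: a scripted list of phases.
_scripted = iter(["PENDING", "RUNNING", "RUNNING", "COMPLETED"])
final_phase = poll_until_finished(lambda _id: next(_scripted), "exec-1", interval_s=0.0)
```

Bounding the number of polls keeps a client from waiting forever on an execution that is HELD or has been lost.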
JobMonitor
See Appendix A2 for source.
The only operation in the JobMonitor port is the monitorJob operation, which expects to receive a message with the job-identifier-type (as specified in the original init operation of the CommonExecutionConnector port) and a status message.
The status message can be used to indicate the phase that the application has reached.
ResultsListener
See Appendix A3 for source.
The only operation on the ResultsListener port is putResults. This accepts a message that contains a job-identifier-type and a result-list-type, which is just a list of parameterValues.
2.3 CEA in the Registry - VOResource Extension
It is a valid question to ask whether a specific VOResource
extension is needed to accommodate the CEA. The standard Service
element expects the interface to the service to be described in WSDL,
but because CEA has the same WSDL definitions for all applications,
there needs to be a way of expressing the fact that a particular CeaService can
run a particular set of applications. The method that was chosen was to extend
Service with an element that is just an aggregation
of pointers to the actual application definitions defined in CeaApplication, which
is an extension of the standard Resource.
These relationships are illustrated in the UML below.

For a particular application there should be only one CeaApplication entry
in the registry. This entry will define everything that is necessary to run
the application except for the endpoint of the service. This implies that
finding a particular instance of a particular application is a two stage registry
query:
- Query the registry to find the application of interest - note the parameter
data and the IVOA identifier for the application.
- Query a second time to find the CeaService(s) that can run the application
with that IVOA identifier.
The diagram illustrates the point that one CeaService may run several
CeaApplications and that a particular CeaApplication can be run by several
CeaServices.
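The two stage lookup can be sketched against a toy registry held in memory. The record layout, the field names and the example identifiers below are all assumptions made for illustration; a real registry would be queried through its own search interface.

```python
from typing import Dict, List

# Hypothetical registry content: two applications and two services.
REGISTRY: List[Dict] = [
    {"type": "CeaApplication", "identifier": "ivo://example.org/sextractor",
     "title": "SExtractor"},
    {"type": "CeaApplication", "identifier": "ivo://example.org/hyperz",
     "title": "HyperZ"},
    {"type": "CeaService", "identifier": "ivo://example.org/cec-1",
     "endpoint": "http://cec1.example.org/cea",
     "applications": ["ivo://example.org/sextractor", "ivo://example.org/hyperz"]},
    {"type": "CeaService", "identifier": "ivo://example.org/cec-2",
     "endpoint": "http://cec2.example.org/cea",
     "applications": ["ivo://example.org/sextractor"]},
]


def find_applications(title: str) -> List[Dict]:
    """Stage 1: find the application entries of interest."""
    return [r for r in REGISTRY
            if r["type"] == "CeaApplication" and r["title"] == title]


def find_services(application_id: str) -> List[Dict]:
    """Stage 2: find the services that can run the given application."""
    return [r for r in REGISTRY
            if r["type"] == "CeaService" and application_id in r["applications"]]


app = find_applications("SExtractor")[0]
endpoints = [s["endpoint"] for s in find_services(app["identifier"])]
```

The example data also shows the many-to-many relationship: the first service runs both applications, and the first application can be run by both services.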
3 Deployment
Typical Scenario
This deployment shows some of the features of using the CEA (as implemented in AstroGrid)
- On the right hand side of the diagram there are command line
applications that are wrapped by specialized CommonExecutionControllers
that allow the workflow engine to use the CommonExecutionConnector
interface to communicate.
- There is a webservices proxy component that can act as an adaptor
between a generic web service and the CommonExecutionConnector interface, providing a uniform view on any sort of application.
- On the left of the diagram the webservices proxy is localised
with a web service so that the results returned by the webservice can
be stored locally on a VOStore, thus minimising network traffic. This also provides a way of 'VO enabling' an existing web service without altering the service itself.
Minimum conditions for CEA compliance
The general conditions for an application service to be considered CEA compliant are laid out below. These conditions should be considered a minimum set:
- It must implement the CommonExecutionConnector interface.
- It should send messages to services that implement the JobMonitor interface. The application should send at least the message indicating that it has finished, but other messages could be useful also.
- It must send a message to services that implement the ResultsListener interface. If it does not send this message then it will not be implementing an asynchronous behaviour.
- It should be able to perform basic type checking on all parameter types during the init phase.
- It must be able to support all the indirect transport mechanisms for output parameters.
There are details of the AstroGrid implementation of a set of interacting components that use the Common Execution Architecture given in Appendix C.
4 Future Directions
It is clear that the scope of the CEA is very wide, and as such it provides a full methodology to define and execute applications in an asynchronous web services environment. It can be considered complete in that it has been used within AstroGrid to implement a full workflow system. However, with the ongoing work to bring about a convergence between the grid and web services [OGSIWSRF], it might be best to refactor the asynchronous calling part of this specification in terms of the WS-Resource and WS-Notification standards, in order to benefit from general tools that might become available to support these standards. A suggestion as to how this might be done has been presented in the Universal worker service proposal [UWS].
In the future it might be the case that CEA describes a "profile" of how to use some of these "low level" specifications within the context of the application model described here, and as such has
Extensions
There are specific extensions to the overall model that could be considered in addition to the reworking mentioned in the previous section. These include;
- Use work from the DM workgroup on basic parameter types. This could take two forms
- perhaps extend the number of aggregate types that CEA "understands".
- perhaps recast the ParameterValue type to be more complex, e.g. being a subtype of Quantity. This would need some very careful consideration though, as it was one of the fundamental design decisions of CEA to make the parameter value a simple string to allow easy integration with as many processing systems as possible. There would need to be substantial benefits in other areas for the introduction of complex schema types at this level to be worth the extra implementation costs.
- perhaps allow parameters to be array valued. At the moment this sort of behaviour can be modeled using repeatable parameters.
- Make other transport mechanisms mandatory for indirect
parameters e.g.
- Possibly introduce more fault types into the WSDL interface definitions so that the nature of the fault can be determined simply from the fault type rather than having to look at the internal details of the CEAFault.
- Think about capability/ontology. This is particularly in relation to the registry schema extensions, where it would be beneficial to include more information that could be used by intelligent agents that specifies in general the class of processing that an application might perform. e.g. photometric redshift estimation.
- A new definition of CEA Control - this interface would define the functions necessary for a Job control system to make sensible decisions about resource management, as well as allowing an Execution Controller to possibly make statements about the characteristics (maximum cpu time) of job that it is prepared to run.
The following files are associated with this specification
| Filename (with link) | Namespace | includes (only unique part of namespace listed) | Description |
| AGApplicationBase.xsd | http://www.ivoa.net/xml/CEA/base/v0.2 | parameters | This schema defines most of the basic CEA objects that are imported into both the WSDL and the Registry Schema |
| CEATypes.xsd | http://www.ivoa.net/xml/CEA/types/v0.2 | parameters | This defines the message types that are passed in queryStatus operations in the CommonExecutionConnector interface and in the MonitorJob operation of the Job Monitor interface. |
| VOCEA.xsd | http://www.ivoa.net/xml/CEAService/v0.2 | base, parameters | This defines the VOResource extensions of CeaApplication and CeaService that are used in the registry |
| AGParameterDefinition.xsd | http://www.ivoa.net/xml/CEA/parameters/v0.2 | | Contains the basic parameter definition and parameterValue elements used in the other schema |
| ExecutionRecord.xsd | http://www.ivoa.net/xml/CEA/ExecutionRecord/v0.2 | types | Definition of messages sent in the callbacks |
| CommonExecutionConnnector.wsdl | | base, types | The main interface definition |
| JobMonitor.wsdl | | ExecutionRecord | The job monitor listener interface definition |
| CEAResultsListener.wsdl | | types | The results listener interface definition |
This is an example of the registry entry for a CEAApplication and the CEAService that can run that application.
Note - to be able to validate this document the VOCEA.xsd needs to be included into the RegistryInterface schema.
Appendix C: Details of the Astrogrid Implementation
The CEA is implemented in the following AstroGrid components
- Applications Integration [AGAPP]. This currently implements a specialized CommonExecutionController that can execute unix command line applications, as well as a proxy for calling legacy HTML form based web applications, and the framework to write a conforming application directly in Java.
- Workflow Common Objects [AGWFO]. This project holds all of the schema and WSDL definitions that are used by CEA based services in Astrogrid. Additionally it contains Castor generated object bindings for the schema and the Axis generated web services stubs for the service.
- Job Execution System [AGJES]. This is the engine of the Astrogrid workflow which orchestrates calls to the various applications.
There are now dozens of applications that conform to this framework published by Astrogrid, including well known legacy applications such as SExtractor and HyperZ. A list of these applications can be obtained by querying a registry for entries with @xsi:type = 'cea:CeaApplication'. Such a query can be performed on the Astrogrid registry.
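Selecting entries by @xsi:type can be illustrated on a small XML fragment held locally. The fragment, its element names and the identifiers are invented for this sketch and do not reproduce a real registry response; only the xsi namespace and the CEAService namespace from the file table are taken as given. The comparison is a plain string match on the prefixed name, so it assumes the `cea` prefix is the one in use.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xsi:type under its fully qualified attribute name.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical registry content, not a real registry response.
SAMPLE = """
<resources xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:cea="http://www.ivoa.net/xml/CEAService/v0.2">
  <resource xsi:type="cea:CeaApplication">
    <identifier>ivo://example.org/sextractor</identifier>
  </resource>
  <resource xsi:type="cea:CeaService">
    <identifier>ivo://example.org/cec-1</identifier>
  </resource>
  <resource xsi:type="cea:CeaApplication">
    <identifier>ivo://example.org/hyperz</identifier>
  </resource>
</resources>
"""


def identifiers_of_type(root: ET.Element, qname: str) -> list:
    """Return the identifiers of all resources whose xsi:type equals qname."""
    return [
        resource.findtext("identifier")
        for resource in root.iter("resource")
        if resource.get(XSI_TYPE) == qname
    ]


root = ET.fromstring(SAMPLE)
applications = identifiers_of_type(root, "cea:CeaApplication")
```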
- AGAPP
- The Astrogrid maven documentation for the Applications integration component (http://www.astrogrid.org/maven/docs/HEAD/applications/index.html)
- AGJES
- The Astrogrid maven documentation for the Job Execution System (http://www.astrogrid.org/maven/docs/HEAD/jes/index.html)
- AGWFO
- The Astrogrid maven documentation for the Workflow Objects component (http://www.astrogrid.org/maven/docs/HEAD/astrogrid-workflow-objects/index.html)
- OGSIWSRF
- From Open Grid Services Infrastructure to WS-Resource Framework: Refactoring and Evolution describes how OGSI constructs map to WS-Resource Framework constructs. (http://www.globus.org/wsrf/specs/ogsi_to_wsrf_1.0.pdf)
- UWS
- Universal Worker Proposal - Guy Rixon (http://www.ivoa.net/internal/IVOA/IvoaGridAndWebServices/uws.html)
- WSI
- Web Services Interoperability Guidelines (http://www.ws-i.org/)
- VOTable
- Ochsenbein, F. et al., VOTable Format Definition V1.1, http://www.ivoa.net/Documents/latest/VOT.html
- FITS
- IAU Working group http://fits.gsfc.nasa.gov/iaufwg/iaufwg.html and general documentation http://fits.gsfc.nasa.gov/fits_documentation.html
- ADQL
- IVOA Astronomical Data Query Language http://www.ivoa.net/internal/IVOA/IvoaVOQL/ADQL-0.91.pdf
- STC-S
- Space-Time Coordinate (STC) Metadata Linear String Implementation http://www.ivoa.net/Documents/latest/STC-S.html
Changes from version 0.1
The main changes from the previous version of this document that is published on the AstroGrid site are
- Schema repackaged to separate CEA components from Astrogrid Workflow.
- Schema reformulated to work with v10 registry schema.
- Text reordered and expanded to help clarify overall structure.