Proposed standard for asynchronous activities on web services

Context

Currently, all our IVOA services run synchronously. I.e., they do what is requested of them during a single HTTP transaction. This is nice and simple but doesn't scale well to long-running activities. Such an activity might be

In any of these cases, the system is stressed once the activity lasts longer than a few minutes and unreasonably fragile if the activity lasts longer than a few hours. With synchronous operations, all the entities in the chain of command -- client, workflow engine, broker, processing services -- have to stay up for the duration of the activity. If any one is restarted then the context of the activity is lost and the job has to be restarted from the beginning.

It is critical that we allow selected activities to run asynchronously. By this, I mean that a web-service operation starts the activity and completes as soon as the service accepts the job. The activity continues outside the scope of any particular web-service operation. The client of the activity can ask the service for the state of the work; possibly, the service notifies the client asynchronously to avoid polling. The client has some way of keeping track of the job and of recovering this information without needing to stay connected to the service. Finally, there should be a system for cleaning up resources used by the activity. I have described this kind of activity as part of the planning of the AstroGrid-2 project.

Clearly, we can build an ad-hoc solution for each kind of service, or even for each implementation of each kind of service. Equally clearly, it's more efficient in programming resources to have a standard solution and a reference implementation.

Proposal

I suggest that we need

The draft standards in the WS-ResourceFramework (WS-RF) family address all these issues and I propose that we base our asynchronous-activity convention on WS-RF. Some details of the proposed usage are listed below.

The context of an activity is identified by a resource identifier from WS-RF. An identifier is an opaque, unique token.

When an activity starts, the web-service operation that starts it ('operation' in the WSDL sense) returns the identifier for the resource.
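The start-and-return pattern can be sketched in a few lines. This is only an illustration under stated assumptions: the registry lives in memory, and the names `ActivityRegistry`, `start` and `state` are hypothetical, not taken from WS-RF or from this proposal. The start call returns an opaque token at once and the work carries on in a background thread.

```python
import threading
import uuid


class ActivityRegistry:
    """Toy stand-in for a service that runs activities asynchronously."""

    def __init__(self):
        self._states = {}
        self._lock = threading.Lock()

    def start(self, work):
        """Accept a job and return its opaque identifier at once."""
        activity_id = uuid.uuid4().hex  # opaque, unique token
        with self._lock:
            self._states[activity_id] = "running"
        thread = threading.Thread(target=self._run, args=(activity_id, work))
        thread.start()
        return activity_id

    def _run(self, activity_id, work):
        """Do the work outside the scope of the call that started it."""
        try:
            work()
            outcome = "completed"
        except Exception:
            outcome = "failed"
        with self._lock:
            self._states[activity_id] = outcome

    def state(self, activity_id):
        """Later enquiries are tied to the activity by its identifier."""
        with self._lock:
            return self._states[activity_id]
```

The client keeps only the token; it needs no open connection to the service between the start call and a later enquiry.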

A client associates a subsequent operation with the activity (e.g. an enquiry or 'abort' command) by passing the resource ID in the SOAP header, as described by WS-RF. The ID is carried inside an endpoint-reference structure as defined by WS-Addressing.
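To make the message shape concrete, here is a rough sketch that builds an endpoint reference and copies its activity ID into a SOAP header. It assumes the element names and namespace of the later WS-Addressing 1.0 Recommendation; the drafts current when this proposal was written differ in namespace and in some element names. The `ActivityID` element, its namespace and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

WSA = "http://www.w3.org/2005/08/addressing"        # WS-Addressing 1.0
SOAP = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
ACT = "http://example.org/ivoa/activity"            # hypothetical namespace


def endpoint_reference(service_url, activity_id):
    """Build an endpoint reference that carries the activity ID."""
    epr = ET.Element(f"{{{WSA}}}EndpointReference")
    ET.SubElement(epr, f"{{{WSA}}}Address").text = service_url
    params = ET.SubElement(epr, f"{{{WSA}}}ReferenceParameters")
    ET.SubElement(params, f"{{{ACT}}}ActivityID").text = activity_id
    return epr


def soap_envelope(epr, body_element):
    """Wrap an operation in an envelope whose header names the activity."""
    envelope = ET.Element(f"{{{SOAP}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP}}}Header")
    address = epr.find(f"{{{WSA}}}Address").text
    ET.SubElement(header, f"{{{WSA}}}To").text = address
    # Each reference parameter becomes a header block of the message.
    for param in epr.find(f"{{{WSA}}}ReferenceParameters"):
        ET.SubElement(header, param.tag).text = param.text
    body = ET.SubElement(envelope, f"{{{SOAP}}}Body")
    body.append(body_element)
    return envelope
```

Because the ID travels in the header, the body of each operation stays the same whichever activity it addresses.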

Activities have time-outs. If an activity times out, then the service running it clears away all its state metadata and any locally-cached results. I.e. the client loses the output from a timed-out activity but the service reclaims its resources. The time-out on an activity is independent of any time-out of the work done by the activity. The activity time-out governs the time for which the client can access the state and results of the work; this is typically longer than the time-out for the work itself. Consider a batch job in a queue with a run-time limit of 30 minutes. The work time-out is 30 minutes, but the period for which the service retains the results will be longer, possibly a day or more.

The controls for the timeout on an activity are as specified in WS-ResourceLifetime. I.e., the service implements a port-type that controls the activity's life-cycle.

Using this port, a client may end an activity before its time-out. A client may also increase or reduce the time-out period; but the service may reject some values of the proposed time-out period.
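The life-cycle rules above might look like this in outline. The sketch is loosely modelled on the destroy and set-termination-time operations of WS-ResourceLifetime, but the class, its method names and the `max_retention` policy limit are assumptions made for illustration.

```python
from datetime import datetime, timedelta


class ActivityLifetime:
    """Tracks when a service may discard an activity's state and results."""

    def __init__(self, created, retention, max_retention):
        self.created = created
        self.max_retention = max_retention  # hypothetical service policy
        self.termination_time = created + retention

    def set_termination_time(self, requested):
        """Client request to lengthen or shorten the time-out.

        Returns True if accepted. The service rejects times before the
        activity was created or beyond its retention policy.
        """
        if requested < self.created:
            return False
        if requested - self.created > self.max_retention:
            return False
        self.termination_time = requested
        return True

    def destroy(self, now):
        """Client ends the activity before its time-out."""
        self.termination_time = now

    def expired(self, now):
        """True once the service may reclaim the activity's resources."""
        return now >= self.termination_time
```

Note that the run-time limit of the work itself does not appear here; it is a separate matter from the period for which state and results are retained.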

A client may wait for the end of an activity by calling a 'wait' operation on that activity. This operation is not part of WS-RF, so we must specify the details ourselves.
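Since the details of 'wait' are ours to specify, one possible shape is sketched here. It assumes a bounded wait that returns the final state, or nothing if the bound passes first; the class and method names are hypothetical.

```python
import threading


class WaitableActivity:
    """An activity whose end a client can block on."""

    def __init__(self):
        self._done = threading.Event()
        self.state = "running"

    def finish(self, state="completed"):
        """Called by the service when the work ends."""
        self.state = state
        self._done.set()

    def wait(self, timeout=None):
        """Block until the activity ends or `timeout` seconds pass.

        Returns the final state, or None if the wait timed out.
        """
        if self._done.wait(timeout):
            return self.state
        return None
```

A bounded wait seems the safer choice: an unbounded one would bring back the long-lived HTTP transaction that the proposal sets out to avoid. A client can simply call 'wait' again after a time-out.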

Services maintain state metadata on their activities. The metadata for an activity must include:

These particular metadata are not specified by WS-RF, so we must choose them ourselves. The activity metadata may include other items.

A service must provide operations for accessing the state metadata. These must be implemented according to the WS-ResourceProperties standard.

A client may subscribe to state metadata of an activity, as detailed in WS-BaseNotification. A subscribed client, which must itself be a web service, receives asynchronous notification of metadata changes. A service need not support notification, but if it does so it must support it as described by WS-BaseNotification.
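The relation between property access and notification can be illustrated with an in-process sketch. In a real deployment the subscriber would be a web service and the callback a SOAP message; the class and method names below are hypothetical and do not reproduce the WS-ResourceProperties or WS-BaseNotification interfaces.

```python
class ActivityMetadata:
    """State metadata for one activity, with change notification."""

    def __init__(self, **properties):
        self._properties = dict(properties)
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callable taking (name, old_value, new_value)."""
        self._subscribers.append(callback)

    def get(self, name):
        """Polling clients read a property directly."""
        return self._properties[name]

    def set(self, name, value):
        """Update a property and notify subscribers if it changed."""
        old = self._properties.get(name)
        self._properties[name] = value
        if old != value:
            for callback in self._subscribers:
                callback(name, old, value)
```

Polling and subscription read the same metadata, so a service that does not support notification still serves every client through `get`.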

When a service implements these interfaces, it is promising to maintain its activities as persistent resources: i.e. persistent across restarts of the service. A restarting service should restart all its activities where they were interrupted. If this is not possible, then the service must mark those activities as aborted in their state metadata. A service must not forget activities when it restarts, nor delete the results of those activities, unless the activities have timed out while the service was down.
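The restart rules can be sketched as a recovery pass over a stored activity table. The JSON file, the record layout (`state`, `termination_time` in seconds) and the `can_resume` callback are all assumptions made for this illustration; a real service would choose its own store.

```python
import json
import os


def save_activities(path, activities):
    """Write the activity table atomically so a crash cannot truncate it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(activities, f)
    os.replace(tmp, path)


def recover_activities(path, now, can_resume):
    """Reload activities after a restart.

    The stored table maps ID -> {"state": str, "termination_time": float}.
    `can_resume(activity_id)` says whether interrupted work can continue.
    """
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        stored = json.load(f)
    recovered = {}
    for activity_id, record in stored.items():
        if record["termination_time"] <= now:
            continue  # timed out while the service was down
        if record["state"] == "running" and not can_resume(activity_id):
            record["state"] = "aborted"  # cannot resume, so say so
        recovered[activity_id] = record
    return recovered
```

Completed activities pass through untouched, so their results stay available until their own time-out.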

A service implementing the feature above must implement an 'activity' port-type which aggregates:

Concentrating these features into one port makes it easier to generate client stubs. The service may build these features into another port-type that provides additional operations, but the service must not disperse the features among many port-types.

In summary, a service conforming to this proposed standard must:

Areas that need specification in more detail are:

Why WS-RF?

We would do better to adopt an existing standard than to define our own from scratch. This aspect of the VObs isn't at all specific to astronomy, so there is no need for a tailored standard. By using an existing standard, we get the possibilities of using external implementations and of easy interoperation with externally-written services. The question is, which standard?

There are several frameworks for asynchronous activities that are quasi-standard.

OGSI is a GGF standard but is now deprecated in favour of WS-RF. WS-GAF is a private experiment produced by the University of Newcastle; it is composed of other, simpler web-service standards. WS-RF, WS-Coordination and WS-CAF are each the subject of an OASIS technical committee.

It seems that any of these frameworks could satisfy our requirements; they all provide the 'plumbing' from which we can build our IVOA conventions.

WS-GAF isn't a standard. Currently, there are no WS-GAF products to reuse.

OGSI has several implementations as libraries and a few services that we could re-use. However, OGSI is deprecated and the supported services using OGSI will migrate soon to WS-RF.

WS-Coordination and WS-CAF aren't used in either astronomy or grid computing as far as I know. Therefore, there are no complete web-services to re-use. There are no open-source library implementations of the protocol, either. WS-CAF specifies a complex pattern of agents and operations to manage activities; it would be relatively hard to implement and might not support all the patterns we need. WS-Coordination and WS-CAF do support transactions.

WS-RF does support the patterns we need (with the exception of the 'wait' operation, which can be added). However, WS-RF has no support for transactions. WS-RF is factored into parts that may be implemented separately; this should make it cheaper to support if we cannot get WS-RF libraries. Several academic implementations are in progress and commercial support is promised by IBM, BEA and HP. Most OGSI services (e.g. OGSA-DAI) are expected to be ported to WS-RF in the near future. It now seems likely that implementations of OGSA services will be based on WS-RF (although other frameworks could be used with OGSA).

Using WS-RF gets us easier integration with GGF grid computing. Using WS-Coordination or WS-CAF might get us easier integration with commercial web services. On balance, WS-RF seems more useful to us.

-- GuyRixon - 06 May 2004




 
 
© 2003-2007 by the contributing authors.