Re: Changes to VOSpace specification

From: Dave Morris <dave-at-ast.cam.ac.uk>
Date: Tue, 28 Nov 2006 03:21:53 +0000


Paul Harrison wrote:

> On 27.11.2006, at 06:18, Dave Morris wrote:
>
>> The details of what the callback means, and what the server does
>> with it, would be specific to the implementation of the protocol.
>> If the VOSpace and gFTP server are acting as one entity, then the
>> VOSpace server may leave the data within the gFTP server file
>> system, and just update the node metadata.
>> On the other hand, if the gFTP server is acting as a staging post,
>> then the VOSpace server may collect the data from the back end and
>> move it to another location within its own file system.
>>
>> In summary :
>>
>> Matthew has highlighted the fact that we already need a callback
>> mechanism for some of the existing import and export protocols.
>
> And this does have a bearing on the delivery of a 1.0 standard if we
> are to promise backward compatibility.....

Only if you want to do the complicated side of things. If you have an image archive in a public ftp server, then the files stay where they are and the URLs don't change, so the VOSpace server does not have to store any state for a transfer.

For read access, we would only need to store state if we wanted to do things like one-time access URLs with cookies encoded in them. Even then, if we are using a servlet integrated with the VOSpace service, the servlet will know when the transfer has been completed, so we don't need a callback.

For write access, using a servlet integrated with the VOSpace service means that we don't need callbacks, because the servlet knows when the transfer ends.
Write access to an external GridFtp server, which is what Matthew initially raised, is a little complicated to achieve, but still possible.

I'm not saying the current spec does everything we need. Just that it does enough for now.

If we delay the spec to add the asynchronous callbacks, then

  1. we will never get it out the door because new things will always come up, and
  2. we will be under pressure to get the specification signed off, so we will not be as careful as we should be about how we implement the callbacks.

We have some good ideas for the next version, but barring major showstoppers, let's leave 1.0 as it is, and get it through the process.

>> Once we have a standard way of referring to the persistent state of
>> a transfer, then my previous email about making the details of the
>> callback specific to the protocol might make sense. Without it, the
>> client has no way of telling the server which transfer and protocol
>> option it is talking about.
>
> I think that we should try to extract as much common protocol
> behaviour as possible - I think that as soon as a protocol is not
> completely described by the transfer URL we get into complications
> that would be better to avoid to maintain interoperability - we need
> to utilise as much of the common characteristics of a protocol as
> possible before layering what are non-standard protocol behaviours
> on top of externally defined protocols

I think we might be using the same word to talk about two different things. Protocol can mean different things depending on what layer you are looking at.
HTTP and SMTP are protocols, but so is SOAP, and you can layer one protocol on top of another, giving you SOAP via HTTP and SOAP via SMTP, or even SOAP via Jabber.
In addition, SOAP itself has a number of sequence variations, call-only, call and respond, call and call-back etc. - I can't remember the acronyms for these.

I think perhaps we need to find some different words to describe the parts that we have.

First, at the bottom layer is the actual transport-protocol on the network, http, ftp etc.

On top of that we have the way that we use the transport-protocol, call it the transfer-method.
This describes things like what authentication is required and how we want to notify the service that the transfer has completed. These are things we will have to define ourselves, and register descriptions in the registry.

In all of my examples I have tended to use an abbreviated form of notation :

    <protocol uri="[http-put]">

because the full registry URIs would make things very complicated for a human to read. This isn't a problem in the live system, because a human should never have to write or edit them. But in the examples I needed to use short abbreviations to make things easier to read.

So replacing 'protocol' with 'transfer-method' the full expanded form would be:

    <transfer-method uri="ivo://net.ivoa.vospace/transfer-methods/http-put">

Where the transfer-method URI points to a registration document that says something like this :

ivo://net.ivoa.vospace/transfer-methods/http-put

    This is a VOSpace transfer method that uses the standard http-put transport protocol, with chunked data encoding.

    On a call to pushToVoSpace, the service returns a standard http://.. URL which the client should send the data to.

    The client sends the data to the URL using the HTTP-1.1 PUT transport protocol.

    Using chunked-data encoding means that the client does not need to send the content-length header field at the start of the transfer.

    The transfer completes automatically when the client closes the HTTP connection, and no callback is required to complete the transfer.

    Note - any service that offers this as a protocol option must ensure that it can receive data sent using the chunked data encoding.
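As a rough illustration of why chunked encoding removes the need for a content-length header, here is a minimal Python sketch of the HTTP/1.1 chunked framing itself. The function name encode_chunked is my own invention for this example and is not part of any VOSpace document; a real client would normally let its HTTP library do this.

```python
from typing import Iterable, Iterator


def encode_chunked(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Frame data blocks using HTTP/1.1 chunked transfer coding."""
    for block in blocks:
        if not block:
            # A zero-length chunk would signal end of body, so skip empties.
            continue
        # Each chunk: size in hex, CRLF, the data, CRLF.
        yield b"%x\r\n" % len(block) + block + b"\r\n"
    # The final zero-length chunk (no trailers) marks the end of the body.
    yield b"0\r\n\r\n"
```

Each block is sent with its own length prefix, so the sender never needs to know the total size in advance.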

A VOSpace service could implement this using a Java Servlet integrated within the VOSpace service. There is no need to have a callback because the Servlet will know when the transfer finishes, and it can update the VOSpace metadata internally. The client sends the data and then forgets it; the server side takes care of updating everything.
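To make the "no callback needed" point concrete, here is a small sketch of an integrated receiver. It is written in Python rather than as a real Java Servlet, and the class name, the metadata fields and the example node URI are all hypothetical; the only point is that the receiver sees the end of the data itself and updates the node metadata in the same process.

```python
import io
from typing import BinaryIO, Dict


class IntegratedReceiver:
    """Hypothetical stand-in for a servlet that shares state with the VOSpace service."""

    def __init__(self) -> None:
        self.storage: Dict[str, bytes] = {}
        self.node_metadata: Dict[str, Dict[str, object]] = {}

    def handle_put(self, node_uri: str, body: BinaryIO) -> None:
        self.node_metadata[node_uri] = {"status": "transferring"}
        received = bytearray()
        # Read until the request body is exhausted; the receiver sees the end itself.
        while True:
            block = body.read(8192)
            if not block:
                break
            received.extend(block)
        self.storage[node_uri] = bytes(received)
        # Metadata is updated internally, so the client makes no further call.
        self.node_metadata[node_uri] = {"status": "completed", "length": len(received)}


# Example use with an in-memory stream standing in for the request body.
receiver = IntegratedReceiver()
receiver.handle_put("vos://example/node", io.BytesIO(b"image bytes"))
```

A separate web server that only writes the bytes to disk has no equivalent of the last line of handle_put, which is where the need for a callback comes from.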

However, if someone wants to use a separate Apache web server to receive the data, then they will either have to modify the Apache server, or they will have to use a more complex transfer-method.

    <transfer-method uri="ivo://net.ivoa.vospace/transfer-methods/http-put-callback-1.3">

The transfer-method URI refers to a registration document that says something like this :

ivo://net.ivoa.vospace/transfer-methods/http-put-callback-1.3

    This is a VOSpace transfer method that uses the standard http-put transport protocol, with chunked data encoding.

    ....
    When the client has finished sending the data, it must notify the VOSpace service via a callback API.

    If no callback has been received within the time limit, then the service may cancel the transfer and remove any temporary files.

    The details of the callback service API are defined in [this] document. The WSDL and schema for the callback WebService are [here] and [here].

We may want to add some general purpose callback methods to the next version of the VOSpace API.
It looks like completed(), failed() and canceled() operations on the status nodes are good candidates for this. In which case, a new version of the [http-put-callback] transfer method could be defined as follows :

ivo://net.ivoa.vospace/transfer-methods/http-put-callback-1.4

    This is a VOSpace transfer method that uses the standard http-put transport protocol.

    ....
    When the client has finished sending the data, it must notify the VOSpace service using the completed() or failed() methods defined in the VOSpace-1.1 specification.

    If no callback has been received within the time limit, then the service may cancel the transfer and remove any temporary files.
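For what it is worth, the server side state behind such callbacks could be as small as the following Python sketch. Only the operation names completed(), failed() and canceled() come from the discussion above; the class names, the rule that only a pending transfer can change state, and the explicit deadline check are my assumptions, not anything in a specification.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


class TransferStatus:
    """Hypothetical status node for one transfer; not taken from any VOSpace schema."""

    def __init__(self, started_at: float, time_limit: float) -> None:
        self.state = State.PENDING
        self.deadline = started_at + time_limit

    def _finish(self, new_state: State) -> bool:
        # Only a pending transfer can change state; later callbacks are ignored.
        if self.state is not State.PENDING:
            return False
        self.state = new_state
        return True

    def completed(self) -> bool:
        return self._finish(State.COMPLETED)

    def failed(self) -> bool:
        return self._finish(State.FAILED)

    def canceled(self) -> bool:
        return self._finish(State.CANCELED)

    def expire_if_overdue(self, now: float) -> bool:
        # Service-side check: no callback before the deadline, so cancel.
        if self.state is State.PENDING and now > self.deadline:
            return self.canceled()
        return False
```

The service would call expire_if_overdue() periodically, and would remove any temporary files when it returns True.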

Sorry for being long-winded about this, but I think you have highlighted an important point of confusion, between ourselves and in our presentation of this to others, so it would be useful to work through the details and get things right.

Three things to note about the above descriptions.

1) Both transfer-methods use the same underlying transport-protocol, so the transport layer endpoint URLs will look identical. This means we can't use the transport layer URLs to distinguish between the different transfer-methods.

2) Each transfer-method has a unique URI that points to a full description of the transfer-method. That was one of the original reasons for putting them in the registry: we get a unique identifier for each one, and a common way of resolving the URI into a description.

3) The description of the [http-put-callback] transfer method does not need to be part of the VOSpace service specification. It doesn't even need to use the standard VOSpace callback mechanism.

This means that you could define a new transfer-method for moving things around within ESO, using the [ngas-replication] method. Underneath, it might use the http transport-protocol to move the bytes, and it might use calls to the VOSpace-1.1 status callback as part of the process, but you could add any additional steps or notifications required to update the NGAS system in the transfer method description :

ivo://org.eso.vospace/transfer-methods/ngas-replication-2.0

    This is a VOSpace transfer method for use by the internal NGAS systems within ESO.

    The transfer method uses the standard http-put transport protocol to move the data.

    The sending client must use the completed() or failed() methods defined in the VOSpace-1.1 specification to notify the service when the transfer has finished.

    In addition, the client needs to call the xyz() method on the NGAS system to authenticate and receive a transfer token.

We will start with a set of commonly used transfer-methods, defined in the ivo://net.ivoa.vospace/ registry, which will cover the plain vanilla uses of the core protocols with no callback, like [http-get], [http-put], [ftp-get] and [ftp-put] etc.

Once we have a way of referring to and manipulating server side state (status nodes), then we can add the completed(), failed() and canceled() callback methods to the VOSpace specification. Once we have the standard callback methods, then we can define the callback versions of [http-put-callback], [ftp-put-callback] and [gftp-callback] etc.

In the meantime, 3rd party developers can implement the standard [http-get] and [http-put] transfer methods, with no callback required, by using Java Servlets integrated into the same webapp as the VOSpace service. We only start to need callbacks when we are trying to import data into 3rd party servers like an Apache or GridFtp server co-located but not integrated with the VOSpace service itself. Even that isn't impossible, just a little more complicated.

Again, apologies for going into so much detail, but I think we are actually fairly close to getting a lot of this solved. The stopping points seem to be when I/we use words like 'view' and 'protocol' to mean different things, and we get sidetracked.

So, does the distinction between 'transfer-method' and 'transport-protocol' make sense ?
If so, are 'transfer-method' and 'transport-protocol' good replacements for the more generic term 'protocol' ?

In your comment, waaay back up there somewhere, you said

> ... as soon as a protocol is not completely described by the transfer URL we get into complications

Unfortunately, the endpoint URL can't be used to describe the transfer-method.
There is no way to tell from a transport-protocol URL whether the transfer-method it is being used in requires a callback or not (which is where this thread started).

On the other hand, our SOAP messages use the transfer-method URI to refer to the transfer-method, and this does resolve to a resource that describes all the details of the method. So, for brevity in descriptions and examples I may use this :

    <transfer-method uri="[http-put-callback]"/>

When I actually mean this :

    <transfer-method uri="ivo://net.ivoa.vospace/transfer-methods/http-put-callback-1.3"/>

which should be resolvable to a full description of the transfer-method, including details of how it uses the underlying transport-protocol and the API of any callback methods it requires.
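A toy version of that resolution step, in Python, might look like this. The two URIs are the ones used in this email; the in-memory dictionaries, the TransferMethod fields and the resolve() function are stand-ins I have made up for a real registry lookup.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TransferMethod:
    uri: str
    transport_protocol: str
    requires_callback: bool


# Hypothetical in-memory stand-in for the registry, keyed by full URI.
REGISTRY: Dict[str, TransferMethod] = {
    m.uri: m
    for m in (
        TransferMethod("ivo://net.ivoa.vospace/transfer-methods/http-put", "http-put", False),
        TransferMethod("ivo://net.ivoa.vospace/transfer-methods/http-put-callback-1.3", "http-put", True),
    )
}

# Abbreviations used in examples only, mapped to the full URIs.
ABBREVIATIONS: Dict[str, str] = {
    "[http-put]": "ivo://net.ivoa.vospace/transfer-methods/http-put",
    "[http-put-callback]": "ivo://net.ivoa.vospace/transfer-methods/http-put-callback-1.3",
}


def resolve(uri_or_abbreviation: str) -> TransferMethod:
    """Expand an abbreviation if one was given, then look up the description."""
    full_uri = ABBREVIATIONS.get(uri_or_abbreviation, uri_or_abbreviation)
    return REGISTRY[full_uri]
```

Both entries share the same transport-protocol, so only the transfer-method URI tells the client whether a callback is required.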

Thank you for reading this far.
Dave

Received on 2006-11-28Z04:22:36