Changes to VOSpace specification

Dave Morris dave at ast.cam.ac.uk
Sun Nov 26 21:18:05 PST 2006


Dave Morris wrote:

> Matthew Graham wrote:
>
>> 3. Decoupled data servers
>
>> .....
>> This is actually the only transfer method which needs a modification: 
>> all the others work fine with decoupled servers.
>
> No transfer methods need modification.
> You can achieve the same effect using a pullToVospace call instead of 
> pushToVospace.

Actually, this is wrong (I was still thinking in terms of version-1.0 
not version-1.+).

As Paul points out in another email :
"we cannot brush the asynchronicity of this call under the carpet"

I agree.
We already have two asynchronous calls in VOSpace-1.0, and no way to 
manage the implied state on the server.

The pushToVospace and pullFromVospace methods both initiate transfers 
that will happen in the future, which implies setting up something on 
the server to handle them.
But, we don't have any way of referring to new state information created 
on the server.

I wasn't keen on making the other two import and export methods 
asynchronous - until we had a way of referring to, and managing, the 
transfer state.
Once we have that mechanism in place, then we can go ahead and make all 
of the transfer methods asynchronous.
As Matthew has highlighted, without it, we are creating state 
information on the server that the client can't reach.

Now that we are opening up discussion about a new version of the spec, 
this might be a good time to bring up a couple of suggestions I made in 
September.

    http://wiki.astrogrid.org/bin/view/Astrogrid/VoSpace20060904

        Vospace version-1.1 proposal
        Section 2.3 - asynchronous transfers

Paul added some notes when he saw them in September, and since then I 
have re-evaluated some of the ideas in light of his comments.
So, the details in these documents are already out of date, but (I hope) 
the general idea is still sound.

Basically, we need something (an object) to represent the state of a 
transfer.
We could create a new set of objects, methods, service WSDL and schema 
to handle the new status object(s).

However, we already have client and server components for querying and 
modifying objects (nodes) on the VOSpace server.
In which case, can we represent the transfer state as a node, with child 
nodes for each of the protocol options ?
This would enable us to query and manage the state of a transfer without 
having to invent a completely new set of objects and service API.
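
A minimal sketch of that idea, in Python, might look like the following.
It is only an illustration: the Node class, the make_transfer_node()
helper, the "status" property, the "option-N" child naming and the
example URIs are all assumptions made for the sketch, and none of them
come from the VOSpace specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a VOSpace node: a URI, properties, children."""
    uri: str
    properties: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def make_transfer_node(base_uri: str, view: str,
                       protocols: List[Tuple[str, str]]) -> Node:
    """Build a transfer status node with one child node per protocol option."""
    # The target view is held as a property of the transfer node itself.
    transfer = Node(uri=base_uri,
                    properties={"view": view, "status": "pending"})
    # Each protocol option becomes a child node with its own URI, so it
    # can be referred to individually later on.
    for index, (protocol, endpoint) in enumerate(protocols):
        transfer.children.append(
            Node(uri=f"{base_uri}/option-{index}",
                 properties={"protocol": protocol,
                             "endpoint": endpoint,
                             "status": "pending"}))
    return transfer


# Example with made-up identifiers and endpoints.
transfer = make_transfer_node(
    "vos://example/transfers/0001",
    "votable",
    [("http-put", "http://example.org/upload/0001"),
     ("gftp-put", "gsiftp://example.org/upload/0001")])
```

Because the transfer and its options are ordinary nodes in this sketch,
anything that can already list or query nodes could inspect them.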

So, when I said this in my previous email :
 >
 > No transfer methods need modification.
 > You can achieve the same effect using a pullToVospace call instead of 
 > pushToVospace.
 >
I was wrong: they do need modification to support asynchronous transfers 
properly.
All four of the import and export methods need to return something that 
refers to the state information created on the server.

If the status is represented as a VOSpace node, then the import and 
export methods could either return a simple "vos://..." identifier of 
the status node, or the full status node element.

So where at the moment we have :

    import response
        <!-- The updated node -->
        <node uri="vos://.....">
            .....
        </node>
        <!-- Transfer details -->
        <transfer>
            <view ..../>
            <protocols>
                .....
            </protocols>
        </transfer>

This would change to :

    import response
        <!-- The updated node -->
        <node uri="vos://.....">
            .....
        </node>
        <!-- The transfer status node -->
        <node uri="vos://.....">
            .....
        </node>

In effect, this replaces the current transfer details node in the 
response with a status node.
We would still be returning all the same information, but in a different 
wrapper.

The new status node would contain the same information as the current 
transfer details, including the target view (as a property of the 
transfer node) and the list of the protocol options (as child nodes of 
the transfer). However, representing the information as nodes in the 
VOSpace service means that it remains persistent after the end of the 
initial SOAP call. This gives us something that the client and server 
can use to refer to the state information later on.

The client can use the "vos://....." URI of a status node to update the 
state, either by manipulating the status node properties, or by using a 
new set of methods specifically for updating transfers, e.g. complete(), 
fail() and cancel().

This part of the specification wouldn't mandate _what_ the client has 
to do with the status node once it has received it. It just gives the 
client and server a common way of referring to the status of that 
particular transfer.
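
As a rough illustration of the second option, the sketch below keys
complete(), fail() and cancel() on the URI of the status node. The
method names are the ones suggested above; everything else
(TransferRegistry, TransferStatus, the rule that only a pending
transfer can change state, the example URI) is an assumption made for
the sketch, not part of any VOSpace specification.

```python
from enum import Enum
from typing import Dict


class TransferStatus(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


class TransferRegistry:
    """Hypothetical in-memory stand-in for status nodes held by a server."""

    def __init__(self) -> None:
        self._status: Dict[str, TransferStatus] = {}

    def register(self, uri: str) -> None:
        # Called when an import or export method creates a new transfer.
        self._status[uri] = TransferStatus.PENDING

    def status(self, uri: str) -> TransferStatus:
        # Lets the client, or a third party, check on a transfer by URI.
        return self._status[uri]

    def _finish(self, uri: str, outcome: TransferStatus) -> None:
        # Assumption: a transfer can only leave the pending state once.
        if self._status[uri] is not TransferStatus.PENDING:
            raise ValueError(f"{uri} is already {self._status[uri].value}")
        self._status[uri] = outcome

    def complete(self, uri: str) -> None:
        self._finish(uri, TransferStatus.COMPLETED)

    def fail(self, uri: str) -> None:
        self._finish(uri, TransferStatus.FAILED)

    def cancel(self, uri: str) -> None:
        self._finish(uri, TransferStatus.CANCELLED)


# Example with a made-up status node URI.
registry = TransferRegistry()
registry.register("vos://example/transfers/0001")
registry.complete("vos://example/transfers/0001")
```

The point of the sketch is only that the URI is the handle: every
call names the transfer it is talking about.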

As Matthew described, some protocols may complete without requiring a 
notification callback from the client, e.g. an HTTP put to a servlet 
within the VOSpace service. In that case, the status node just provides 
the client, or a 3rd party, with a way of checking if the transfer has 
been completed yet.

Other protocols will require some form of callback.
In Matthew's example, if the protocol involves a put to a gFTP server 
followed by an 'adoption' step where the VOSpace server updates its 
metadata to include the uploaded file, then the client may have to tell 
the VOSpace server when the data is ready.

The client could use the "vos://...." URI of the transfer status in the 
callback, to tell the server which transfer (and protocol option) it is 
talking about. We need to remember that the VOSpace server may have 
offered more than one protocol option for the transfer, so the client 
needs to tell it which option has been completed, to enable the server 
to collect the data from the right place and cancel the others.
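
A small sketch of the bookkeeping this implies on the server side is
shown below. It assumes that each protocol option has its own URI and a
simple status string; the resolve_options() function, the status values
and the example URIs are hypothetical and are not taken from the
specification.

```python
from typing import Dict


def resolve_options(options: Dict[str, str], completed_uri: str) -> Dict[str, str]:
    """Return new statuses after the client reports one option as completed.

    `options` maps the URI of each protocol option to its current status.
    The option named in the callback is marked completed, and every other
    option offered for the same transfer is cancelled.
    """
    if completed_uri not in options:
        # The callback referred to an option this transfer never offered.
        raise KeyError(completed_uri)
    return {
        uri: "completed" if uri == completed_uri else "cancelled"
        for uri in options
    }


# Example with made-up option URIs for a single transfer.
offered = {
    "vos://example/transfers/0001/option-0": "pending",
    "vos://example/transfers/0001/option-1": "pending",
}
resolved = resolve_options(offered, "vos://example/transfers/0001/option-1")
```

Returning a new mapping rather than modifying the old one is just a
choice made to keep the sketch easy to check.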

The details of what the callback means, and what the server does with 
it, would be specific to the implementation of the protocol.
If the VOSpace and gFTP server are acting as one entity, then the 
VOSpace server may leave the data within the gFTP server file system, 
and just update the node metadata.
On the other hand, if the gFTP server is acting as a staging post, then 
the VOSpace server may collect the data from the back end and move it to 
another location within its own file system.

In summary :

Matthew has highlighted the fact that we already need a callback 
mechanism for some of the existing import and export protocols.

Whatever callback mechanism we adopt, it will need some way to refer to 
the persistent state within the VOSpace server that represents the 
state of the transfer and the individual protocol options within it. 
Representing these as VOSpace nodes means that we can use the existing 
"vos://..." URI scheme to refer to them, and the existing API to list, 
query and modify them.

Once we have a standard way of referring to the persistent state of a 
transfer, then my previous email about making the details of the 
callback specific to the protocol might make sense. Without it, the 
client has no way of telling the server which transfer and protocol 
option it is talking about.

Hope some of this may be useful,
Dave


