Changes to VOSpace specification
Dave Morris
dave at ast.cam.ac.uk
Sun Nov 26 21:18:05 PST 2006
Dave Morris wrote:
> Matthew Graham wrote:
>
>> 3. Decoupled data servers
>
>> .....
>> This is actually the only transfer method which needs a modification:
>> all the others work fine with decoupled servers.
>
> No transfer methods need modification.
> You can achieve the same effect using a pullToVospace call instead of
> pushToVospace.
Actually, this is wrong (I was still thinking in terms of version-1.0
not version-1.+).
As Paul points out in another email :
"we cannot brush the asynchronicity of this call under the carpet"
I agree.
We already have two asynchronous calls in VOSpace-1.0, and no way to
manage the implied state on the server.
The pushToVospace and pullFromVospace methods both initiate transfers
that will happen in the future, which implies setting up something on
the server to handle them.
But, we don't have any way of referring to new state information created
on the server.
I wasn't keen on making the other two import and export methods
asynchronous - until we had a way of referring to, and managing, the
transfer state.
Once we have that mechanism in place, then we can go ahead and make all
of the transfer methods asynchronous.
As Matthew has highlighted, without it, we are creating state
information on the server that the client can't reach.
Now that we are opening up discussion about a new version of the spec,
this might be a good time to bring up a couple of suggestions I made in
September.
http://wiki.astrogrid.org/bin/view/Astrogrid/VoSpace20060904
Vospace version-1.1 proposal
Section 2.3 - asynchronous transfers
Paul added some notes when he saw them in September, and since then I
have re-evaluated some of the ideas in light of his comments.
So, the details in these documents are already out of date, but (I hope)
the general idea is still sound.
Basically, we need something (an object) to represent the state of a
transfer.
We could create a new set of objects, methods, service WSDL and schema
to handle the new status object(s).
However, we already have client and server components for querying and
modifying objects (nodes) on the VOSpace server.
In which case, can we represent the transfer state as a node, with child
nodes for each of the protocol options ?
This would enable us to query and manage the state of a transfer without
having to invent a completely new set of objects and service API.
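As a rough sketch of what I mean (the class, the property names and the "vos://example/..." URIs are made up for illustration, and are not part of any version of the spec), a transfer could be modelled as an ordinary node with one child node per protocol option:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for a VOSpace node: a URI, properties and child nodes."""
    uri: str
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def make_transfer_node(base_uri, view, protocols):
    """Build a transfer status node with one child node per protocol option."""
    transfer = Node(uri=base_uri, properties={"view": view, "status": "PENDING"})
    for index, (protocol, endpoint) in enumerate(protocols):
        option = Node(
            uri=f"{base_uri}/option-{index}",
            properties={"protocol": protocol, "endpoint": endpoint, "status": "PENDING"},
        )
        transfer.children.append(option)
    return transfer


# Hypothetical transfer offering two protocol options.
transfer = make_transfer_node(
    "vos://example/transfers/0001",
    "votable",
    [
        ("http-put", "http://example.org/upload/0001"),
        ("gridftp", "gsiftp://example.org/staging/0001"),
    ],
)
```

Because the transfer and its options are just nodes, the existing list and query calls would be enough to inspect them.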
So, when I said this in my previous email :
>
> No transfer methods need modification.
> You can achieve the same effect using a pullToVospace call instead of
> pushToVospace.
>
I was wrong, they do need modification to support asynchronous transfers
properly.
All four of the import and export methods need to return something that
refers to the state information created on the server.
If the status is represented as a VOSpace node, then the import and
export methods could either return a simple "vos://..." identifier of
the status node, or the full status node element.
So where at the moment we have :
import response
    <!-- The updated node -->
    <node uri="vos://.....">
        .....
    </node>
    <!-- Transfer details -->
    <transfer>
        <view ..../>
        <protocols>
            .....
        </protocols>
    </transfer>
This would change to :
import response
    <!-- The updated node -->
    <node uri="vos://.....">
        .....
    </node>
    <!-- The transfer status node -->
    <node uri="vos://.....">
        .....
    </node>
In effect, replacing the current transfer details node in the response
with a status node.
We would still be returning all the same information, but in a different
wrapper.
The new status node would contain the same information as the current
transfer details, including the target view (as a property of the
transfer node) and the list of the protocol options (as child nodes of
the transfer). However, representing the information as nodes in the
VOSpace service means that it remains persistent after the end of the
initial SOAP call. This gives us something that the client and server
can use to refer to the state information later on.
The client can use the "vos://....." URI of a status node to update the
state, either by manipulating the status node properties, or by using a
new set of methods specifically for updating transfers, e.g. complete(),
fail() and cancel().
This part of the specification wouldn't mandate _what_ the client should
have to do with the status node once it has been given it. It just gives
the client and server a common way of referring to the status of that
particular transfer.
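To make the complete(), fail() and cancel() idea a bit more concrete, here is a minimal sketch. It assumes an in-memory table of transfer states keyed by the status node URI; the state names and the "vos://example/..." URI are hypothetical, and a real service would keep this state in the node properties.

```python
# Hypothetical in-memory store of transfer status, keyed by status node URI.
transfers = {"vos://example/transfers/0001": "PENDING"}


def _update(uri, new_status):
    """Move a pending transfer to a final state; reject unknown or finished ones."""
    if uri not in transfers:
        raise KeyError(f"no such transfer: {uri}")
    if transfers[uri] != "PENDING":
        raise ValueError(f"transfer is already {transfers[uri]}")
    transfers[uri] = new_status
    return new_status


def complete(uri):
    """Tell the server that the transfer has finished successfully."""
    return _update(uri, "COMPLETED")


def fail(uri):
    """Tell the server that the transfer has failed."""
    return _update(uri, "FAILED")


def cancel(uri):
    """Tell the server that the client has abandoned the transfer."""
    return _update(uri, "CANCELLED")
```

The only thing the three calls need from the client is the URI of the status node, which is the point of returning it from the import and export methods.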
As Matthew described, some protocols may complete without requiring a
notification callback from the client, e.g. an HTTP put to a servlet
within the VOSpace service. In which case, the status node just provides
the client, or a 3rd party, with a way of checking if the transfer has
been completed yet.
Other protocols will require some form of callback.
In Matthew's example, if the protocol involves a put to a gFTP server
followed by an 'adoption' step where the VOSpace server updates its
metadata to include the uploaded file, then the client may have to tell
the VOSpace server when the data is ready.
The client could use the "vos://...." URI of the transfer status in the
callback, to tell the server which transfer (and protocol option) it is
talking about. We need to remember that the VOSpace server may have
offered more than one protocol option for the transfer, so the client
needs to tell it which option has been completed, to enable the server
to collect the data from the right place and cancel the others.
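A sketch of how the server side of such a callback might behave, assuming the transfer is held as a simple dictionary of protocol option URIs and states (the function name, the state names and the "vos://example/..." URIs are all invented for illustration):

```python
def notify_complete(transfer, option_uri):
    """Mark the named protocol option as completed and cancel the others."""
    options = transfer["options"]
    if option_uri not in options:
        raise KeyError(f"no such protocol option: {option_uri}")
    for uri in options:
        options[uri] = "COMPLETED" if uri == option_uri else "CANCELLED"
    transfer["status"] = "COMPLETED"
    return transfer


# Hypothetical transfer for which the server offered two protocol options.
transfer = {
    "uri": "vos://example/transfers/0001",
    "status": "PENDING",
    "options": {
        "vos://example/transfers/0001/http-put": "PENDING",
        "vos://example/transfers/0001/gridftp": "PENDING",
    },
}

# The client used the gridftp option, and says so in the callback.
notify_complete(transfer, "vos://example/transfers/0001/gridftp")
```

What the server then does with the data behind the completed option is left to the protocol implementation.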
The details of what the callback means, and what the server does with
it, would be specific to the implementation of the protocol.
If the VOSpace and gFTP server are acting as one entity, then the
VOSpace server may leave the data within the gFTP server file system,
and just update the node metadata.
On the other hand, if the gFTP server is acting as a staging post, then
the VOSpace server may collect the data from the back end and move it to
another location within its own file system.
In summary :
Matthew has highlighted the fact that we already need a callback
mechanism for some of the existing import and export protocols.
Whatever callback mechanism we adopt, it will need some way to refer to
the persistent state within the VOSpace server that represents the
state of the transfer and the individual protocol options within it.
Representing these as VOSpace nodes means that we can use the existing
"vos://..." URI scheme to refer to them, and the existing API to list,
query and modify them.
Once we have a standard way of referring to the persistent state of a
transfer, then my previous email about making the details of the
callback specific to the protocol might make sense. Without it, the
client has no way of telling the server which transfer and protocol
option it is talking about.
Hope some of this may be useful,
Dave