Changes to VOSpace specification

Paul Harrison pharriso at eso.org
Mon Nov 27 04:04:41 PST 2006


On 27.11.2006, at 06:18, Dave Morris wrote:

> Dave Morris wrote:
>
>> Matthew Graham wrote:
>>
>>> 3. Decoupled data servers
>>
>>> .....
>>> This is actually the only transfer method which needs a  
>>> modification: all the others work fine with decoupled servers.
>>
>> No transfer methods need modification.
>> You can achieve the same effect using a pullToVospace call instead  
>> of pushToVospace.
>
> Actually, this is wrong (I was still thinking in terms of  
> version-1.0 not version-1.+).
>
> As Paul points out in another email :
> "we cannot brush the asynchronicity of this call under the carpet"
>
> I agree.
> We already have two asynchronous calls in VOSpace-1.0, and no way  
> to manage the implied state on the server.
>
> The pushToVospace and pullFromVospace methods both initiate  
> transfers that will happen in the future, which implies setting up  
> something on the server to handle them.
> But, we don't have any way of referring to new state information  
> created on the server.
>
> I wasn't keen on making the other two import and export methods  
> asynchronous - until we had a way of referring to, and managing,  
> the transfer state.
> Once we have that mechanism in place, then we can go ahead and make  
> all of the transfer methods asynchronous.
> As Matthew has highlighted, without it, we are creating state  
> information on the server that the client can't reach.
>
> Now that we are opening up discussion about a new version of the  
> spec. this might be a good time to bring up a couple of suggestions  
> I made in September.
>
>    http://wiki.astrogrid.org/bin/view/Astrogrid/VoSpace20060904
>
>        Vospace version-1.1 proposal
>        Section 2.3 - asynchronous transfers
>
> Paul added some notes when he saw them in September, and since then  
> I have re-evaluated some of the ideas in light of his comments.
> So, the details in these documents are already out of date, but (I  
> hope) the general idea is still sound.
To summarize my objections:

* Sometimes it is more convenient for both the client and the server
implementation to have a specific API for operations, rather than
having to manipulate objects through a "general" API - though I do
recognize that the general API approach has some power and
flexibility (cf. "everything is a file" in Unix).

* I was not keen on the "control" objects polluting the namespace, in
the sense that, as illustrated in the document, they would appear
directly in listings. I would much prefer that any such control
objects attached to a data node were only accessible through the ?
(query) portion of the vos URL - e.g. the full URL to obtain the
transfer status for a data node could be


vos://org.test!vospace/container/node?transfer

and a list of all pending transfers for the space itself could be  
referred to by such a query on the root node.
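
As a rough sketch of how a client or server might separate the node
identifier from the control-object selector in such a URL (Python, for
illustration only; the function name split_control_query is
hypothetical, and treating the bare query string as the name of the
control object is an assumption, not something taken from the spec):

```python
from urllib.parse import urlsplit


def split_control_query(vos_url):
    """Split a vos:// URL into (node URI, control object name or None)."""
    parts = urlsplit(vos_url)
    if parts.scheme != "vos":
        raise ValueError("not a vos:// URL: " + vos_url)
    # The node itself is identified by everything before the '?'.
    node_uri = "vos://" + parts.netloc + parts.path
    # Assumption: the whole query string names the control object.
    control = parts.query or None
    return node_uri, control


node, control = split_control_query(
    "vos://org.test!vospace/container/node?transfer")
print(node)     # the plain data node URI
print(control)  # the control object being asked for
```

A URL without a query part would come back with None as the control
object, so ordinary listings would never need to know about control
objects at all.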

>
> Basically, we need something (an object) to represent the state of  
> a transfer.
> We could create a new set of objects, methods, service WSDL and  
> schema to handle the new status object(s).
>
> However, we already have client and server components for querying  
> and modifying objects (nodes) on the VOSpace server.
> In which case, can we represent the transfer state as a node, with  
> child nodes for each of the protocol options ?
> This would enable us to query and manage the state of a transfer  
> without having to invent a completely new set of objects and  
> service API.
>
> So, when I said this in my previous email :
> >
> > No transfer methods need modification.
> > You can achieve the same effect using a pullToVospace call  
> instead of pushToVospace.
> >
> I was wrong, they do need modification to support asynchronous  
> transfers properly.
> All four of the import and export methods need to return something  
> that refers to the state information created on the server.
>
> If the status is represented as a VOSpace node, then the import
> and export methods could either return a simple "vos://..."  
> identifier of the status node, or the full status node element.
>
> So where at the moment we have :
>
>    import response
>        <!-- The updated node -->
>        <node uri="vos://.....">
>            .....
>        </node>
>        <!-- Transfer details -->
>        <transfer>
>            <view ..../>
>            <protocols>
>                .....
>            </protocols>
>        </transfer>
>
> This would change to :
>
>    import response
>        <!-- The updated node -->
>        <node uri="vos://.....">
>            .....
>        </node>
>        <!-- The transfer status node -->
>        <node uri="vos://.....">
>            .....
>        </node>
>
> In effect, replacing the current transfer details node in the  
> response with a status node.
> We would still be returning all the same information, but in a  
> different wrapper.
>
> The new status node would contain the same information as the  
> current transfer details, including the target view (as a property  
> of the transfer node) and the list of the protocol options (as  
> child nodes of the transfer). However, representing the information  
> as nodes in the VOSpace service means that it remains persistent  
> after the end of the initial SOAP call. This gives us something  
> that the client and server can use to refer to the state  
> information later on.
>
> The client can use the "vos://....." URI of a status node to update  
> the state, either by manipulating the status node properties, or by  
> using a new set of methods specifically for updating transfers,  
> e.g. complete(), fail() and cancel().
>
> This part of the specification wouldn't mandate _what_ the client  
> should have to do with the status node once it has been given it.  
> It just gives the client and server a common way of referring to  
> the status of that particular transfer.
>
> As Matthew described, some protocols may complete without requiring  
> a notification callback from the client, e.g. an HTTP put to a
> servlet within the VOSpace service. In which case, the status node  
> just provides the client, or a 3rd party, with a way of checking if  
> the transfer has been completed yet.
>
> Other protocols will require some form of callback.
> In Matthew's example, if the protocol involves a put to a gFTP
> server followed by an 'adoption' step where the VOSpace server  
> updates its metadata to include the uploaded file, then the client  
> may have to tell the VOSpace server when the data is ready.
>
> The client could use the "vos://...." URI of the transfer status in  
> the callback, to tell the server which transfer (and protocol  
> option) it is talking about. We need to remember that the VOSpace  
> server may have offered more than one protocol option for the  
> transfer, so the client needs to tell it which option has been  
> completed, to enable the server to collect the data from the right  
> place and cancel the others.
>
> The details of what the callback means, and what the server does  
> with it, would be specific to the implementation of the protocol.
> If the VOSpace and gFTP server are acting as one entity, then the  
> VOSpace server may leave the data within the gFTP server file  
> system, and just update the node metadata.
> On the other hand, if the gFTP server is acting as a staging post,  
> then the VOSpace server may collect the data from the back end and  
> move it to another location within its own file system.
>
> In summary :
>
> Matthew has highlighted the fact that we already need a callback
> mechanism for some of the existing import and export protocols.

And this does have a bearing on the delivery of a 1.0 standard if we
are to promise backward compatibility...
>
> Whatever callback mechanism we adopt, it will need some way to  
> refer to the persistent state within the VOSpace server, that  
> represents the state of the transfer and the individual protocol  
> options within it. Representing these as VOSpace nodes means that  
> we can use the existing "vos://..." URI scheme to refer to them,  
> and the existing API to list, query and modify them.

It might just be a question of terminology, but, as I said, I do not
like the idea that they are just "ordinary nodes" that appear in the
container listings; if they are accessible via the query part of the
URL, I am more amenable. However, although the existing API is OK for
querying and listing control objects, I am not so sure it is suitable
for modifying them. After all, this whole discussion has flared up
because of the complexities of data upload within VOSpace. Whilst
these complexities are acceptable for uploading data objects (so that
we can take advantage of the special qualities of existing data
transfer protocols), a specialized API might be more suitable for
modifying control objects.
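
To make the distinction concrete, here is a minimal sketch of what a
specialized API for a transfer control object might look like, using
the complete(), fail() and cancel() operations Dave mentions above. It
is Python purely for illustration; the class name, the state names and
the protocol option labels are all hypothetical, and nothing here is
taken from the specification.

```python
class TransferControl:
    """Hypothetical control object for one transfer and its protocol options."""

    def __init__(self, node_uri, options):
        self.node_uri = node_uri
        # Every protocol option offered by the server starts out pending.
        self.options = {name: "PENDING" for name in options}

    def _require_pending(self, option):
        if self.options.get(option) != "PENDING":
            raise ValueError("option is not pending: " + repr(option))

    def complete(self, option):
        """Client reports which option it used; other pending ones are cancelled."""
        self._require_pending(option)
        for name, state in self.options.items():
            if state == "PENDING":
                self.options[name] = (
                    "COMPLETED" if name == option else "CANCELLED")

    def fail(self, option):
        """Client reports that one option did not work; the others stay open."""
        self._require_pending(option)
        self.options[option] = "FAILED"

    def cancel(self):
        """Abandon the whole transfer: cancel every option still pending."""
        for name, state in self.options.items():
            if state == "PENDING":
                self.options[name] = "CANCELLED"

    @property
    def finished(self):
        return "PENDING" not in self.options.values()


transfer = TransferControl(
    "vos://org.test!vospace/container/node", ["http-put", "gftp-put"])
transfer.fail("http-put")      # first attempt did not work
transfer.complete("gftp-put")  # data is ready on the gFTP server
print(transfer.options, transfer.finished)
```

The point of the sketch is only that three narrow operations cover the
state changes a client needs, without the client having to edit node
properties through the general API.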


>
> Once we have a standard way of referring to the persistent state of  
> a transfer, then my previous email about making the details of the  
> callback specific to the protocol might make sense. Without it, the  
> client has no way of telling the server which transfer and protocol  
> option it is talking about.

I think that we should try to extract as much common protocol
behaviour as possible. As soon as a protocol is not completely
described by the transfer URL, we get into complications that would be
better avoided if we want to maintain interoperability. We need to
utilise as much of the common characteristics of a protocol as
possible before layering non-standard protocol behaviours on top of
externally defined protocols.


