Changes to VOSpace 2.0

Dave Morris dave.morris at bristol.ac.uk
Thu Jan 6 04:42:09 PST 2011


Hi all,

I've been out of the loop for a while, so this may no longer be
relevant, but I think the original intent was that the job state would
record the success or failure of the whole 'job': the negotiation stage,
the data transfer itself, and any decoding/interpreting of the data when
it arrived.

On 05/01/11 23:58, Matthew Graham wrote:
> Hi,
>
> On Jan 5, 2011, at 3:41 PM, Patrick Dowler wrote:
>
>> On 2011-01-05 15:03:06 you wrote:
>>> The user submits an HTTP(S) request to /sync which consists of a <transfer>
>>> document detailing a pushTo/pullFrom transfer. The service responds with a
>>> Redirect-GET which returns the Job document for the transfer to the user
>>> with details of the transfer, i.e., endpoints.  At this point, the status
>>> of the Job should be ERROR (if something went wrong) or EXECUTING (since
>>> the endpoints are live and ready for data transfer). The Job status should
>>> only go to COMPLETED when the data transfer has finished - the service
>>> will need to monitor the data streams (or delegate the monitoring) to
>>> determine when this occurs and respond accordingly.
>>
>> Oh, I see. It is me that misunderstood the job state. I thought the job was
>> complete once the negotiation was done, but this is more sophisticated. After
>> the user PUTs the file (for example) the job will then be marked COMPLETED or
>> ERROR depending on whether that succeeds. Got it!
>
> OK, I'm happy with this solution as well.
>

Yep, I think this is similar to how the 1.1 SOAP service operated.

The job state was intended to be a semi-permanent record of what
happened, one that could be referred to and examined after the transfer
had finished.

If the negotiation fails, then it should leave behind a job state that 
records how far it got and why it failed. If the negotiation succeeds 
but the data transfer fails, then it leaves behind a job state that 
records how far it got and why it failed.

The job should only be listed as completed when all the data has arrived
and is in good shape.

For a simple binary blob, the job completes when all of the bytes
arrive. For a transfer of a VOTable into a TAP service, even if all
of the bytes arrive, the job could still fail if the VOTable was not
valid XML or failed to import into the database.
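
As a rough illustration of the rule above, here is a minimal Python
sketch of a job record that only reaches COMPLETED once every stage has
succeeded, and otherwise records how far it got and why it failed. The
class name TransferJob, its fields, and the stage names are hypothetical
and are not taken from the VOSpace or UWS documents; only the phase names
(PENDING, EXECUTING, COMPLETED, ERROR) follow the UWS job phases.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TransferJob:
    """Hypothetical job record; field names are illustrative only."""
    stages: Tuple[str, ...]                 # ordered stages this job must pass
    phase: str = "PENDING"                  # UWS-style phase name
    completed_stages: List[str] = field(default_factory=list)
    failed_stage: Optional[str] = None      # stage at which the job failed
    error_message: Optional[str] = None     # why it failed

    def record(self, stage: str, ok: bool, reason: Optional[str] = None) -> None:
        """Record the outcome of one stage and update the overall phase."""
        if self.phase in ("COMPLETED", "ERROR"):
            raise ValueError("job has already finished")
        if not ok:
            # Keep the list of completed stages so the record shows
            # how far the job got before it failed.
            self.failed_stage = stage
            self.error_message = reason
            self.phase = "ERROR"
            return
        self.completed_stages.append(stage)
        if tuple(self.completed_stages) == self.stages:
            self.phase = "COMPLETED"
        else:
            self.phase = "EXECUTING"


# A simple binary blob: done once all of the bytes have arrived.
blob_job = TransferJob(stages=("negotiation", "transfer"))
blob_job.record("negotiation", ok=True)
blob_job.record("transfer", ok=True)

# A VOTable import: all of the bytes arrive, but the import step fails.
votable_job = TransferJob(stages=("negotiation", "transfer", "import"))
votable_job.record("negotiation", ok=True)
votable_job.record("transfer", ok=True)
votable_job.record("import", ok=False, reason="not valid XML")
```

In this sketch the blob job ends in COMPLETED, while the VOTable job ends
in ERROR with the failing stage and reason kept alongside the stages that
did succeed.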

There are several use cases for this:
1) A 3rd party transfer, where the client software can negotiate the
transfer and then delegate the actual PUT to a different system which
would not necessarily need to understand VOSpace, just HTTP or FTP. The
client could then poll the job state to monitor the progress of the
transfer.

2) Data transfers as part of a larger workflow. By representing data 
transfers as jobs with a status, a workflow engine can treat them as 
just another step in the workflow, and monitor the status by examining 
the job state.

3) Complex transfers of data in specific formats, e.g. importing a
VOTable into a TAP service, where the recipient is expected to interpret
the data in a specific way. If the job state reports success when all of
the bytes arrive, then we need another mechanism to report a failure to
import invalid or badly formatted data.

4) Limited lifetime on the endpoint URLs. The server may want to limit the
lifetime of the endpoint URLs, allowing it to manage its resources and 
close off unused transfers. The job lifetime provides a way of telling 
the client about this. If the client completes the negotiation but 
doesn't do the PUT, then the endpoint URLs may become invalid when the 
lifetime of the job expires. If the client wants more time, then it can 
use the UWS protocol to negotiate with the server to extend the lifetime 
of the job and hence the endpoint URLs.
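
The polling described in use cases 1) and 2) could look something like
the following Python sketch. It is only an illustration: the function
name wait_for_job and its parameters are made up here, and the way the
phase is fetched is left to the caller (a real client would presumably
read the phase from the job resource on the server). The phase names are
the UWS ones.

```python
import time
from typing import Callable

# UWS phases after which the job state will not change any further.
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED"}


def wait_for_job(
    fetch_phase: Callable[[], str],
    poll_interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job until it reaches a terminal phase and return that phase.

    fetch_phase is a caller-supplied function (hypothetical here) that
    returns the current phase of the job as a string.
    """
    for attempt in range(max_polls):
        phase = fetch_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if attempt < max_polls - 1:
            sleep(poll_interval)
    raise TimeoutError("job did not finish after %d polls" % max_polls)


# Usage with a stand-in for the server: the delegated PUT is still
# running for the first two polls, then the transfer completes.
fake_phases = iter(["EXECUTING", "EXECUTING", "COMPLETED"])
result = wait_for_job(lambda: next(fake_phases), sleep=lambda seconds: None)
```

A workflow engine could use the same loop to treat the transfer as one
more step, moving on when the phase is COMPLETED and inspecting the job
state when it is ERROR.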

Many service providers will record this kind of information anyway for
performance monitoring and resource allocation. Exposing this 
information in the job state provides a mechanism for making it 
available to the end user.

The trade-off has always been trying to enable this level of status
reporting and error handling for the more complex use cases while still 
making it fairly simple to implement a basic HTTP binary blob GET/PUT 
service.

As Matthew says, a simple client can just follow the redirect, get the 
endpoint, push the data and be done. The details in the job status are 
there to support more complex clients who want to check if the data 
arrived correctly and find out what happened if it didn't.

Hope this helps,
Dave


